I’ve been reading the “guides” Ancestry.com provides for its search settings, and I’m a little bit confused.
From the Ancestry.com discussion of “name settings:”
Soundex Variations: Soundex is a common algorithm used to generate alternate spellings of a surname. If you choose this option, any record that contains one of the Soundex variations for a surname might appear in your results. (http://search.ancestry.com/Search/Help/SearchForm.aspx?topic=lname)
At the risk of being technical, Soundex does not generate alternate spellings (sloppy census takers do that!). Soundex is an algorithm that converts last names to a four-character “equivalent.” That four-character equivalent begins with a letter (the first letter in the last name) and then contains three numeric values based upon the three following non-vowel “sounds” in the name.
The process generally is as follows:
- Keep the initial letter of the name. Omit subsequent occurrences of a, e, i, o, u, y, h, w. This is done because these are “soft” sounds.
- After the first letter, consonants are replaced with numbers (sounds that are similar get the same number):
- b, f, p, v → 1
- c, g, j, k, q, s, x, z → 2
- d, t → 3
- l → 4
- m, n → 5
- r → 6
- If two or more letters that convert to the same number are adjacent to each other (before step 1), keep the first letter, ignoring the rest. The premise here is that two adjacent letters with the same number make one sound. Two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number. Two letters with the same number separated by a vowel are coded twice, presumably because two sounds are made. This applies to the first letter as well.
- If there are not enough letters to get three numbers, use zeroes for the remaining ones (after all, there are no more sounds). When three numbers have been used, quit and ignore the remaining consonants.
My last name (Neill) has a Soundex code of N400. So do Newell, Neal, Noel, etc.
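For anyone who wants to check the rules above by hand, here is a minimal Python sketch of the process as I have described it. The names `CODES` and `soundex` are my own, and this is only an illustration of the steps listed here, not the code Ancestry.com or any other site actually runs.

```python
# Letter-to-digit table from the list above. Vowels, y, h and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character code for a surname (a sketch, see above)."""
    # Keep only the plain letters a-z, so spaces and apostrophes are ignored.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # The first letter's own digit counts when checking for adjacent repeats.
    prev = CODES.get(first, "")

    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do NOT separate two equal digits.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # A vowel (or y) sets prev to "", so an equal digit after it is coded again.
        prev = code

    # Pad with zeroes, then keep the letter plus the first three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


for surname in ("Neill", "Newell", "Neal", "Noel"):
    print(surname, soundex(surname))
```

All four names in the loop come out as N400. Notice that the function only ever turns a name into a code. It has no way of producing “Newell” from “Neill”, which is the whole point of this post.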
The numbers Soundex generates should be the same for names that sound the same. Should be. There are of course exceptions, and Soundex usually works best with English names. Keep in mind that this post is not about whether Soundex works well for all last names (it doesn’t). There is a webpage on Rootsweb that will calculate the Soundex code for a given name. While this converter does give you names that have the same code, the algorithm is not generating those names. It’s most likely comparing them to a list.
The end result?
If the Soundex option is chosen by the searcher, the search results should contain entries with surnames that have the same Soundex code as the name entered.
What should it say? Something like this:
[altered suggestion] Soundex Variations: Soundex is a common algorithm used to determine if two names sound similar by converting each name to a numeric code. If you choose this option, any record with an entry that has a matching Soundex code will appear among your results.
Might?
In reviewing the guide, I noticed an additional item of interest (emphasis added):
(from Ancestry.com as in the first quote) If you choose this option, any record that contains one of the Soundex variations for a surname might appear in your results.
Might?
If the coding has been done correctly the “Soundex variations” should appear in the results. There should not be any “might” about it.
I’m hoping this is just a typo.
3 Responses
If a search target were born in 1869, regardless of soundex-similarity I should not get results for 1840 US Census heads of household or for US Revolutionary War service or pension records. That is, factors other than names should be filtering search results.
That’s a separate issue, and no, you should not. The “party line” on why you get these results is that it’s possible the birth date you have for that person is incorrect and that’s why those results are being returned. I don’t agree with that “party line,” but that’s the programming algorithm. I’ve complained about it before, but I’m afraid that my voice (and the voices of a few others) isn’t enough to justify change.
Personally I normally search specific databases–at least most of the time.
The concern I had was that the “help” didn’t really describe Soundex correctly and, if the coding is done correctly, there is no “might” about whether the correct results are obtained or not.
You never know what you’ll get with an ancestry.com search. Sometimes it’s what you asked for but too often it is not. Yes, the dates that are off by a couple of centuries “might” not matter to them. It does matter to me.