Soundex is a system, invented by the U.S. National Archives, of grouping differently-spelled variants of a word together. It was created to index U.S. Census results, but has many uses: at least in Florida, Soundex is used to generate the first part of a person's driver's license number (my last name is Saunders, and my license number starts out "S536"), and it is used in Everything's non-exact searches. But one of its major uses is in genealogy.
The point of Soundex in genealogy is to make it easier for the researcher to connect different spellings of names that may be related. My last name is Saunders, but four generations ago it was spelled Sanders. Soundex ignores vowels, and so would group those names together (both come out as S-536). It's not perfect; my mother's family, the Lonons (L-550), would not be grouped in an index with their ancestors who spelled it London (L-535), but would be listed together with some relative whose census taker spelled it Lunun.
There is a converter form at http://www.ourancestry.com/soundex.html, but if you'd rather know the rules to make Soundex codes yourself, here they are as given at http://www.nara.gov/genealogy/coding.html:
Every soundex code consists of a letter and three numbers, such as W-252. The letter is always the first letter of the surname. The numbers are assigned to the remaining letters of the surname according to the soundex guide shown below. Zeroes are added at the end if necessary to produce a four-character code. Additional letters are disregarded. Examples:
Washington is coded W-252 (W, 2 for the S, 5 for the N, 2 for the G, remaining letters disregarded).
Lee is coded L-000 (L, 000 added).
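The padding and cut-off part of that rule can be sketched in a few lines of Python. This is only an illustration: the name format_code is my own, and it assumes the digits for the remaining letters have already been worked out from the coding guide.

```python
def format_code(first_letter, digits):
    """Join an initial and a string of digits into letter-plus-three-digits form."""
    # Keep at most three digits, then pad with zeroes on the right.
    return f"{first_letter.upper()}-{digits[:3].ljust(3, '0')}"


print(format_code("W", "25235"))  # Washington: extra digits are dropped
print(format_code("L", ""))       # Lee: nothing to code, so zeroes are added
```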
Soundex Coding Guide (each number represents the letters listed after it):
- 1: B, F, P, V
- 2: C, G, J, K, Q, S, X, Z
- 3: D, T
- 4: L
- 5: M, N
- 6: R
Disregard the letters A, E, I, O, U, H, W, and Y.
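If you want the guide in a form a program can use, one way is a small lookup table. The sketch below is mine, not NARA's; the names GROUPS, DIGIT_FOR and digit_for are made up for the example.

```python
# Letter groups from the coding guide, keyed by their digit.
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the table into a per-letter lookup. Letters that are missing
# (A, E, I, O, U, H, W, Y) are the ones the guide says to disregard.
DIGIT_FOR = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def digit_for(letter):
    """Return the guide's digit for a letter, or None if it is disregarded."""
    return DIGIT_FOR.get(letter.upper())


print(digit_for("s"), digit_for("T"), digit_for("a"))
```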
Additional Soundex Coding Rules
Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
Gutierrez is coded G-362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
Pfister is coded as P-236 (P, F ignored, 2 for the S, 3 for the T, 6 for the R).
Jackson is coded as J-250 (J, 2 for the C, K ignored, S ignored, 5 for the N, 0 added).
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored, 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
Names with Prefixes
If a surname has a prefix, such as Van, Con, De, Di, La, or Le, code both with and without the prefix because the surname might be listed under either code. Note, however, that Mc and Mac are not considered prefixes.
For example, VanDeusen might be coded two ways:
V-532 (V, 5 for N, 3 for D, 2 for S)
D-250 (D, 2 for the S, 5 for the N, 0 added).
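A rough Python sketch of the prefix rule follows. It only produces the spellings to code, not the codes themselves, and the names PREFIXES and prefix_variants are my own. Note the big assumption: it matches on leading letters alone, so it cannot tell a real prefix from a name that merely starts that way (it would also offer "onard" for Leonard), and a person still has to judge the results.

```python
# Prefixes listed in the rule above; Mc and Mac are deliberately left out.
PREFIXES = ("Van", "Con", "De", "Di", "La", "Le")


def prefix_variants(surname):
    """Return the surname plus any spelling with a listed prefix removed."""
    variants = [surname]
    lowered = surname.lower()
    for prefix in PREFIXES:
        # Drop the prefix and any space, hyphen or apostrophe that follows it.
        rest = surname[len(prefix):].lstrip(" -'")
        if lowered.startswith(prefix.lower()) and len(rest) > 1:
            variants.append(rest)
    return variants


print(prefix_variants("VanDeusen"))
print(prefix_variants("McDonald"))
```

Each spelling returned would then be coded separately, which is how VanDeusen ends up under both V-532 and D-250.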
Consonant Separators
If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right of the H or W is not coded. Example:
Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
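Putting the rules together, here is a minimal Python sketch of the whole procedure as I read it. It is not an official implementation. The function name soundex is my own, prefixes are not handled (code each spelling separately), and I have assumed that Y separates consonants the same way a vowel does, since the rules above only say to disregard it.

```python
CODES = {}
for digit, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        CODES[letter] = digit


def soundex(surname):
    """Return a code such as 'W-252' for a surname, or '' if it has no letters."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Start from the first letter's own code, so a neighbor with the
    # same code is ignored (the F in Pfister).
    prev = CODES.get(first)
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run (Ashcraft)
        code = CODES.get(ch)
        if code is None:
            prev = None  # a vowel (or Y, by assumption) ends the run (Tymczak)
            continue
        if code != prev:
            digits.append(code)
        prev = code
    # Keep three digits, padding with zeroes if there are fewer.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


for name in ("Washington", "Lee", "Gutierrez", "Pfister",
             "Jackson", "Tymczak", "Ashcraft"):
    print(name, soundex(name))
```

Run against the examples on this page, it gives the same codes as the text, including the same code for Saunders and Sanders and different codes for Lonon and London.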