Name Parse macro
The Name Parse macro splits a full name (which can contain title/prefix, first name, middle name or initial, last name, suffix, and professional title) into individual name components. Use the macro prior to matching so that individuals can be matched more effectively.
The Name Parse macro accepts a single input table and parses one text field of that table into the components of up to two names. It uses a pattern-based recognition approach, with token tables that determine which part of the input is most likely to be a given component, such as first name, last name, or suffix. You can optionally specify an additional lookup table.
The Name Parse macro assumes that there can be one or two names (we'll call them Name1 and Name2) in the input field. It adds the following fields to the input table, and populates them with the following parsed name components:
Field | Name component |
---|---|
| The prefix (Dr, Mr, Mrs) of Name1 |
| The first name of Name1 |
| The middle name of Name1 |
| The last name of Name1 |
| The generational suffix (Jr, III) of Name1 |
| The profession-specific suffix (MD, DVM, PHD) of Name1 |
| The professional title (Auditor, VP of Engineering) of Name1 |
| The nickname of Name1 |
| The gender of Name1 |
| The prefix (Dr, Mr, Mrs) of Name2 |
| The first name of Name2 |
| The middle name of Name2 |
| The last name of Name2 |
| The generational suffix (Jr, III) of Name2 |
| The profession-specific suffix (MD, DVM, PHD) of Name2 |
| The professional title (Auditor, VP of Engineering) of Name2 |
| The nickname of Name2 |
| The gender of Name2 |
Name Parse macro configuration parameters
The Name Parse macro has a single set of configuration parameters in addition to the standard execution options:
Input data
Parameter | Description |
---|---|
Name type | Select the input name type, either Full Name or Parsed Name. Default: Full Name. |
Name field to parse | If Name type is Full Name, the name field. Default: Blank. |
Prefix | If Name type is Parsed Name, the prefix, such as Mr or Mrs ("Mr" in Mr John Abel Smith Jr). Default: Blank. |
First name | If Name type is Parsed Name, the given name ("John" in John Abel Smith Jr). Default: Blank. |
Mid name | If Name type is Parsed Name, the middle name ("Abel" in John Abel Smith Jr). Default: Blank. |
Last name | If Name type is Parsed Name, the surname ("Smith" in John Abel Smith Jr). Default: Blank. |
Suffix | If Name type is Parsed Name, the generational suffix ("Jr" in John Abel Smith Jr). Default: Blank. |
Unicode input | Select if input data type is Unicode. |
Name order field | Optional field indicating whether names are formatted with first name first or last name first. This is a two-byte text field containing the values FL (first name followed by last name) or LF (last name followed by first name). If this is unspecified, names are assumed to be first name followed by last name. Default: None. |
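
The effect of the Name order field can be sketched in a few lines of Python. This is an illustration only, not the macro's implementation: the function name `split_by_order` is hypothetical, and the sketch handles only simple two-token names.

```python
def split_by_order(name, order=None):
    """Return (first, last) for a simple two-token name.

    order is "FL" (first name, then last name) or "LF" (last name, then
    first name). A missing or blank value falls back to "FL", mirroring
    the documented default.
    """
    order = (order or "FL").strip().upper() or "FL"
    if order not in ("FL", "LF"):
        raise ValueError(f"unexpected name order: {order!r}")
    # Treat a comma as a plain separator so "Davis, Scott" also works.
    tokens = name.replace(",", " ").split()
    if len(tokens) != 2:
        raise ValueError("this sketch handles exactly two tokens")
    first, last = tokens if order == "FL" else reversed(tokens)
    return first, last
```

For example, `split_by_order("Davis Scott", "LF")` and `split_by_order("Scott Davis")` both yield the first name "Scott" and the last name "Davis".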
Parsing/gender data source
You may specify an optional Parsing/gender data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
Parsing behavior
Parameter | Description |
---|---|
Use large table | Select to use Data Management's comprehensive parsing lookup table. If you are resource-limited, you should leave this off. Default: No. |
Treat "/" as AND | If the name field might contain two names separated by a slash ("/"), select this option to ensure that the name is parsed correctly. Default: No. |
Prefer Last/First | For ambiguous two-name cases like "Scott Davis" and "Davis Scott", prefer Last/First interpretation over First/Last interpretation. Default: No. |
Preserve dual last name | If the name field might contain names with two last names, you can select this option to put both in a single last name field. If you have the name "Mary Andrews Smith", selecting this option will write "Andrews Smith" to the OUT_LNAME1 field. If this option isn't selected, "Andrews" will be written to the OUT_MIDNAME1 field and "Smith" will be written to the OUT_LNAME1 field. Default: No. |
Split hyphenated last name | Select to split hyphenated last name into two fields, OUT_LNAME1 and OUT_LNAME1_2. Default: No. |
Suppress second name | Select this to output only the first name encountered in input. An input name of the form "Alice and Kirk McKinney" will be output as "Alice McKinney." Default: No. |
Parse suffix | Select this option to distinguish between generational suffixes such as Jr and III, suffixes such as DR and PhD, and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the OUT_POSTNAME1 field and "MD" in the OUT_SUFFIX1 field. The name "Janice Jones, PhD, VP of Development" will be output with "PhD" in the OUT_SUFFIX1 field and "VP of Development" in the OUT_PROFTITLE1 field. Without this option selected, "PhD" and "VP of Development" would both go to the OUT_SUFFIX1 field. Default: Yes. |
No punctuation in titles | Select to remove the punctuation from honorary titles. This will strip the periods in titles like "M.D." Default: No. |
Initials at name end as suffix | Select this option to treat initials found at the end of a name as a suffix. Default: No. |
Genderize name before suffix | By default, gender is assigned by analyzing data in this order: Suffix, First name, Middle name. Select this option to change the order to First name, Middle name, Suffix. Default: No. |
Capitalization | Choose capitalization style of the output. Default: UPPERCASE. |
Treat "President" as | You can treat the word "President" as either Title or Prefix. If you select Title, then "President" will be put in the OUT_PROFTITLE1 field. Default: Title. |
Treat "C O" as "C/O" | Select to interpret the string "C O" as "Care Of". Default: Yes. |
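
As a rough illustration of how Preserve dual last name and Split hyphenated last name affect the output fields, the following Python sketch places the tokens that follow the first name. It is not the macro's implementation: the function `place_last_names` is hypothetical, the name "Garcia-Lopez" is made up for the example, and sending the first half of a hyphenated name to OUT_LNAME1 is an assumption.

```python
def place_last_names(tokens, preserve_dual_last_name=False,
                     split_hyphenated_last_name=False):
    """Assign the tokens that follow the first name to output fields.

    tokens: name tokens after the first name, e.g. ["Andrews", "Smith"].
    Returns a dict keyed by the output field names used in this section.
    """
    out = {"OUT_MIDNAME1": "", "OUT_LNAME1": "", "OUT_LNAME1_2": ""}
    if not tokens:
        return out
    if len(tokens) >= 2 and preserve_dual_last_name:
        # Keep both surnames together in the single last name field.
        out["OUT_LNAME1"] = " ".join(tokens[-2:])
        out["OUT_MIDNAME1"] = " ".join(tokens[:-2])
    else:
        # Final token is the last name; anything before it is the middle name.
        out["OUT_LNAME1"] = tokens[-1]
        out["OUT_MIDNAME1"] = " ".join(tokens[:-1])
    if split_hyphenated_last_name and "-" in out["OUT_LNAME1"]:
        # Assumption: the part before the hyphen stays in OUT_LNAME1.
        left, right = out["OUT_LNAME1"].split("-", 1)
        out["OUT_LNAME1"], out["OUT_LNAME1_2"] = left, right
    return out
```

With the tokens `["Andrews", "Smith"]` from "Mary Andrews Smith", the sketch writes "Andrews Smith" to OUT_LNAME1 when `preserve_dual_last_name` is set, and otherwise writes "Andrews" to OUT_MIDNAME1 and "Smith" to OUT_LNAME1.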
Prefix options
Parameter | Description |
---|---|
Add prefix if none present | Select to add a prefix such as "Mr" or "Mrs" to names that don't have one. Use the other prefix options (below) to specify the default prefix. Default: No. |
Default male prefix | If you selected Add prefix if none present, select a default male prefix from the list. Default: MR. |
Default female prefix | If you selected Add prefix if none present, select a default female prefix from the list. Default: MS. |
If multi-name use alt female prefix | Select to assign the alternate female prefix to the female name when a pair of parsed names includes a female. Default: Yes. |
Alt female prefix | If you selected If multi-name use alt female prefix, select a default female prefix. Default: MRS. |
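
The way the prefix options interact can be sketched as below. This is a minimal illustration under stated assumptions, not the macro's actual logic: the function `default_prefix` and its parameter names are hypothetical, and leaving the prefix blank when gender is unknown is an assumption. The parameter defaults mirror the defaults listed above.

```python
def default_prefix(gender, existing_prefix="", add_prefix=False,
                   male_prefix="MR", female_prefix="MS",
                   use_alt_female=True, alt_female_prefix="MRS",
                   is_multi_name=False):
    """Pick a prefix for one parsed name.

    gender is "M", "F", or "" (unknown). is_multi_name is True when the
    input field held a pair of names.
    """
    # Never overwrite a prefix that was already present in the input.
    if existing_prefix or not add_prefix:
        return existing_prefix
    if gender == "M":
        return male_prefix
    if gender == "F":
        # For a pair of names, optionally use the alternate female prefix.
        if is_multi_name and use_alt_female:
            return alt_female_prefix
        return female_prefix
    # Assumption: no prefix is added when gender is unknown.
    return ""
```

For instance, with Add prefix if none present selected, a lone female name would get "MS", while the female name in a two-name input would get "MRS".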
Debug option
Parameter | Description |
---|---|
Output debugging information | Select this option to output three CSV files to the specified Output debug path. These files capture intermediate results of the parsing process, which can be useful for troubleshooting: name_parse_debug_symbols.csv shows the input names parsed into TOKENs; name_parse_debug_symbols2.csv shows the SYMBOLs assigned to TOKENs; and name_parse_debug_symbols3.csv shows the TOKEN–SYMBOL–CLASS/PATTERN_INDEX correspondence. |
Add a parsing data source
You can customize the operation of the Name Parse macro by defining an additional parsing data source. Review Data Management's parsing technology, built-in data dictionary, and a sample parsing data source before creating your own supplemental data source.
The supplemental data source must be a Data Management DLD file with two or three columns (GENDER is optional):
TOKEN: The text of the token extracted from the name field.
SYMBOL: The part of the name that the token represents.
GENDER: If the token is gender specific, GENDER is M or F, otherwise blank.
SYMBOL is one (or a combination) of the following symbols:
FN first name (Aelena, Evo)
FP first name prefix (Cpt, Sir)
LN last name (Behlin, Looney)
LP last name prefix (Mc, Vander)
LS last name suffix (III, Jr)
LT last name title (CEO, Trust)
Because tokens can be ambiguous, SYMBOL can be "overloaded" to indicate multiple possibilities. A compound symbol indicates that a token can be any one of the referenced name parts. For example, SYMBOL FNLNLP indicates a TOKEN that may be a first name, a last name, or a last name prefix (for example, Della or Santa). The compound symbols recognized by the macro are:
FNFP
FNFPLN
FNFPLNLT
FNFPLT
FNLN
FNLNLP
FNLNLS
FNLNLT
FNLP
FNLS
FNLT
FPLN
FPLNLT
FPLS
FPLT
LNLP
LNLS
LNLT
LPLS
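
Each compound symbol is a concatenation of the two-letter symbols defined earlier, so it can be expanded mechanically. The Python sketch below shows one way to check a SYMBOL value against the lists above before adding it to a supplemental data source. The names `BASE_SYMBOLS`, `COMPOUND_SYMBOLS`, and `expand_symbol` are hypothetical, and the macro itself may validate symbols differently.

```python
# Two-letter symbols and their meanings, as listed in this section.
BASE_SYMBOLS = {
    "FN": "first name",
    "FP": "first name prefix",
    "LN": "last name",
    "LP": "last name prefix",
    "LS": "last name suffix",
    "LT": "last name title",
}

# Compound symbols listed in this section.
COMPOUND_SYMBOLS = {
    "FNFP", "FNFPLN", "FNFPLNLT", "FNFPLT", "FNLN", "FNLNLP", "FNLNLS",
    "FNLNLT", "FNLP", "FNLS", "FNLT", "FPLN", "FPLNLT", "FPLS", "FPLT",
    "LNLP", "LNLS", "LNLT", "LPLS",
}


def expand_symbol(symbol):
    """Split a SYMBOL value into its two-letter parts.

    Raises ValueError if the value is not one of the listed symbols.
    """
    symbol = symbol.strip().upper()
    if symbol not in BASE_SYMBOLS and symbol not in COMPOUND_SYMBOLS:
        raise ValueError(f"unrecognized symbol: {symbol!r}")
    return [symbol[i:i + 2] for i in range(0, len(symbol), 2)]
```

For example, `expand_symbol("FNLNLP")` returns the parts `FN`, `LN`, and `LP`, matching the description of a token that may be a first name, a last name, or a last name prefix.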
Parsing data source sample
The following is a sample parsing/gender data source table:
TOKEN | SYMBOL | GENDER |
---|---|---|
A'ISHAH | FN | F |
AABRAHAM | FN | M |
AAGE | FN | |
AAGOT | FN | |
AAISHA | FN | F |
AAKARSHAN | FN | M |
AALEXUS | FN | F |
AALEYAH | FN | F |
AALI | FN | M |
AALIYAH | FN | F |
AB | FNLS | M |
ABS | FNLS | M |
BA | FNLS | M |
BJ | FNLS | |
CAS | FNLS | |
DOM | FNLS | M |
DOT | FNLS | F |
DRE | FNLS | M |
EDD | FNLS | M |
JD | FNLS | |
BR | FP | |
BROTHER | FP | |
CAPT | FP | |
CMDR | FP | |
COL | FP | |
CPT | FP | |
FATHER | FP | |
FR | FP | |
GOV | FP | |
HONORABLE | FP | |
MS | FPLS | |
CAPTAIN | FPLT | |
COLONEL | FPLT | |
COMMANDER | FPLT | |
CORPORAL | FPLT | |
DOCTOR | FPLT | |
GENERAL | FPLT | |
GOVERNOR | FPLT | |
HON | FPLT | |
MASTER | FPLT | |
MISS | FPLT | F |
AAB | LN | |
AABBOTT | LN | |
AABED | LN | |
AABEDI | LN | |
AABEDIN | LN | |
AABERG | LN | |
AABY | LN | |
AABYE | LN | |
AACH | LN | |
AACHMANN | LN | |
DEL | LNLP | M |
DELA | LNLP | |
DELLA | LNLP | F |
DER | LNLP | M |
DU | LNLP | |
EL | LNLP | F |
LA | LNLP | |
LAM | LNLP | M |
LAU | LNLP | |
LE | LNLP | F |
BARCH | LNLS | |
BCHIR | LNLS | |
BE | LNLS | |
BES | LNLS | |
BOST | LNLS | |
DARCH | LNLS | |
DAS | LNLS | |
DENG | LNLS | |
DO | LNLS | |
FOURTH | LNLS | |
ACTOR | LNLT | |
ADVOCATE | LNLT | |
AGENT | LNLT | |
ALDERMAN | LNLT | |
ARBITER | LNLT | |
ARTIST | LNLT | |
BAGGER | LNLT | |
BAILIFF | LNLT | |
BANKER | LNLT | |
BARGEMAN | LNLT | |
D' | LP | |
DA | LP | M |
DE | LP | |
DES | LP | M |
DI | LP | F |
O' | LP | |
ST | LP | |
ST. | LP | |
STA | LP | |
VAND | LP | |
11 | LS | |
111 | LS | |
2 | LS | |
2ND | LS | |
3 | LS | |
3D | LS | |
3RD | LS | |
4 | LS | |
4TH | LS | |
5TH | LS | |
1ST VICE-PRESIDENT | LT | |
1ST VP | LT | |
A.V.P. | LT | |
ACCESS PROGRAM MANAGER | LT | |
ACCOUNT EXECUTIVE | LT | |
ACCOUNT MANAGER | LT | |
ACCOUNT VICE PRESIDENT | LT | |
ACCOUNTANT | LT | |
ACCOUNTING | LT | |
ACCOUNTING MANAGER | LT | |
ABBOT | LTN | M |
ADMIN | LTN | M |
ARCHER | LTN | M |
AUTHOR | LTN | |
BAKER | LTN | |
BARBER | LTN | M |
BARD | LTN | M |
BISHOP | LTN | M |
CARVER | LTN | M |
CHANCELLOR | LTN | M |
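
To show how a table like this can serve as a token lookup, the sketch below loads a few of the sample rows into a dictionary keyed by TOKEN. It is an illustration only: the DLD format is not reproduced here, so pipe-delimited text stands in for it as an assumption, and the names `SAMPLE`, `load_token_table`, and `lookup` are hypothetical.

```python
import csv
import io

# A few rows copied from the sample table; pipe-delimited text is a
# stand-in for a DLD file (assumption).
SAMPLE = """TOKEN|SYMBOL|GENDER
AABRAHAM|FN|M
AALIYAH|FN|F
DOT|FNLS|F
MISS|FPLT|F
DELLA|LNLP|F
JD|FNLS|
"""


def load_token_table(text):
    """Return {TOKEN: (SYMBOL, GENDER)}; GENDER is "" when blank or absent."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    table = {}
    for row in reader:
        token = row["TOKEN"].strip().upper()
        gender = (row.get("GENDER") or "").strip()
        table[token] = (row["SYMBOL"].strip(), gender)
    return table


def lookup(table, token):
    """Case-insensitive lookup; returns None for an unknown token."""
    return table.get(token.strip().upper())
```

Because the GENDER column is read with `row.get`, the same loader accepts a two-column table that omits it.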
Configure the Name Parse macro
Select the Name Parse macro icon, and then go to the Configuration tab on the Properties pane.
Specify Input data:
Select Name type.
If Name type is Full Name, specify Name field to parse.
If Name type is Parsed Name, specify some or all of Prefix, First name, Mid name, Last name, Suffix, and Postname.
Optionally, select the Name order field. This is a field indicating whether the Name field of the current record is in first name/last name ("FL") or last name/first name ("LF") order. If not specified, first name/last name is assumed.
You can optionally specify a Parsing/gender data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
Specify parsing behavior.
Specify prefix options.
Optionally, select Output debugging information and specify an Output debug path.
Optionally, go to the Execution tab, and then set Web service options.