AO Classifier
Overview
Advanced Object (AO) Classifier extracts undefined data of one or more types from a single field. You can select the type(s) of data to be found in a given field. AO Classifier accepts a single stream as input and produces one output stream containing the identified fields.
Add a classifier data source
You can customize the operation of AO Classifier and AO Multi Line Classifier by defining an additional parsing data source. Review Data Management's parsing technology, built-in data dictionary, and a sample parsing data source before creating your own supplemental data source.
The supplemental data source must be a Data Management DLD file with three columns:
TOKEN: the text of the token extracted from the name field.
SYMBOL: the part of the name that the token represents.
GENDER: if the token is gender specific, GENDER is M or F, otherwise blank.
SYMBOL is one (or a combination) of the following symbols:
FN: first name (Aelena, Evo)
FP: first name prefix (Cpt, Sir)
LN: last name (Behlin, Looney)
LP: last name prefix (Mc, Vander)
LS: last name suffix (III, Jr)
LT: last name title (CEO, Trust)
FI: firm indicator (Company, Corporation, Incorporated)
WD: word (suppress word if it appears in Data Management parsing dictionaries)
Because a TOKEN can be ambiguous, some SYMBOLs can be "overloaded" to indicate multiple possibilities. These compound symbols indicate that a token can be any one of the referenced name parts. Thus, the SYMBOL FNLNLP indicates a TOKEN that may be a first name, a last name, or a last name prefix (for example, Della or Santa).
The SYMBOLs FI (firm indicator) and WD (word) must be used on their own and cannot be combined with other symbols.
The compound symbols recognized by the macro are:
FNFP
FNFPLN
FNFPLNLT
FNFPLT
FNLN
FNLNLP
FNLNLS
FNLNLT
FNLP
FNLS
FNLT
FPLN
FPLNLT
FPLS
FPLT
LNLP
LNLS
LNLT
LPLS
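Before converting a supplemental list to a DLD file, it can help to check each row against the rules above. The following Python sketch is illustrative only and is not part of Data Management: the function names `split_symbol` and `check_row` are hypothetical, and the sketch assumes the rows are available as plain strings. The symbol lists are copied from this page.

```python
# Single symbols and compound symbols as listed on this page.
BASE_SYMBOLS = {"FN", "FP", "LN", "LP", "LS", "LT", "FI", "WD"}
COMPOUND_SYMBOLS = {
    "FNFP", "FNFPLN", "FNFPLNLT", "FNFPLT", "FNLN", "FNLNLP", "FNLNLS",
    "FNLNLT", "FNLP", "FNLS", "FNLT", "FPLN", "FPLNLT", "FPLS", "FPLT",
    "LNLP", "LNLS", "LNLT", "LPLS",
}
ALL_SYMBOLS = BASE_SYMBOLS | COMPOUND_SYMBOLS


def split_symbol(symbol):
    """Split a recognized SYMBOL into its two-letter parts (hypothetical helper)."""
    if symbol not in ALL_SYMBOLS:
        raise ValueError(f"unrecognized SYMBOL: {symbol!r}")
    return [symbol[i:i + 2] for i in range(0, len(symbol), 2)]


def check_row(token, symbol, gender=""):
    """Return a list of problems found in one (TOKEN, SYMBOL, GENDER) row."""
    problems = []
    if not token.strip():
        problems.append("TOKEN is empty")
    if symbol not in ALL_SYMBOLS:
        # Also catches FI or WD combined with other symbols, such as "FIWD".
        problems.append(f"unrecognized SYMBOL {symbol!r}")
    if gender not in ("", "M", "F"):
        problems.append(f"GENDER must be M, F, or blank, not {gender!r}")
    return problems


if __name__ == "__main__":
    print(split_symbol("FNLNLP"))
    print(check_row("Della", "FNLNLP"))
    print(check_row("Acme", "FIWD", "X"))
```

A row passes when `check_row` returns an empty list. Because the check uses the explicit list of compound symbols rather than accepting any concatenation of two-letter codes, combinations that the list omits are reported as unrecognized.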
AO Classifier configuration parameters
AO Classifier has two sets of configuration parameters in addition to the standard execution options: Fields and Options.
AO Classifier Fields tab
Select input field
Parameter | Description |
---|---|
Input field | The input field whose contents AO Classifier identifies. |
Select output fields
Parameter | Description |
---|---|
Name | Select to output the full name field. |
Name1 components | Select to parse the first name found into components. |
Name2 components | Select to parse the second name found into components. |
Firm | Select to output the company name field. |
DBA | Select to output any DBA (Doing Business As) names. |
Address | Select to output the address line. |
Email1 | Select to output the first email address found. |
Email2 | Select to output the second email address found. |
Phone1 | Select to output the first phone number found. |
Phone1 components | Select to parse the first phone number found into components and validate it. |
Phone2 | Select to output the second phone number found. |
Phone2 components | Select to parse the second phone number found into components and validate it. |
SSN1 | Select to output the first Social Security Number found. |
SSN1 components/validation | Select to parse the first Social Security Number found into components: Area, Group, Sequence. Also outputs a Valid SSN Flag. |
SSN2 | Select to output the second Social Security Number found. |
SSN2 components/validation | Select to parse the second Social Security Number found into components: Area, Group, Sequence. Also outputs a Valid SSN Flag. |
Other | Select to output unidentified data. |
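As an illustration of the Area, Group, and Sequence components, the following Python sketch splits a nine-digit Social Security Number into its three parts. It is not the product's implementation: the function name `parse_ssn` is hypothetical, and the rules behind the product's Valid SSN Flag are not described here. The validity check below is an assumption based on the Social Security Administration's published structural rules (area is not 000, 666, or 900 to 999; group is not 00; sequence is not 0000).

```python
import re

# Accepts 123-45-6789, 123 45 6789, or 123456789.
SSN_PATTERN = re.compile(r"(\d{3})[- ]?(\d{2})[- ]?(\d{4})")


def parse_ssn(text):
    """Split an SSN into area, group, and sequence; return None if malformed."""
    match = SSN_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    area, group, sequence = match.groups()
    # Assumed structural rules; the product's own flag may differ.
    valid = (
        area not in ("000", "666")
        and not area.startswith("9")
        and group != "00"
        and sequence != "0000"
    )
    return {"area": area, "group": group, "sequence": sequence, "valid": valid}


if __name__ == "__main__":
    print(parse_ssn("123-45-6789"))
    print(parse_ssn("000-12-3456"))
```

A structural check of this kind only confirms that the number is well formed. It does not confirm that the number was ever issued.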
AO Classifier Options tab
Default if unclassified
Parameter | Description |
---|---|
Letters only | Specifies how to categorize data that cannot be otherwise classified. Data cannot contain numbers. Options are:
The default is Blank. |
Letters and digits | Specifies how to categorize data that cannot be otherwise classified. Data may contain numbers. Options are:
The default is Blank. |
Compound (and) | Specifies how to categorize data that cannot be otherwise classified. Data may contain AND, &, OR. Options are:
The default is Blank. |
Classifier data source
You may specify an optional Classifier data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
Parsing/gender data source
You may specify an optional Parsing/gender data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
Parsing behavior
Parameter | Description |
---|---|
Use large table | Select to use Data Management's comprehensive parsing lookup table. If you are resource-limited, leave this off. |
Treat "/" as AND | If the name field might contain two names separated by a slash ("/"), select this option to ensure that the name is parsed correctly. |
Fix reversed First/Last | Select this if you suspect that your records may have First name and Last name reversed. |
Preserve dual LastName | If the name field might contain names with two last names, select this option to put both in a single last name field. If you have the name Mary Andrews Smith, selecting this option will write Andrews Smith to the |
Split hyphenated LastNames | If the name field might contain names with hyphenated last names, select this option to store hyphenated last names in separate fields. If you have a last name of "Watson-Jones", selecting this option will write "Watson" to the |
Parse suffix types | Select this option to distinguish between generational suffixes such as Jr and III, suffixes such as DR and PhD, and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the |
No punctuation in titles | Select to remove the punctuation from honorary titles. This will strip the periods in titles like M.D. |
Capitalization | Choose the capitalization style of the output. |
Treat "President" as | You can treat the word "President" as either Title or Prefix. If you select Title, then "President" will be put in the |
Treat "C O" as "C/O" | Select to interpret the string "C O" as "Care Of". |
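The effect of three of these options can be pictured with plain string operations. The Python sketch below is a simplified illustration, not the parser that Data Management uses: the function names are hypothetical, and real name parsing relies on the lookup tables described earlier rather than on simple splitting.

```python
def split_names_on_slash(name_field):
    """Treat "/" as AND: return each name found on either side of a slash."""
    return [part.strip() for part in name_field.split("/") if part.strip()]


def split_hyphenated_last_name(last_name):
    """Split hyphenated LastNames: return the parts of a hyphenated last name."""
    return [part for part in last_name.split("-") if part]


def strip_title_punctuation(title):
    """No punctuation in titles: remove the periods from a title such as M.D."""
    return title.replace(".", "")


if __name__ == "__main__":
    print(split_names_on_slash("John Smith / Mary Smith"))
    print(split_hyphenated_last_name("Watson-Jones"))
    print(strip_title_punctuation("M.D."))
```

With these inputs, the first call yields two separate names, the second yields "Watson" and "Jones" as separate values, and the third yields "MD".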
Prefix options
Parameter | Description |
---|---|
Add prefix if none present | Select to add a prefix such as "Mr" or "Mrs" to names that don't have one. Use the other prefix options (below) to specify the default prefix. |
Default male prefix | Select a default male prefix from the list. |
Default female prefix | Select a default female prefix from the list. |
If multi-name use alt female prefix | Select to assign a prefix to the female name if a pair of parsed names includes a female. |
Alt female prefix | If you selected If multi-name use alt female prefix, select a default female prefix. |
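One way to read how the prefix options interact is sketched below in Python. This is an interpretation of the descriptions above, not documented behavior: the function name `choose_prefix`, its parameters, and the default values "Mr", "Ms", and "Mrs" are all assumptions chosen for illustration. In particular, the sketch assumes that the alternate female prefix replaces the default female prefix only when the record contains a pair of names.

```python
def choose_prefix(
    existing_prefix,
    gender,
    *,
    add_if_missing=True,      # "Add prefix if none present"
    male_default="Mr",        # "Default male prefix" (assumed value)
    female_default="Ms",      # "Default female prefix" (assumed value)
    multi_name=False,         # True when the record holds a pair of names
    use_alt_female=False,     # "If multi-name use alt female prefix"
    alt_female="Mrs",         # "Alt female prefix" (assumed value)
):
    """Return the prefix to output for one parsed name (hypothetical logic)."""
    if existing_prefix:
        return existing_prefix  # never overwrite a prefix already present
    if not add_if_missing:
        return ""
    if gender == "M":
        return male_default
    if gender == "F":
        if multi_name and use_alt_female:
            return alt_female
        return female_default
    return ""  # gender unknown: no prefix can be inferred


if __name__ == "__main__":
    print(choose_prefix("", "F"))
    print(choose_prefix("", "F", multi_name=True, use_alt_female=True))
    print(choose_prefix("Dr", "M"))
```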
Configure AO Classifier
Select AO Classifier.
Go to the Fields tab on the Properties pane.
Select Input field and choose the field to parse.
In the Select output fields section, choose each field that you want to include on output.
Select the Options tab, and then configure parsing behavior.
Optionally, you may specify one or more additional parsing data sources. See Adding a classifier data source and adding a parsing/gender data source.
Optionally, go to the Execution tab, and then set Web service options.