AO Multi Line Classifier
Overview
Advanced Object (AO) Multi Line Classifier parses and identifies diverse data from multiple text fields ("lines"). You can select global formatting and parsing options for all lines or tailor them to the expected content of each line. The macro accepts a single stream as input and produces one output stream containing the identified fields.
AO Multi Line Classifier configuration parameters
In addition to the standard execution options, AO Multi Line Classifier has three sets of configuration parameters: Configuration, Name options, and All lines/Lines.
AO Multi Line Classifier Configuration tab
Input fields
Parameter | Description |
---|---|
Line 1...8 | One or more fields containing diverse data. |
Global options
Parameter | Description |
---|---|
Output name components | Select to parse the first personal name found into its components. |
Output phone components | Select to parse the phone number found into its components. |
Validate/Parse SSN | Select to validate the first Social Security Number (SSN) found and parse it into its components. |
Output unparsed | Select to output field components that could not be parsed. |
Use large table | Select to use Data Management's comprehensive parsing lookup table. If you are resource-limited, leave this option cleared. |
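An SSN has three parts: a three-digit area number, a two-digit group number, and a four-digit serial number. The sketch below shows what "validate the first SSN found and parse it into components" can mean at the structural level. It is a hypothetical illustration, not the macro's implementation: the component names `area`, `group`, and `serial` are assumptions rather than the macro's output field names, and the checks only reject numbers the Social Security Administration never assigns (area 000, 666, or 900-999; group 00; serial 0000). They cannot tell whether a number was actually issued.

```python
import re
from typing import NamedTuple, Optional


class SSNParts(NamedTuple):
    # Hypothetical component names; not the macro's documented output fields.
    area: str    # first three digits
    group: str   # middle two digits
    serial: str  # last four digits


# Nine digits written as AAA-GG-SSSS, AAA GG SSSS, or AAAGGSSSS.
_SSN_RE = re.compile(r"\b(\d{3})[- ]?(\d{2})[- ]?(\d{4})\b")


def parse_first_ssn(text: str) -> Optional[SSNParts]:
    """Return the parts of the first structurally valid SSN in text, or None."""
    for match in _SSN_RE.finditer(text):
        area, group, serial = match.groups()
        # The SSA does not assign area 000, 666, or 900-999,
        # group 00, or serial 0000, so skip those candidates.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        return SSNParts(area, group, serial)
    return None


print(parse_first_ssn("SSN: 123-45-6789"))  # SSNParts(area='123', group='45', serial='6789')
```

Invalid candidates are skipped rather than ending the search, so "first found" here means the first structurally valid number in the text.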
Classifier data source
You may specify an optional Classifier data source. This is a table in DLD format containing two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
Parsing/gender data source
You may specify an optional Parsing/gender data source. This is a table in DLD format containing two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.
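Both data sources share the same column layout, so a loader only has to accept TOKEN and SYMBOL with an optional GENDER. The sketch below is hypothetical. It assumes the DLD rows have already been read into Python sequences, because reading the DLD format itself is not covered here. It also assumes that column order does not matter, that a later duplicate TOKEN overrides an earlier one, and that the SYMBOL values in the usage example are placeholders.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Column sets described for both data sources: TOKEN and SYMBOL, plus an optional GENDER.
_ALLOWED = ({"TOKEN", "SYMBOL"}, {"TOKEN", "SYMBOL", "GENDER"})


def load_lookup(
    columns: Sequence[str], rows: Iterable[Sequence[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each TOKEN to its (SYMBOL, GENDER) pair; GENDER is None if the column is absent."""
    if len(set(columns)) != len(columns) or set(columns) not in _ALLOWED:
        raise ValueError(f"expected TOKEN, SYMBOL[, GENDER], got {list(columns)}")
    index = {name: i for i, name in enumerate(columns)}
    table: Dict[str, Tuple[str, Optional[str]]] = {}
    for row in rows:
        gender = row[index["GENDER"]] if "GENDER" in index else None
        # Assumption: a repeated TOKEN overwrites the earlier entry.
        table[row[index["TOKEN"]]] = (row[index["SYMBOL"]], gender)
    return table


# Placeholder symbol value, for illustration only.
print(load_lookup(["TOKEN", "SYMBOL"], [("ACME", "SYM_X")]))
```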
Limit output
Optionally, specify the maximum number of each component type to output. You can specify a value from 1 to 16; the default setting of 0 (zero) outputs two values of the selected component per input line.
Parameter | Description |
---|---|
Max names | Manually limit the number of name fields output. |
Max firms | Manually limit the number of firm fields output. |
Max SSNs | Manually limit the number of SSN fields output. |
Max emails | Manually limit the number of email fields output. |
Max phones | Manually limit the number of phone fields output. |
Max addresses | Manually limit the number of address fields output. |
Max unparsed | Manually limit the number of extra (unparsed) fields output. |
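The Limit output rule can be written as a small function. This is one reading of the text, not the macro's code. It assumes that the default of 0 means "two values for each input line" in total, so three input lines allow six values. It also assumes that values beyond the limit are dropped in the order they were found. The function names are hypothetical.

```python
from typing import List, Sequence

MAX_LIMIT = 16
DEFAULT_PER_LINE = 2  # values output per input line when the setting is 0


def output_slots(max_setting: int, input_lines: int) -> int:
    """How many values of one component type are output (one reading of the rules)."""
    if not 0 <= max_setting <= MAX_LIMIT:
        raise ValueError("use 0 (default) or a value from 1 to 16")
    if max_setting == 0:
        return DEFAULT_PER_LINE * input_lines
    return max_setting


def limit_values(found: Sequence[str], max_setting: int, input_lines: int) -> List[str]:
    """Keep only the first values that fit in the available output slots."""
    return list(found[: output_slots(max_setting, input_lines)])


print(output_slots(0, 3))  # 6
print(limit_values(["a@x.com", "b@x.com", "c@x.com"], 1, 4))  # ['a@x.com']
```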
AO Multi Line Classifier Name options tab
Parsing options
Parameter | Description |
---|---|
Prefer First/Last | For ambiguous two-name cases such as "Scott Davis" and "Davis Scott", prefer the First/Last interpretation over the Last/First interpretation. |
Capitalization | Choose the capitalization style of the output. |
Preserve dual LastName | If the name field might contain names with two last names, select this option to put both in a single last name field. If you have the name Mary Andrews Smith, selecting this option will write Andrews Smith to the |
Split hyphenated LastName | If the name field might contain hyphenated last names, select this option to store each part of a hyphenated last name in a separate field. If you have a last name of Watson-Jones, selecting this option will write Watson to the |
Parse suffix | Select this option to distinguish between generational suffixes such as Jr and III, professional suffixes such as DR and PhD, and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the |
No punctuation in titles | Select to remove punctuation from honorary titles; for example, the periods are stripped from titles such as M.D. |
Initials at name end as suffix | Select this option to extract name suffixes that are expressed as initials at the end of a name. |
Genderize name before suffix | By default, the Name Parse macro assigns gender by analyzing data in this order: Prefix, Suffix, First Name, Middle Name. Select this option to change the order to Prefix, First Name, Middle Name, Suffix. |
Treat "President" as | You can treat the word "President" as either a Title or a Prefix. If you select Title, "President" is written to the OUT_PROFTITLE1 field. |
Treat "C O" as "C/O" | Select to interpret the string "C O" as "C/O" ("care of"). |
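The effect of Genderize name before suffix can be shown with a toy function that scans name components in one of the two documented orders. This is a minimal sketch, not the macro's logic. It assumes that the first component recognized by the gender lookup decides the result, that a component which is missing or unrecognized is skipped, and that the component keys and the two-entry lookup are hypothetical.

```python
from typing import Mapping, Optional

DEFAULT_ORDER = ("prefix", "suffix", "first", "middle")
BEFORE_SUFFIX_ORDER = ("prefix", "first", "middle", "suffix")


def assign_gender(
    parts: Mapping[str, str],
    gender_of: Mapping[str, str],
    genderize_before_suffix: bool = False,
) -> Optional[str]:
    """Return the gender of the first component, in order, that the lookup recognizes."""
    order = BEFORE_SUFFIX_ORDER if genderize_before_suffix else DEFAULT_ORDER
    for component in order:
        token = parts.get(component, "").upper()
        if token in gender_of:
            return gender_of[token]
    return None


# Toy lookup values, for illustration only.
toy_lookup = {"JR": "M", "KIM": "F"}
name = {"first": "Kim", "last": "Lee", "suffix": "Jr"}
print(assign_gender(name, toy_lookup))        # M (suffix is checked before first name)
print(assign_gender(name, toy_lookup, True))  # F (first name is checked before suffix)
```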
AO Multi Line Classifier All lines/Lines tabs
All lines tab and Lines 1...8 tabs
If the Apply to all fields option is selected, Data Management will check every input field for the selected components. If you clear the Apply to all fields option, you can use the Lines 1...8 tabs to define different parsing configurations for each input line, for up to eight lines.
Parameter | Description |
---|---|
Apply to all fields | Select this to apply the configuration defined on the All lines tab to all input fields. |
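The choice between the All lines tab and the individual Lines tabs can be modeled as deciding which configuration each input line receives. The sketch below is hypothetical: `LineConfig` and `effective_configs` are illustrative names, and the fallback for a line that has no configuration of its own (no checks at all) is an assumption, because the documentation does not describe that case.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Mapping


@dataclass(frozen=True)
class LineConfig:
    # Names of the enabled "Check for" options, e.g. {"Name", "Email"}.
    checks: FrozenSet[str] = frozenset()


def effective_configs(
    apply_to_all: bool,
    all_lines: LineConfig,
    per_line: Mapping[int, LineConfig],
    line_count: int,
) -> Dict[int, LineConfig]:
    """Resolve which configuration each input line (1-8) is parsed with."""
    if not 1 <= line_count <= 8:
        raise ValueError("the macro accepts between 1 and 8 input lines")
    lines = range(1, line_count + 1)
    if apply_to_all:
        return {n: all_lines for n in lines}
    # Assumption: a line with no tab configured gets no checks.
    return {n: per_line.get(n, LineConfig()) for n in lines}


print(effective_configs(False, LineConfig(frozenset({"Name"})), {2: LineConfig(frozenset({"Phone"}))}, 2))
```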
Check for
Parameter | Description |
---|---|
Name | Check data for personal names. |
Check for Firm | Check data for company names. |
Check for DBA | Check data for DBA (Doing Business As) names. |
Check for Address | Check data for address lines. |
Check for Email | Check data for email addresses. |
Check for Phone | Check data for phone numbers (North American data only). |
SSN | Check data for Social Security Numbers. |
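Some of these checks depend only on the shape of the text, so simple patterns can approximate them. The toy classifier below covers only the Email, Phone, and SSN checks and is not how the macro detects them. The regular expressions are simplified assumptions: the phone pattern follows the North American Numbering Plan shape, in which both the area code and the exchange begin with a digit from 2 to 9. Name, Firm, DBA, and Address detection require lookup tables and are not modeled.

```python
import re
from typing import Iterable, List

# Toy patterns for the pattern-based checks only; Name, Firm, DBA, and Address
# detection needs lookup tables and is not modeled here.
PATTERNS = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    # North American (NANP) style: area code and exchange both start with 2-9.
    "Phone": re.compile(r"\(?\b[2-9]\d{2}\)?[-. ]?[2-9]\d{2}[-. ]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(line: str, enabled: Iterable[str]) -> List[str]:
    """Return the enabled checks whose pattern occurs in line, in the order given."""
    return [kind for kind in enabled if kind in PATTERNS and PATTERNS[kind].search(line)]


print(classify("Call (617) 555-0100", ["Email", "Phone", "SSN"]))  # ['Phone']
```

Only enabled checks are evaluated, which mirrors how clearing a "Check for" option stops the macro from looking for that component.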
Default if unclassified
Parameter | Description |
---|---|
Letters only | Specifies how to categorize data that cannot otherwise be classified and contains no digits. The default is Blank. |
Letters and digits | Specifies how to categorize data that cannot otherwise be classified and may contain digits. The default is Blank. |
Compound (and) | Specifies how to categorize data that cannot otherwise be classified and may contain AND, &, or OR. The default is Blank. |
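To see which of these three rows applies to a piece of leftover text, you can sort the text by its content. The function below is a hypothetical sketch. It assumes that a compound marker takes precedence over digits when both are present, and it matches AND and OR only as whole words, so names such as Anderson or Orlando are not treated as compounds. The macro's actual precedence is not documented here.

```python
import re

# Compound markers from the table: the words AND or OR, or an ampersand.
_COMPOUND = re.compile(r"\bAND\b|&|\bOR\b", re.IGNORECASE)


def unclassified_bucket(text: str) -> str:
    """Pick which "Default if unclassified" row would apply to leftover text."""
    if _COMPOUND.search(text):
        return "Compound (and)"
    if any(ch.isdigit() for ch in text):
        return "Letters and digits"
    return "Letters only"


print(unclassified_bucket("Smith and Jones"))  # Compound (and)
print(unclassified_bucket("Suite 4B"))  # Letters and digits
```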
Configure AO Multi Line Classifier
1. Select AO Multi Line Classifier.
2. Go to the Configuration tab on the Properties pane.
3. Select Line 1 and choose the input field. Repeat for any additional input fields.
4. Review the Global options and select the output options you want. If you are not resource-limited, you may optionally select Use large table.
5. Optionally, specify one or more additional parsing data sources. See Adding a classifier data source and Adding a parsing/gender data source.
6. Optionally, set values in the Limit output section to define the maximum number of each component type on output.
7. Select the Name options tab to configure name parsing options.
8. Select the All lines tab.
   - To apply the same parsing configuration to all input fields, select Apply to all fields, and then specify the parsing configuration.
   - To specify different parsing configurations for each input field, clear Apply to all fields, and then select the Lines 1-2 tab. Specify the parsing configuration for each field, and repeat on the other Lines tabs for additional lines.
9. Optionally, go to the Execution tab and set the Web service options.