AO Multi Line Classifier

Overview

Advanced Object (AO) Multi Line Classifier parses and identifies diverse data from multiple text fields ("lines"). You can select global formatting and parsing options for all lines or tailor them to the expected content of each line. The macro accepts a single stream as input and produces one output stream containing the identified fields.

AO Multi Line Classifier configuration parameters

AO Multi Line Classifier has three sets of configuration parameters in addition to the standard execution options: Configuration, Name options, and All lines/Lines.

AO Multi Line Classifier Configuration tab

Input fields

Parameter	Description
Line 1...8	One or more fields containing diverse data. Default: none

Global options

Parameter	Description
Output name components	Select to parse the first name found into components. Default: no
Output phone components	Select to parse the phone number found into components. Default: no
Validate/Parse SSN	Select to validate the first Social Security Number found and parse it into components. Default: no
Output unparsed	Select to output unparsed field components. Default: no
Use large table	Select to use Data Management's comprehensive parsing look-up table. If you are resource-limited, you should leave this off. Default: no

Classifier data source

You may specify an optional Classifier data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.

Parsing/gender data source

You may specify an optional Parsing/gender data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER.

Limit output

Optionally, specify the maximum number of fields to output for each component type. You can specify 1-16; the default setting of 0 (zero) outputs two values of the selected component per input line.

Parameter	Description
Max names	Manually limit the number of name fields output. Default: none
Max firms	Manually limit the number of firm fields output. Default: none
Max SSNs	Manually limit the number of SSN fields output. Default: none
Max emails	Manually limit the number of email fields output. Default: none
Max phones	Manually limit the number of phone fields output. Default: none
Max addresses	Manually limit the number of address fields output. Default: none
Max unparsed	Manually limit the number of extra fields output. Default: none
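
The limit rule above can be sketched in Python. This is only an illustration, not Data Management code: the function name `effective_max` is hypothetical, and the sketch assumes that "two values per input line" means twice the number of configured input lines, which the text does not state explicitly.

```python
def effective_max(setting: int, input_lines: int) -> int:
    """Return how many output fields one component type would get.

    Assumption (not confirmed by this document): a setting of 0 yields
    two values per configured input line, i.e. 2 * input_lines.
    """
    if not 1 <= input_lines <= 8:
        raise ValueError("the macro accepts between 1 and 8 input lines")
    if setting == 0:
        # Default behavior: two values for each input line.
        return 2 * input_lines
    if 1 <= setting <= 16:
        # An explicit limit is used as given.
        return setting
    raise ValueError("limit must be 0 (default) or between 1 and 16")
```

Under these assumptions, `effective_max(0, 3)` gives 6 and `effective_max(4, 3)` gives 4.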

AO Multi Line Classifier Name options tab

Parsing options

Parameter	Description
Prefer Last/First	For ambiguous two-name cases like "Scott Davis" and "Davis Scott", prefer Last/First interpretation over First/Last interpretation. Default: no
Capitalization	Choose capitalization style of the output. Default: original
Preserve dual LastName	If the name field might contain names with two last names, you can select this option to put both in a single last name field. If you have the name Mary Andrews Smith, selecting this option will write Andrews Smith to the `OUT_LNAME1` field. If this option isn't selected, Andrews will be written to the `OUT_MIDNAME1` field and Smith will be written to the `OUT_LNAME1` field. Default: no
Split hyphenated LastName	If the name field might contain names with hyphenated last names, you can select this option to store hyphenated last names in separate fields. If you have a last name of Watson-Jones, selecting this option will write Watson to the `OUT_LNAME1` field and Jones to the `OUT_LNAME1_2` field. If this option isn't selected, then Watson-Jones will be written to the `OUT_LNAME1` field. Default: no
Parse suffix	Select this option to distinguish between generational suffixes such as Jr and III, suffixes such as MD and PhD, and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the `OUT_POSTNAME1` field and "MD" in the `OUT_SUFFIX1` field. The name "Janice Jones, PhD, VP of Development" will be output with "PhD" in the `OUT_SUFFIX1` field and "VP of Development" in the `OUT_PROFTITLE1` field. Without this option checked, "PhD" and "VP of Development" would both go to the `OUT_SUFFIX1` field. Default: yes
No punctuation in titles	Select to remove the punctuation from honorary titles. This will strip the periods in titles like M.D. Default: no
Initials at name end as suffix	Select this option to extract name suffixes expressed as initials from names. Default: no
Genderize name before suffix	By default, the Name Parse macro assigns gender by analyzing data in this order: Prefix, Suffix, First Name, Middle Name. Select this option to change the order to Prefix, First Name, Middle Name, Suffix. Default: no
Treat "President" as	You can treat the word "President" as either Title or Prefix. If you select Title, then "President" will be put in the `OUT_PROFTITLE1` field. Default: title
Treat "C O" as "C/O"	Select to interpret the string "C O" as "Care Of". Default: yes
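
The effect of Preserve dual LastName and Split hyphenated LastName on the output fields can be sketched in Python. This is a simplified illustration built only from the two examples in the table, not the macro's actual parser: the function `split_last_name` and its arguments are hypothetical, it assumes the first name has already been removed, and how the two options interact in the product is not documented here.

```python
def split_last_name(tokens, preserve_dual=False, split_hyphenated=False):
    """Distribute the words that follow the first name into output fields.

    tokens: words after the first name, e.g. ["Andrews", "Smith"].
    Returns a dict keyed by the output field names used in this document.
    """
    fields = {"OUT_MIDNAME1": "", "OUT_LNAME1": "", "OUT_LNAME1_2": ""}
    if not tokens:
        return fields
    if preserve_dual:
        # Both names stay together in the single last-name field.
        fields["OUT_LNAME1"] = " ".join(tokens)
    else:
        # Everything before the final word is treated as a middle name.
        fields["OUT_MIDNAME1"] = " ".join(tokens[:-1])
        fields["OUT_LNAME1"] = tokens[-1]
    if split_hyphenated and "-" in fields["OUT_LNAME1"]:
        # Split at the first hyphen into the two last-name fields.
        first, second = fields["OUT_LNAME1"].split("-", 1)
        fields["OUT_LNAME1"], fields["OUT_LNAME1_2"] = first, second
    return fields


# "Mary Andrews Smith" with and without Preserve dual LastName
dual = split_last_name(["Andrews", "Smith"], preserve_dual=True)
plain = split_last_name(["Andrews", "Smith"])
# "Watson-Jones" with Split hyphenated LastName
hyphen = split_last_name(["Watson-Jones"], split_hyphenated=True)
```

Here `dual` holds "Andrews Smith" in `OUT_LNAME1`, `plain` holds "Andrews" in `OUT_MIDNAME1` and "Smith" in `OUT_LNAME1`, and `hyphen` holds "Watson" in `OUT_LNAME1` and "Jones" in `OUT_LNAME1_2`, matching the examples in the table.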

AO Multi Line Classifier All lines/Lines tabs

All lines tab and Lines 1...8 tabs

If the Apply to all fields option is selected, Data Management will check every input field for the selected components. If you clear the Apply to all fields option, you can use the Lines 1...8 tabs to define different parsing configurations for each input line, for up to eight lines.

Parameter	Description
Apply to all fields	Select this to apply the configuration defined on the All lines tab to all input fields. Default: yes

Check for

Parameter	Description
Check for Name	Check data for personal names. Default: no
Check for Firm	Check data for company names. Default: no
Check for DBA	Check data for DBA (Doing Business As) names. Default: no
Check for Address	Check data for address lines. Default: no
Check for Email	Check data for email addresses. Default: no
Check for Phone	Check data for phone numbers (North American data only). Default: no
Check for SSN	Check data for Social Security Numbers. Default: no
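
The relationship between Apply to all fields and the per-line tabs can be sketched in Python. This is a hypothetical model of the settings, not a Data Management API: the names `checks_for_line`, `all_lines`, and `per_line` are invented for illustration, and the sketch assumes that a line with no per-line configuration has every check off, in keeping with the "Default: no" values above.

```python
# Component checks listed in the Check for table.
CHECKS = {"name", "firm", "dba", "address", "email", "phone", "ssn"}


def checks_for_line(line_no, apply_to_all, all_lines, per_line):
    """Return the set of checks that would run against one input line.

    all_lines: set of checks selected on the All lines tab.
    per_line:  dict mapping a line number (1-8) to its own set of checks.
    """
    if not 1 <= line_no <= 8:
        raise ValueError("line number must be between 1 and 8")
    selected = all_lines if apply_to_all else per_line.get(line_no, set())
    unknown = set(selected) - CHECKS
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")
    return set(selected)


# Same checks for every line
shared = checks_for_line(2, True, {"name", "address"}, {})
# Different checks per line: line 1 holds names, line 2 holds phone numbers
custom = checks_for_line(2, False, set(), {1: {"name"}, 2: {"phone"}})
```

With Apply to all fields selected, line 2 gets the name and address checks from the All lines tab; with it cleared, line 2 gets only its own phone check.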

Default if unclassified

Parameter	Description
Letters only	Specifies how to categorize data that cannot be otherwise classified. Data cannot contain numbers. Options are Name, Firm, and Address. Default: Blank
Letters and digits	Specifies how to categorize data that cannot be otherwise classified. Data may contain numbers. Options are Firm and Address. Default: Blank
Compound (and)	Specifies how to categorize data that cannot be otherwise classified. Data may contain AND, &, OR. Options are Name and Firm. Default: Blank
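
The three fallback rules can be sketched in Python. This is an illustration under stated assumptions, not the macro's logic: the function `fallback_category` is hypothetical, an empty string stands for the Blank default, and the order in which the rules are tried (compound first, then digits, then letters only) is assumed because the text does not specify it.

```python
import re

# Allowed options per rule, as listed above ("" stands for Blank).
ALLOWED = {
    "letters_only": {"", "Name", "Firm", "Address"},
    "letters_and_digits": {"", "Firm", "Address"},
    "compound": {"", "Name", "Firm"},
}

# Matches the connectors AND, OR (as whole words, any case) or an ampersand.
CONNECTOR = re.compile(r"\b(and|or)\b|&", re.IGNORECASE)


def fallback_category(text, letters_only="", letters_and_digits="", compound=""):
    """Pick a category for text that no other check could classify."""
    settings = {
        "letters_only": letters_only,
        "letters_and_digits": letters_and_digits,
        "compound": compound,
    }
    for rule, value in settings.items():
        if value not in ALLOWED[rule]:
            raise ValueError(f"{value!r} is not a valid option for {rule}")
    # Assumed precedence: compound connectors, then digits, then letters only.
    if CONNECTOR.search(text):
        return compound
    if any(ch.isdigit() for ch in text):
        return letters_and_digits
    return letters_only
```

For example, with Compound (and) set to Name, the text "Smith and Jones" would fall back to Name; with every setting left Blank, unclassified text stays unclassified.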

Configure AO Multi Line Classifier

Select AO Multi Line Classifier.
Go to the Configuration tab on the Properties pane.
Select Line 1 and choose the input field. Repeat for any additional input fields.
Review Global options and select desired output options. If you are not resource-limited, you may optionally select Use large table.
Optionally, you may specify one or more additional parsing data sources. See Adding a classifier data source and adding a parsing/gender data source.
Optionally, select items in the Limit output section to set the maximum number of each component type on output.
Select the Name options tab to configure name parsing options.
Select the All lines tab.
- To apply the same parsing configuration to all input fields, select Apply to all fields, and then specify the desired parsing configuration.
- To specify different parsing configurations for each input field, clear Apply to all fields, and then select the Lines 1-2 tab. Specify the desired parsing configuration for each field, and repeat on the other Lines tabs for additional lines.
Optionally, go to the Execution tab, and then set Web service options.