AO Match Building Block
Overview
Advanced Object (AO) Match Building Block is designed to be used when none of the existing Advanced Objects quite fit your matching logic. It provides more control for segmentation and additional data for matching. In addition to the business and/or personal names, ten elements (plus custom segmentation) can be defined in the match criteria. AO Match Building Block will use all available fields as part of the matching. Field 1 must be populated to use the AO Match Building Block.
AO Match Building Block accepts a single stream as input and produces a single output. It can be used in conjunction with other Advanced Objects. The results from multiple AO Match Building Blocks and/or other AO Matching are reconciled using AO Associate Match IDs.
If you use the macro with Master Data Management (MDM), you must also define a unique record ID on input. In that case, you may optionally define an additional input containing "Never Match" ID pairs.
AO Match Building Block configuration parameters
In addition to the standard execution options, AO Match Building Block has four sets of configuration parameters (the Input, Match, Options, and Table tabs) and up to ten additional fields for matching and segmentation.
AO Match Building Block Input tab
Business
Parameter | Description |
---|---|
Business name | (Required) Company name used for matching.
|
Business name 2 | (Optional) Alternate company name that can also be used for cross-field matching (for example, Lotus & IBM). Allows for different companies to be matched to either field.
|
Business keyword | (Optional) Firm keyword that lets matching be qualified by a special field (for example, BCBS of MA vs. BCBS of ME). The ME/MA would be in its own field.
|
Name
Parameter | Description |
---|---|
Name type | Select input name type, either Full name or Parsed name.
|
Name | If Name type is Full name, the name field.
|
First name | If Name type is Parsed name, the given name (for example, John in John A Smith Jr).
|
Middle name | If Name type is Parsed name, the middle name (for example, A in John A Smith Jr).
|
Last name | If Name type is Parsed name, the surname (for example, Smith in John A Smith Jr).
|
Suffix | If Name type is Parsed name, the generation name (for example, Jr in John A Smith Jr).
|
Other match field
Parameter | Description |
---|---|
Gender | Gender. Must be Male, Female, or blank (unknown or indeterminate).
|
Unique record ID
Parameter | Description |
---|---|
Record ID | Optional. Field containing the unique record ID.
|
AO Match Building Block Match tab
Business matching
Field | Description |
---|---|
Business score | Match threshold for the business name field after any optional business adjustments (described below) are taken into account.
|
Match nicknames | Allows for personal names in a firm to be standardized. For example, in Liz Smith Enterprises versus Elizabeth Smith Enterprises, Liz and Elizabeth would be treated as identical.
|
Initials | Recover match points if Initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc).
|
Acronyms | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP).
|
Abbreviations | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors).
|
Missing words | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors).
|
Word match threshold | Set minimum similarity threshold to consider any pair of words "the same."
|
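As a rough illustration of what the Acronyms option is looking for, the sketch below builds an acronym from the first letter of each word in a business name and compares it with a candidate value. The helper names `acronym_of` and `is_acronym_match` are hypothetical, and this is not the macro's internal algorithm, which also assigns a score to the acronym portion of the match.

```python
def acronym_of(name: str) -> str:
    """Build an acronym from the first letter of each word (hypothetical helper)."""
    return "".join(word[0].upper() for word in name.split() if word[0].isalnum())


def is_acronym_match(full_name: str, candidate: str) -> bool:
    """Return True when candidate equals the acronym of full_name, ignoring case."""
    return acronym_of(full_name) == candidate.strip().upper()


print(is_acronym_match("International Machine Parts", "IMP"))
```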
Business keyword matching
Field | Description |
---|---|
Keyword score | Match threshold for the business keyword field after optional business keyword adjustments (described below) are taken into account.
|
Match blank keyword | Specifies method for matching blank business keyword input field. Options are:
|
Match abbreviations | Allows abbreviated and full forms in the business keyword field to be treated as an exact match (for example, MISS vs. MISSISSIPPI).
|
Name matching
Field | Description |
---|---|
Ethnic nickname match | If selected, matches less common, but valid nicknames (such as Sean/John). Unwanted nicknames can be removed by adding a "remove" entry to the Name alias table.
|
Match gender | If selected, records with two different genders (no matter how close) will never match (for example, Alexander versus Alexandra). If a full name is used instead of parsed names or a gender field is not used, Data Management will attempt to internally generate one for matching purposes.
|
Ethnic nickname match | If selected, matches records on Last Name and Address. If selecting more than one match criteria, records must match on Resident to be compared as an Individual.
|
Fix reversed first/last | Select this if you suspect that your records may have First name and Last name reversed.
|
Gender reversal | Defines how gender is handled in records where First Name and Last Name are reversed. Options are:
|
Fix reversed first/last all recs | Select this if you selected Fix reversed first/last and you also want to fix records with an internal dedupe flag set to N.
|
Match first/middle | Select to enable cross comparison of first name against middle name.
|
Match first/initial | Select to enable cross comparison of first name against initial.
|
Match middle/initial | Select to enable cross comparison of the middle name against initial.
|
Ignore middle | Select to ignore middle name in name comparisons.
|
Fix reversed first/last all recs | Select to compare female records using only First Name (ignoring Last Name).
|
Match first/middle | Match threshold for First Name.
|
Match first/initial | Match threshold for Middle Name.
|
Match middle/initial | Match threshold for Last Name.
|
AO Match Building Block Options tab
MDM
The MDM options are only available if you have defined a unique Record ID on the Input tab.
Field | Description |
---|---|
"Never Match" override | If selected, use a second input to define "never match" pairs (pairs of record IDs that should never be matched). This input must contain two fields, ID1 and ID2.
This option operates at the record-comparison level, not the record-grouping level. So if you have three records with IDs {1,2,3} that all match each other, and inject "never match 1-3" using the never-match input, the records will still group due to the transitivity of matching 1-2 and 2-3. |
ID1, ID2 | If "Never Match" override is selected, the fields containing IDs for the "Never Match" pairs.
|
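The transitivity behavior described for the "Never Match" override can be reproduced with a small grouping sketch. It assumes that match groups are formed by linking every matched pair (here with a union-find structure, chosen only for illustration); the function name `group_records` is hypothetical and the macro's actual grouping method is not documented here.

```python
from itertools import combinations


def group_records(ids, matches, never_match):
    """Group ids with union-find, skipping pairs listed in never_match.

    The never-match filter is applied per comparison, so records can still
    end up in one group through a chain of other matches.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    blocked = {frozenset(pair) for pair in never_match}
    for a, b in matches:
        if frozenset((a, b)) in blocked:
            continue  # this single comparison is suppressed
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


ids = [1, 2, 3]
all_pairs = list(combinations(ids, 2))  # 1-2, 1-3, and 2-3 all match
print(group_records(ids, all_pairs, never_match=[(1, 3)]))
```

Records 1 and 3 still share a group because the 1-2 and 2-3 comparisons remain allowed. Keeping them apart would require never-match pairs that break every connecting chain.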
Segmentation
Field | Description |
---|---|
Segment address data by | Specifies method for defining sort and comparison minimums for address data. Options are:
The default is ZIP. |
Custom segment | Optional. If Segment address data by is Custom, the field containing the segment key.
|
Partial segment chars | If you select FIELD 1—PARTIAL segmentation, define the number of characters to use from the field/column.
|
Max segment size | This value controls the maximum number of records compared in a single segment, to prevent the compare process from running forever when segmentation is poorly defined. By default this allows for nearly-unlimited segment size. If you want to limit segment size to avoid runaway computation, potentially at the expense of missing a few record matches, set this to a lower value like 1000. Typically you can reduce this value unless you are matching within a very large segment like STATE.
|
Match segment with same value | The Match Building Block normally excludes from matching any group whose segment value is a single repeated character (111, 222, and so on). Enable this option when such values are legitimate in your data and should be matched.
|
Data sorted by segment | Enable this option if your data is already sorted by the segment field(s). You'll improve execution speed by avoiding re-sorting the data. The data is sorted lexically rather than numerically, so numeric data must have leading zeros.
|
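To see why segmentation and Max segment size matter, the following sketch groups values by a partial key (the first few characters) and counts the pairwise comparisons each segment would need. It assumes every record in a segment is compared with every other record, which is a simplification; the function names are hypothetical.

```python
from collections import defaultdict


def segment_by_prefix(values, prefix_chars):
    """Group values by their first prefix_chars characters (the segment key)."""
    segments = defaultdict(list)
    for value in values:
        segments[value[:prefix_chars]].append(value)
    return dict(segments)


def pair_count(n):
    """Number of pairwise comparisons among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


zips = ["06828", "06830", "06901", "10001", "10002", "10003"]
for key, members in segment_by_prefix(zips, 3).items():
    print(key, len(members), pair_count(len(members)))
```

Because the comparison count grows quadratically, a segment of 1000 records already implies 499500 comparisons under this assumption, which is why a poorly chosen segment key can make the compare step run for a very long time.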
Optional additional segment
Field | Description |
---|---|
Custom segment | Optional. If you select CUSTOM segmentation, specify the field containing the segment key. This is useful if the same type of data (for example, Home Phone, Work Phone, and Cell Phone) is in multiple fields and you want to cross-compare.
|
Max segment size | This value controls the maximum number of records compared in a single segment, to prevent the compare process from running forever when segmentation is poorly defined. By default this allows for nearly-unlimited segment size. If you want to limit segment size to avoid runaway computation, potentially at the expense of missing a few record matches, set this to a lower value like 1000. Typically you can reduce this value unless you are matching within a very large segment like STATE.
|
Data sorted by segment | Enable this option if your data is already sorted by the segment field(s). You'll improve execution speed by avoiding re-sorting the data. The data is sorted lexically rather than numerically, so numeric data must have leading zeros.
|
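The note about lexical sorting under Data sorted by segment is easy to check with plain string sorting. This small example shows how unpadded numeric keys sort out of numeric order as strings, and how zero-padding to a fixed width fixes that. The key values are made up for illustration.

```python
keys = ["9", "10", "2", "100"]

# Lexical (string) ordering compares character by character.
print(sorted(keys))

# Zero-padding to a fixed width makes lexical order agree with numeric order.
width = max(len(k) for k in keys)
padded = [k.zfill(width) for k in keys]
print(sorted(padded))
```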
Reporting
Field | Description |
---|---|
Output match score | If selected, outputs the overall score for the matched records as a percentage between 1 and 100.
|
Match score | Field for match score.
|
Match ID | The Match ID (or Group ID) generated by the matching process. This defines the match groups.
|
Source control
Field | Description |
---|---|
Source | Field containing the logical description for input data source. This is usually defined in AO Define Source.
|
Internal dedupe flag | Field containing a Y/N flag indicating whether or not data from a particular source should be compared against itself (deduped) or solely against other sources. As a general rule, master databases are not internally deduped whereas update files are.
|
Compare sources not internally deduped | If a matching process has more than one source with the internal dedupe field set to "N", selecting this will compare the two sources.
|
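One way to read the three source control settings together is as a rule that decides whether two records are candidates for comparison. The sketch below encodes that reading. The `Record` class and `should_compare` function are hypothetical, and the rule that two different non-deduped sources are skipped by default is an assumption inferred from the option descriptions, not confirmed behavior.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str           # value of the Source field
    internal_dedupe: str  # "Y" or "N" from the Internal dedupe flag field


def should_compare(a: Record, b: Record, compare_non_deduped: bool = False) -> bool:
    """Decide whether two records are candidates for comparison (assumed rules)."""
    if a.source == b.source:
        # Same source: compare only when that source is internally deduped.
        return a.internal_dedupe == "Y"
    if a.internal_dedupe == "N" and b.internal_dedupe == "N":
        # Two different non-deduped sources: compared only when the option is on.
        return compare_non_deduped
    return True


master = Record("MASTER", "N")
update = Record("UPDATE", "Y")
print(should_compare(master, master))  # master is not deduped against itself
print(should_compare(master, update))  # update file is compared to the master
```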
Parallel processing
Field | Description |
---|---|
Parallelism level | Set to the lesser of the number of CPU cores on the Execution Server, or the number of threads configured in the project in which the macro is embedded.
|
Optimize for large segments | If you receive warnings like "Window Compare segment size for value (06828EAS3135) has exceeded 2000," enable this option. Comparing large candidate groups may reduce matching efficiency. Selecting this option increases the number of records that can be sent to a matching process without slowing processing.
|
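The Parallelism level guidance amounts to taking a minimum of two numbers. A minimal sketch, assuming it is run on the Execution Server itself and that `project_threads` (a hypothetical parameter) holds the thread count configured in the enclosing project:

```python
import os


def parallelism_level(project_threads: int) -> int:
    """Return the lesser of the local CPU count and the project thread count."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(cores, project_threads)


print(parallelism_level(project_threads=8))
```

Note that `os.cpu_count()` reports logical CPUs, which may be higher than the number of physical cores on machines with simultaneous multithreading.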
AO Match Building Block Table tab
Field | Description |
---|---|
Name alias table | Path and name of optional DLD table used to augment or override the alias values defined within the macro. The table must be of the following form.
Where
|
Business alias table | Path and name of optional DLD table used to augment or override the alias values defined within the macro. The table must be of the form below.
Where
|
Business noise table | Path and name of optional DLD table used to add additional "noise" words for Firm matching. The table must be a single-column DLD table of the form below.
Where
|
AO Match Building Block Fields tab
These let you specify different matching parameters for fields 1-10.
Field N matching
Field | Description |
---|---|
Field 1 | (Required) Map a field for segmentation and/or to match for this macro.
|
Score | Match threshold for Field 1 field.
|
Blank matching | Specifies how blank Field 1 input field is matched. Options are:
|
Comparison kind | Specifies the field comparison method. Options are:
|
Numeric comparison options
Field | Description |
---|---|
Sensitivity | Differentiates values that are close together. For example: With Sensitivity set to 1:
With Sensitivity set to 10:
|
Zeros as blanks | Specifies that values of zero are treated as blanks for the purposes of Blank matching.
|
Positional/Word-by-Word options
Field | Description |
---|---|
Numeric threshold | If selected, lets you specify a secondary match threshold (Numeric Minimum Score) for the digits contained in the match values. A second match is performed on the digits of both values using an edit distance algorithm. The numeric score is computed by extracting all the digits, and applying the following rules:
If this match fails to meet the threshold, the entire match fails. Use this if your field contains both digits and non-digits, but the digits are more critical to the match.
|
Minimum Score | Numeric secondary match threshold, as described above.
|
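The secondary numeric match can be pictured as extracting the digits from both values and scoring them with an edit distance. The sketch below uses Levenshtein distance and converts it to a 0-100 score. The scoring formula, the treatment of values with no digits, and all function names are assumptions made for illustration; the macro's actual scoring rules are not reproduced here.

```python
import re


def digits_only(value: str) -> str:
    """Keep only the digit characters of a value."""
    return re.sub(r"\D", "", value)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def numeric_score(left: str, right: str) -> float:
    """Assumed scoring: 100 * (1 - distance / length of longer digit string)."""
    a, b = digits_only(left), digits_only(right)
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # assumption: no digits on either side counts as a match
    return 100.0 * (1 - edit_distance(a, b) / longest)


def passes_numeric_threshold(left: str, right: str, minimum_score: float) -> bool:
    """Return True when the digit-only score meets the Minimum Score."""
    return numeric_score(left, right) >= minimum_score


print(passes_numeric_threshold("PO Box 1234", "Box 1234A", 90))
print(passes_numeric_threshold("Unit 1234", "Unit 1243", 90))
```

Under these assumptions the first pair passes because the digits are identical, while the second fails because two of the four digits differ, even though the non-digit text is the same.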
Word-by-Word options
Field | Description |
---|---|
Initials | Recover match points if Initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc).
|
Abbreviations | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors).
|
Missing words | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors).
|
Acronyms | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP).
|
Word match threshold | Set minimum similarity threshold to consider any pair of words "the same."
|
Configure AO Match Building Block
Select AO Match Building Block.
Go to the Input tab on the Properties pane.
Specify Business, Name, and optionally Gender input fields.
Select the Match tab to edit matching options.
Select the Name tab to configure name match options and match scores.
Select the Address tab to configure address match options and match scores.
Select the Options tab to edit reporting, source control, master record, and other options.
Optionally, select the Table tab to specify alias and noise reference tables.
Select the Fields 1-2 tab, and define one or more match or segmentation fields. Repeat on the other Fields tabs to add additional fields.
Optionally, go to the Execution tab, and then set Web service options.