AO Match Building Block
Advanced Object (AO) Match Building Block is designed to be used when none of the existing Advanced Objects quite fit your matching logic. It provides more control for segmentation and additional data for matching. In addition to the business and/or personal names, ten elements (plus custom segmentation) can be defined in the match criteria. AO Match Building Block will use all available fields as part of the matching. Field 1 must be populated to use the AO Match Building Block.
AO Match Building Block accepts a single stream as input and produces a single output. It can be used in conjunction with other Advanced Objects. The results from multiple AO Match Building Blocks and/or other AO Matching are reconciled using AO Associate Match IDs.
If you use the macro with Master Data Management (MDM), you must also define a unique record ID on input. In that case, you may optionally define an additional input containing "Never Match" ID pairs.
AO Match Building Block configuration parameters
In addition to the standard execution options, AO Match Building Block has four sets of configuration parameters (the Input, Match, Options, and Table tabs) and up to ten additional fields for matching and segmentation (the Fields tabs).
AO Match Building Block Input tab
Business
Business name | Required. Company name used for matching. Default: Blank. |
Business name 2 | Optional. Alternate company name that can also be used for cross-field matching (for example, Lotus & IBM). Allows for different companies to be matched to either field. Default: Blank. |
Business keyword | Optional. Firm keyword that allows matching to be qualified based on a special field (for example, BCBS of MA vs. BCBS of ME). The ME/MA would be in its own field. If you specify this field, also configure the Business keyword matching score on the Match tab. Default: Blank. |
Name
Name type | Select input name type, either Full name or Parsed name. Default: Full name. |
Name | If Name type is Full name, the name field. Default: Blank. |
First name | If Name type is Parsed name, the given name (John in John A Smith Jr). Default: Blank. |
Middle name | If Name type is Parsed name, the middle name (A in John A Smith Jr). Default: Blank. |
Last name | If Name type is Parsed name, the surname (Smith in John A Smith Jr). Default: Blank. |
Suffix | If Name type is Parsed name, the generation name (Jr in John A Smith Jr). Default: Blank. |
Other match field
Gender | Gender. Must be Male, Female, or blank (unknown or indeterminate). Default: Blank. |
Unique record ID
Record ID | Optional. Field containing the unique record ID. Required if the macro will run on MDM. Default: Blank. |
AO Match Building Block Match tab
Business matching
Business score | Match threshold for the business name field after any optional business adjustments (described below) are taken into account. Default: Medium-Tight (80). |
Match nicknames | Allows for personal names in a firm to be standardized. For example, in Liz Smith Enterprises versus Elizabeth Smith Enterprises, Liz and Elizabeth would be treated as identical. Default: No. |
Initials | Recover match points if an initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc). Default: Treat as Similar (75). |
Acronyms | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP). Default: Treat as Similar (75). |
Abbreviations | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors). Default: Treat as Similar (75). |
Missing words | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors). Default: Treat as Similar (75). |
Word match threshold | Set the minimum similarity threshold to consider any pair of words "the same." Default: 61. |
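The Acronyms adjustment can be illustrated with a minimal Python sketch. It assumes an acronym is formed from the first character of each word in the longer name; the function name `is_acronym_of` is hypothetical, and the macro's actual detection logic is not described here. A pair that passes a check like this would then receive the configured score (for example, 75) for that part of the match.

```python
import re


def is_acronym_of(short: str, long_name: str) -> bool:
    """Return True when `short` spells the initials of the words in `long_name`.

    Assumption: periods in the short form are ignored and case is not significant.
    """
    words = re.findall(r"[A-Za-z0-9]+", long_name)
    initials = "".join(word[0] for word in words).upper()
    candidate = short.replace(".", "").strip().upper()
    # A single-word name has no meaningful acronym.
    return len(words) > 1 and candidate == initials


print(is_acronym_of("IMP", "International Machine Parts"))  # True
print(is_acronym_of("IBM", "International Machine Parts"))  # False
```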
Business keyword matching
Keyword score | Match threshold for business keyword field after optional business Keyword adjustments (described below) are taken into account. Default: Blank. |
Match blank keyword | Specifies the method for matching a blank business keyword input field. Options are: Blanks Never Match: If either or both records have a blank field, they will not match. Blank vs. Blank Only: If both records have a blank field, they will match; if only one is blank, they will not match. Blank vs. Non Blank Only: If only one record has a blank field, they will match; if both have a blank field, they will not match. Both One Blank and Both Blanks Match: Matches in either case, whether one or both records have a blank Business keyword field. Default: Blank vs. Blank Only. |
Match abbreviations | Allows for variations in the business keyword field. For example, MISS vs. MISSISSIPPI is considered an exact match as a case of abbreviation. Default: No. |
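The four Match blank keyword options amount to a small truth table. The following Python sketch restates them; the function name `blank_keyword_match` and the treatment of whitespace-only values as blank are assumptions made for illustration. It returns `None` when neither value is blank, meaning normal keyword scoring would apply.

```python
from typing import Optional

# Option labels as they appear in the Match blank keyword description.
NEVER = "Blanks Never Match"
BOTH_BLANK = "Blank vs. Blank Only"
ONE_BLANK = "Blank vs. Non Blank Only"
EITHER = "Both One Blank and Both Blanks Match"


def blank_keyword_match(a: str, b: str, option: str = BOTH_BLANK) -> Optional[bool]:
    """Return True/False when a blank decides the outcome, else None."""
    a_blank = not a.strip()  # assumption: whitespace-only counts as blank
    b_blank = not b.strip()
    blanks = a_blank + b_blank  # 0, 1, or 2 blank values
    if blanks == 0:
        return None  # no blank involved; regular scoring decides
    if option == NEVER:
        return False
    if option == BOTH_BLANK:
        return blanks == 2
    if option == ONE_BLANK:
        return blanks == 1
    if option == EITHER:
        return True
    raise ValueError(f"unknown option: {option}")


print(blank_keyword_match("", ""))               # True (default option)
print(blank_keyword_match("", "MA"))             # False (default option)
print(blank_keyword_match("", "MA", ONE_BLANK))  # True
```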
Name matching
Ethnic nickname match | If selected, matches less common, but valid nicknames (such as Sean/John). Unwanted nicknames can be removed by adding a "remove" entry to the Name alias table. Default: No. |
Match gender | If selected, records with two different genders (no matter how close) will never match (for example, Alexander versus Alexandra). If a full name is used instead of parsed names or a gender field is not used, Data Management will attempt to internally generate one for matching purposes. Default: No. |
Match family | If selected, matches records on Last Name and Address. If you select more than one match criterion, records must match on Resident to be compared as an Individual. Default: No. |
Fix reversed first/last | Select this if you suspect that your records may have First name and Last name reversed. Default: No. |
Gender reversal | Defines how gender is handled in records where First Name and Last Name are reversed. Options are Exclude: Existing gender is ignored; male records can match female records. For example, David Marie (M) would match Marie David (F). Include: Existing gender is used for matching. For example, David Marie (M) would not match Marie David (F), even though the reversed text is identical. Regenderize: The reversed First Name (formerly the Last Name) is assigned a gender. "Male" records may still match "female" records, depending on the last name: Mark David vs. Mary David. Default: Exclude. |
Fix reversed first/last all recs | Select this if you selected Fix reversed first/last and you also want to fix records with an internal dedupe flag set to N. Default: No. |
Match first/middle | Select to enable cross comparison of first name against middle name. Default: No. |
Match first/initial | Select to enable cross comparison of first name against initial. Default: Yes. |
Match middle/initial | Select to enable cross comparison of the middle name against initial. Default: Yes. |
Ignore middle | Select to ignore middle name in name comparisons. Default: No. |
Females: match First only | Select to compare female records using only First Name (ignoring Last Name). Default: No. |
First name score | Match threshold for First Name. Default: Medium (74). |
Middle name score | Match threshold for Middle Name. Default: Exact (100). |
Last name score | Match threshold for Last Name. Default: Medium (74). |
AO Match Building Block Options tab
MDM
The MDM options are only available if you have defined a unique Record ID on the Input tab.
"Never Match" override | If selected, use a second input to define "never match" pairs (pairs of record IDs that should never be matched). This input must contain two fields, ID1 and ID2. Default: No. Note that this option operates at the record-comparison level, not the record-grouping level. So if you have three records with IDs {1,2,3} that all match each other, and inject "never match 1-3" using the never-match input, the records will still group due to the transitivity of matching 1-2 and 2-3. |
ID1, ID2 | If "Never Match" override is selected, the fields containing IDs for the "Never Match" pairs. Default: Blank. |
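The note about transitivity can be demonstrated with a short Python sketch. It assumes grouping behaves like a union-find over the pairs that pass comparison; the function name `group_records` is hypothetical and the macro's internal grouping method is not documented here. Suppressing the 1-3 comparison leaves the 1-2 and 2-3 links in place, so all three records still land in one group.

```python
from itertools import combinations


def group_records(ids, matches, never_match=()):
    """Group record IDs with a simple union-find.

    `matches` holds the pairs that pass comparison; `never_match` holds
    (ID1, ID2) pairs whose comparison is suppressed before grouping.
    """
    blocked = {frozenset(pair) for pair in never_match}
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        if frozenset((a, b)) in blocked:
            continue  # the pair is never compared, so it adds no link
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


ids = [1, 2, 3]
all_pairs = list(combinations(ids, 2))  # 1-2, 1-3, and 2-3 all match

print(group_records(ids, all_pairs, never_match=[(1, 3)]))
# [[1, 2, 3]]  (still one group, through 1-2 and 2-3)
print(group_records(ids, all_pairs, never_match=[(1, 3), (2, 3)]))
# [[1, 2], [3]]  (record 3 is isolated only when both of its links are blocked)
```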
Segmentation
Segment address data by | Specifies method for defining sort and comparison minimums for address data. Options are: FIELD 1—ALL; FIELD 1—PARTIAL; FIELD 1—AS EMAIL—DOMAIN; FIELD 1—AS EMAIL—USER NAME; FIELD 1—AS PHONE—LAST 7; CUSTOM (SPECIAL FIELD). Default: ZIP |
Custom segment | Optional. If you select CUSTOM segmentation, specify the field containing the segment key. Default: Blank. |
Partial segment chars | If you select FIELD 1—PARTIAL segmentation, define the number of characters to use from the field/column. Default: 1. |
Max segment size | This value controls the maximum number of records compared in a single segment, to prevent the compare process from running forever when segmentation is poorly defined. By default this allows for nearly-unlimited segment size. If you want to limit segment size to avoid runaway computation, potentially at the expense of missing a few record matches, set this to a lower value like 1000. Typically you can reduce this value unless you are matching within a very large segment like STATE. Default: 99999. |
Match segment with same value | The Match Building Block normally excludes from matching any segment whose value is a single repeated character (111, 222, and so on). Enable this option when such values are permissible in your data and should be matched. Default: No. |
Data sorted by segment | Enable this option if your data is already sorted by the segment field(s). You'll improve execution speed by avoiding re-sorting the data. Note that the data is sorted lexically rather than numerically, so numeric data must have leading zeros. Default: No. |
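As a rough illustration of the Field 1 segmentation options, the following Python sketch derives a segment key from a single value. The method names, the split at the last "@", the stripping of non-digits from phone numbers, and the lower-casing of email parts are all assumptions; the macro's own key derivation may differ.

```python
import re


def segment_key(value: str, method: str, partial_chars: int = 1) -> str:
    """Derive a segment key from a Field 1 value (illustrative only)."""
    value = value.strip()
    if method == "ALL":
        return value
    if method == "PARTIAL":
        # Corresponds to the Partial segment chars setting.
        return value[:partial_chars]
    if method in ("EMAIL_DOMAIN", "EMAIL_USER"):
        user, _, domain = value.rpartition("@")
        return domain.lower() if method == "EMAIL_DOMAIN" else user.lower()
    if method == "PHONE_LAST_7":
        digits = re.sub(r"\D", "", value)  # keep digits only
        return digits[-7:]
    raise ValueError(f"unknown method: {method}")


print(segment_key("jsmith@Example.com", "EMAIL_DOMAIN"))  # example.com
print(segment_key("(203) 555-0142", "PHONE_LAST_7"))      # 5550142
print(segment_key("06828", "PARTIAL", partial_chars=3))   # 068
```

Records that share a key fall into the same segment and are candidates for comparison; a shorter or coarser key produces larger segments.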
Optional additional segment
Custom segment | Optional. If you select CUSTOM segmentation, specify the field containing the segment key. This is useful if the same type of data (i.e. Home Phone, Work Phone, Cell Phone) is in multiple fields and you want to cross-compare. Default: Blank. |
Max segment size | This value controls the maximum number of records compared in a single segment, to prevent the compare process from running forever when segmentation is poorly defined. By default this allows for nearly-unlimited segment size. If you want to limit segment size to avoid runaway computation, potentially at the expense of missing a few record matches, set this to a lower value like 1000. Typically you can reduce this value unless you are matching within a very large segment like STATE. Default: 99999. |
Data sorted by segment | Enable this option if your data is already sorted by the segment field(s). You'll improve execution speed by avoiding re-sorting the data. Note that the data is sorted lexically rather than numerically, so numeric data must have leading zeros. Default: No. |
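The leading-zeros requirement follows from how lexical ordering works. This short Python example, using made-up segment values, shows that unpadded numeric strings sort out of numeric order, while zero-padded strings sort the same way lexically and numerically.

```python
values = ["9", "10", "2", "100"]

# Lexical (string) order compares character by character, so "10" sorts before "2".
lexical = sorted(values)

# Zero-padding to a common width makes lexical order agree with numeric order.
width = max(len(v) for v in values)
padded = sorted(v.zfill(width) for v in values)

print(lexical)  # ['10', '100', '2', '9']
print(padded)   # ['002', '009', '010', '100']
```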
Reporting
Output match score | If selected, outputs the overall score from the matched records as a percentage from 1 to 100. Default: No. |
Match score | Field for match score. Default: MATCH_SCORE. |
Match ID | The Match ID (or Group ID) generated by the matching process. This defines the match groups. Default: MATCH_GROUP. |
Source control
Source | Field containing the logical description for input data source. This is usually defined in AO Define Source. Default: Blank. |
Internal dedupe flag | Field containing a Y/N flag indicating whether or not data from a particular source should be compared against itself (deduped) or solely against other sources. As a general rule, master databases are not internally deduped whereas update files are. Default: Blank. |
Compare sources not internally deduped | If a matching process has more than one source with the internal dedupe field set to "N", selecting this will compare the two sources. Default: Yes. |
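One way to read the source control options is as a rule that decides whether a given pair of records is compared at all. The Python sketch below encodes that reading; the function name `should_compare` is hypothetical, and it assumes the Y/N flag is constant within a source. The behavior for a pair with one "Y" source and one "N" source is an assumption, since it is not spelled out above.

```python
def should_compare(src_a, dedupe_a, src_b, dedupe_b, compare_non_deduped=True):
    """Decide whether two records are compared, given each record's source
    name and that source's Y/N internal dedupe flag."""
    if src_a == src_b:
        # Same source: compare only when the source is internally deduped.
        return dedupe_a == "Y"
    if dedupe_a == "N" and dedupe_b == "N":
        # Two different sources, neither internally deduped: governed by
        # "Compare sources not internally deduped".
        return compare_non_deduped
    # Assumption: any other cross-source pair is compared.
    return True


print(should_compare("MASTER", "N", "MASTER", "N"))  # False
print(should_compare("MASTER", "N", "UPDATE", "Y"))  # True
print(should_compare("MASTER_A", "N", "MASTER_B", "N", compare_non_deduped=False))  # False
```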
Parallel processing
Parallelism level | Set to the lesser of the number of CPU cores on the Execution Server, or the number of threads configured in the project in which the macro is embedded. Default: 1. |
Optimize for large segments | If you receive warnings like "Window Compare segment size for value (06828EAS3135) has exceeded 2000," enable this option. Comparing large candidate groups may reduce matching efficiency. Selecting this option increases the number of records that can be sent to a matching process without slowing processing. Default: No. |
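To see why large segments are costly, consider the worst case in which every record in a segment is compared with every other record. That exhaustive comparison is an assumption made for illustration; the macro's actual comparison strategy may examine fewer pairs. The number of pairs grows roughly with the square of the segment size:

```python
def pair_count(n: int) -> int:
    """Number of record pairs in an all-against-all comparison of n records."""
    return n * (n - 1) // 2


for size in (1000, 2000, 99999):
    print(f"{size:>6} records -> {pair_count(size):,} comparisons")
#   1000 records -> 499,500 comparisons
#   2000 records -> 1,999,000 comparisons
#  99999 records -> 4,999,850,001 comparisons
```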
AO Match Building Block Table tab
Name alias table | Path and name of optional DLD table used to augment or override the alias values defined within the macro. The table must be of the form:
ALIAS | STANDARD | REMOVE
Peg | Margaret |
Margie | Margaret |
Jon | John | TRUE
where ALIAS and STANDARD are Text fields and REMOVE is Boolean. In the example above, Peg and Margie are defined as new aliases for Margaret (a blank REMOVE field is treated as FALSE), while the TRUE value in the REMOVE field explicitly suppresses Jon as an alias for John. Default: Blank. |
Business alias table | Path and name of optional DLD table used to augment or override the alias values defined within the macro. The table must be of the form:
ALIAS | STANDARD | REMOVE
Paving | Asphalt |
Hotmix | Asphalt |
Reproduction | Copying | TRUE
where ALIAS and STANDARD are Text fields and REMOVE is Boolean. In the example above, Paving and Hotmix are defined as new aliases for Asphalt (a blank REMOVE field is treated as FALSE), while the TRUE value in the REMOVE field explicitly suppresses Reproduction as an alias for Copying. Default: Blank. |
Business noise table | Path and name of optional DLD table used to add additional "noise" words for Firm matching. The table must be a single-column DLD table of the form:
where WORDS is a Text field. The example above shows data that shouldn’t contribute to the match score because it is contextually meaningless. For example, if you are matching financial institutions, the word BANK could be a "noise" word. Default: Blank. |
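The augment-or-override behavior of an alias table can be sketched in Python as follows. The built-in alias entries shown are hypothetical stand-ins (the macro's internal alias list is not given here), the function names are invented for illustration, and upper-casing names before lookup is an assumption.

```python
# Hypothetical built-in aliases; the macro's real internal list is not shown here.
BUILT_IN = {"JON": "JOHN", "LIZ": "ELIZABETH"}

# Rows of a user alias table: (ALIAS, STANDARD, REMOVE). A blank REMOVE is FALSE.
USER_TABLE = [
    ("Peg", "Margaret", ""),
    ("Margie", "Margaret", ""),
    ("Jon", "John", "TRUE"),
]


def merge_aliases(built_in, rows):
    """Return a new alias map with the table rows applied on top of built_in."""
    aliases = dict(built_in)
    for alias, standard, remove in rows:
        key = alias.strip().upper()
        if remove.strip().upper() == "TRUE":
            aliases.pop(key, None)  # suppress this alias
        else:
            aliases[key] = standard.strip().upper()  # add or override
    return aliases


def standardize(name, aliases):
    """Map a name to its standard form, or return it unchanged."""
    key = name.strip().upper()
    return aliases.get(key, key)


aliases = merge_aliases(BUILT_IN, USER_TABLE)
print(standardize("Peg", aliases))  # MARGARET
print(standardize("Jon", aliases))  # JON (alias suppressed)
print(standardize("Liz", aliases))  # ELIZABETH (built-in alias kept)
```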
AO Match Building Block Fields tab
The Fields tabs let you specify different matching parameters for each of fields 1-10.
Field N matching
Field 1 | Required. Map a field for segmentation and/or to match for this macro. Default: None. |
Score | Match threshold for Field 1 field. Default: Tight (88). |
Blank matching | Specifies how a blank Field 1 input field is matched. Options are: NONE: If either or both records have a blank field, they will not match. BOTH: If both records have a blank field, they will match; if only one is blank, they will not match. ONE: If only one record has a blank field, they will match; if both have a blank field, they will not match. ALL: Matches in either case, whether one or both records have a blank Field 1. Default: BOTH. |
Comparison kind | Specifies the field comparison method. Options are: Positional: Compares each character position within the two records. Edit Distance: Compares the fields of the two records using an algorithm that counts how many "mistakes" were made to transform one field value into the other. Word-by-Word: Compares the fields of two records by splitting the text of each field into words (punctuation and spaces are dropped) and then comparing the words one at a time. This method is often used for business-name comparison where word order is not as important. Default: Edit Distance. |
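The three comparison kinds can be contrasted with a simplified Python sketch. The scoring formulas below (share of equal positions, Levenshtein distance scaled by the longer length, and share of common words) are assumptions chosen for illustration, not the macro's actual scoring; in particular, this word-by-word version only counts exact word matches.

```python
import re
from collections import Counter


def positional_score(a: str, b: str) -> float:
    """Share of character positions that hold the same character."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / longest


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def edit_distance_score(a: str, b: str) -> float:
    """Fewer edits relative to the longer value means a higher score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - edit_distance(a, b)) / longest


def word_by_word_score(a: str, b: str) -> float:
    """Split into words, drop punctuation, and score shared words in any order."""
    wa = re.findall(r"\w+", a.upper())
    wb = re.findall(r"\w+", b.upper())
    longest = max(len(wa), len(wb))
    if longest == 0:
        return 100.0
    shared = sum((Counter(wa) & Counter(wb)).values())
    return 100.0 * shared / longest


# One dropped letter shifts every later position but costs only one edit.
print(round(positional_score("JOHNSON", "JONSON"), 1))     # 28.6
print(round(edit_distance_score("JOHNSON", "JONSON"), 1))  # 85.7
# Word order does not matter for the word-by-word comparison.
print(word_by_word_score("Smith & Jones Paving", "Jones, Smith Paving"))  # 100.0
```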
Numeric comparison options
Sensitivity | Differentiates values that are close together. For example: With Sensitivity set to 1:
With Sensitivity set to 10:
Default: 1. |
Zeros as blanks | Specifies that values of zero are treated as blanks for the purposes of Blank matching. Default: On. |
Positional/Word-by-Word options
Numeric threshold | If selected, lets you specify a secondary match threshold (Numeric Minimum Score) for the digits contained in the match values. A second match is performed on the digits of both values using an edit distance algorithm. The numeric score is computed by extracting all the digits, and applying the following rules: If both digit sets are blank, it is a match. If only one digit set is blank, it is a non-match. If both digit sets are non-blank, they are compared using Edit Distance Qwerty rules. If this match fails to meet the threshold, the entire match fails. Use this if your field contains both digits and non-digits, but the digits are more critical to the match. Default: No. |
Minimum Score | Numeric secondary match threshold, as described above. Default: Exact (100). |
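The Numeric threshold rules can be expressed as a secondary check on the digits alone. In this Python sketch, plain Levenshtein distance stands in for the "Edit Distance Qwerty" rules, whose scoring is not described here, and the function names are hypothetical.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, used here as a stand-in scoring method."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def numeric_check(a: str, b: str, minimum_score: float = 100.0) -> bool:
    """Return True when the digits of both values pass the secondary threshold."""
    digits_a = re.sub(r"\D", "", a)  # extract all digits
    digits_b = re.sub(r"\D", "", b)
    if not digits_a and not digits_b:
        return True   # both digit sets blank: match
    if not digits_a or not digits_b:
        return False  # only one digit set blank: non-match
    longest = max(len(digits_a), len(digits_b))
    score = 100.0 * (longest - levenshtein(digits_a, digits_b)) / longest
    return score >= minimum_score


print(numeric_check("Suite 100 Main", "Ste 100 Main"))  # True: digits identical
print(numeric_check("Unit 12", "Unit 21"))              # False at the default of 100
print(numeric_check("Unit 1234", "Unit 1235", 75))      # True: digit score is 75.0
```

If this check fails, the entire field match fails, regardless of how well the non-digit text compares.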
Word-by-Word options
Initials | Recover match points if an initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc). Default: Treat as Different (0). |
Abbreviations | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors). Default: Treat as Different (0). |
Missing words | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors). Default: Treat as Different (0). |
Acronyms | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP). Default: Treat as Different (0). |
Word match threshold | Set minimum similarity threshold to consider any pair of words "the same." Default: Treat as Different (0). |
Configure AO Match Building Block
Select AO Match Building Block, and then select the Input tab on the Properties pane.
Specify Business, Name, and optionally Gender input fields.
Select the Match tab to edit matching options.
Select the Name tab to configure name match options and match scores.
Select the Address tab to configure address match options and match scores.
Select the Options tab to edit reporting, source control, master record, and other options.
Optionally, select the Table tab to specify alias and noise reference tables.
Select the Fields 1-2 tab, and define one or more match or segmentation fields. Repeat on the other Fields tabs to add additional fields.
Optionally, go to the Execution tab, and then set Web service options.