AO Data Quality Assistant
Overview
Advanced Object (AO) Data Quality Assistant is an all-in-one macro designed to parse, correct, and match common data elements.
This macro requires that the CASS address standardization module be licensed and installed in order to run.
AO Data Quality Assistant configuration parameters
AO Data Quality Assistant has nine sets of configuration parameters: Match Criteria, Business, Name, Name Parsing Options, Address, Address Options, Additional Data, Match Info, and Segmentation.
AO Data Quality Assistant Match Criteria tab
Business matching options
Parameter | Description |
---|---|
Use business name in matching (if exists) | Use Business name field in matching, if available. |
Use name as contact in matching (if exists) | Use Name field as Contact in matching, if available. |
Consumer matching options
Parameter | Description |
---|---|
Individual match (Consumer) | Perform Individual match (match full name). |
Family match (Consumer) | Perform Family match (match surname). |
Resident match (Consumer) | Perform Resident match (match address). |
Additional field options
Parameter | Description |
---|---|
Use URL (Business) (if exists) | Use business web address in matching, if available. |
Use Phone (if exists) | Use telephone number in matching, if available. |
Use Email (Consumer) (if exists) | Use consumer email in matching, if available. |
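The three consumer match levels nest from narrowest to broadest. The Python sketch below illustrates that nesting with exact-match keys. It assumes, for illustration only, that Individual and Family matching are also scoped to the same address; the field names and records are hypothetical, and Data Management's real matching is threshold-based rather than exact.

```python
# Hypothetical consumer records; field names are illustrative only.
RECORDS = [
    {"id": 1, "first": "JOHN", "last": "SMITH", "address": "123 E MAIN ST"},
    {"id": 2, "first": "JOHN", "last": "SMITH", "address": "123 E MAIN ST"},
    {"id": 3, "first": "MARY", "last": "SMITH", "address": "123 E MAIN ST"},
    {"id": 4, "first": "ANN", "last": "JONES", "address": "123 E MAIN ST"},
]


def match_key(rec: dict, level: str) -> tuple:
    """Exact-match key for one consumer match level (assumed scoping)."""
    if level == "individual":  # full name at the same address
        return (rec["first"], rec["last"], rec["address"])
    if level == "family":  # surname at the same address
        return (rec["last"], rec["address"])
    if level == "resident":  # address only
        return (rec["address"],)
    raise ValueError(f"unknown match level: {level}")


def group_ids(records: list, level: str) -> dict:
    """Give records that share a match key the same group ID (1, 2, ...)."""
    key_to_group: dict = {}
    result = {}
    for rec in records:
        key = match_key(rec, level)
        key_to_group.setdefault(key, len(key_to_group) + 1)
        result[rec["id"]] = key_to_group[key]
    return result


for level in ("individual", "family", "resident"):
    print(level, group_ids(RECORDS, level))
```

Each broader level merges groups from the narrower one: John and Mary Smith form one family, and all four records form one resident group.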
AO Data Quality Assistant Business tab
Business fields
Parameter | Description |
---|---|
Business name | Company name used for matching. |
Business name 2 | Alternate company name that can also be used for cross-field matching (for example, Lotus & IBM), allowing different company names to be matched against either field. |
Business keyword | Firm keyword that qualifies matching based on a special field (for example, BCBS of MA versus BCBS of ME, where MA or ME would be in its own field). |
Business match options
Parameter | Description |
---|---|
Initials override | Recover match points if an initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc). |
Abbreviations adjustment | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors). |
Missing words adjustment | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors). |
Acronym override | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP). |
Same word if matches | Set the minimum similarity threshold at which any pair of words is considered "the same." |
Standardize names in firms | If selected, use AO Business Standardize to standardize business names. |
Match PO Box/Street | If selected (Y), records where one address is a PO Box and the other is a street address can be considered a match when their ZIP Codes are the same. When enabled, this comparison uses its own segmentation (ZIP Code only) rather than the one set by the Segment address data by parameter. |
Alt firm score (PO Box match) | Alternate match threshold for the business field when Match PO Box/Street is set to "Y". You may want a higher threshold for PO Box/Street matching than when matching addresses of the same format; a tighter threshold reduces false positives when a PO Box address is compared with a street address. |
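Acronym override compares a single-word name against the initials of a longer one. A minimal Python sketch of that check follows; the score constant and function names are hypothetical, and the sketch ignores complications a production matcher would handle, such as noise words (Inc, Co) and articles.

```python
ACRONYM_SCORE = 90  # hypothetical score for the business-name part of an acronym match


def acronym(name: str) -> str:
    """Initial letter of each word: 'International Machine Parts' -> 'IMP'."""
    return "".join(word[0] for word in name.upper().split())


def is_acronym_match(a: str, b: str) -> bool:
    """True if one name is a single word equal to the initials of the other."""
    a_up, b_up = a.upper().strip(), b.upper().strip()
    if len(a_up.split()) == 1 and a_up == acronym(b_up):
        return True
    return len(b_up.split()) == 1 and b_up == acronym(a_up)


def business_name_score(a: str, b: str, fuzzy_score: float) -> float:
    """Replace the fuzzy score with the fixed acronym score when the override applies."""
    return ACRONYM_SCORE if is_acronym_match(a, b) else fuzzy_score


print(business_name_score("International Machine Parts", "IMP", 20))  # 90
```

A character-level similarity between "International Machine Parts" and "IMP" would be very low, which is why a dedicated override score is useful.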
Options for business keyword matching
Parameter | Description |
---|---|
Blank field options for keyword qualifier | Specifies the method for matching a blank business keyword input field. Options are: |
Consider abbreviations a match | Allows variations in the business keyword field to be considered an exact match when one value is an abbreviation of the other (for example, MISS versus MISSISSIPPI). |
AO Data Quality Assistant Name tab
Name
Parameter | Description |
---|---|
Name type | Select the input name type, either Full name or Parsed name. |
Name | If Name type is Full name, the name field. |
First name | If Name type is Parsed name, given name (for example, John in John A Smith Jr). |
Middle name | If Name type is Parsed name, middle name (for example, A in John A Smith Jr). |
Last name | If Name type is Parsed name, surname (for example, Smith in John A Smith Jr). |
Post name | If Name type is Parsed name, generation name (for example, Jr in John A Smith Jr). |
Gender | Gender. Must be Male, Female, or blank (unknown or indeterminate). |
Name scores
Parameter | Description |
---|---|
First name score | Match threshold for first name. Default: Medium (74). |
Last name score | Match threshold for last name. Default: Medium (74). |
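A threshold such as Medium (74) is compared against a similarity score on a 0 to 100 scale. The Python sketch below uses the standard library's `difflib.SequenceMatcher` purely as a stand-in scorer; Data Management's own similarity algorithm is not described here and will produce different values.

```python
from difflib import SequenceMatcher

FIRST_NAME_THRESHOLD = 74  # "Medium" default from the Name scores table
LAST_NAME_THRESHOLD = 74


def similarity(a: str, b: str) -> float:
    """0-100 similarity; difflib is only a stand-in for Data Management's scorer."""
    return 100 * SequenceMatcher(None, a.upper(), b.upper()).ratio()


def names_match(first_a: str, last_a: str, first_b: str, last_b: str) -> bool:
    """Each name part must meet its own threshold."""
    return (similarity(first_a, first_b) >= FIRST_NAME_THRESHOLD
            and similarity(last_a, last_b) >= LAST_NAME_THRESHOLD)


print(round(similarity("Jonathan", "Jonathon"), 1))  # 87.5
```

Raising a threshold makes a part harder to match; because both parts must pass, either score can block a match on its own.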
Name options
Parameter | Description |
---|---|
Include gender | If selected, records with two different genders will never match, no matter how similar the names are (for example, Alexander versus Alexandra). If a full name is selected or no gender is included, Data Management will attempt to generate one. |
Compare first name to middle name | If selected, enables cross-comparison of the first name against the middle name. |
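The two name options act at different points: Include gender is a veto applied before any name comparison, while Compare first name to middle name widens which name parts are compared. A minimal Python sketch, using exact string equality for brevity; the record fields and function name are hypothetical.

```python
def first_names_compatible(a: dict, b: dict, include_gender: bool = True,
                           cross_middle: bool = False) -> bool:
    """Sketch of the Include gender and Compare first name to middle name options."""
    # Include gender: two known, different genders never match, however close
    # the names are. A blank gender means unknown and does not veto.
    if include_gender and a["gender"] and b["gender"] and a["gender"] != b["gender"]:
        return False
    if a["first"] == b["first"]:
        return True
    # Compare first name to middle name: also try first-vs-middle in both directions.
    if cross_middle:
        pairs = [(a["first"], b["middle"]), (a["middle"], b["first"])]
        return any(x and x == y for x, y in pairs)
    return False


terry_m = {"first": "TERRY", "middle": "", "gender": "Male"}
terry_f = {"first": "TERRY", "middle": "", "gender": "Female"}
print(first_names_compatible(terry_m, terry_f))                        # False
print(first_names_compatible(terry_m, terry_f, include_gender=False))  # True
```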
Consumer matching without family matching enabled
Parameter | Description |
---|---|
Compare first name & address — female records only | Select to compare female records using first name (ignoring last name) and address. |
Compare first name & phone — female records only | Select to compare female records using first name (ignoring last name) and phone. |
Compare first name & email — female records only | Select to compare female records using first name (ignoring last name) and email. |
Compare first name only — female record for hierarchy matching | Select to compare female records using first name (ignoring last name) only. |
Optional name settings
Parameter | Description |
---|---|
Alt phone / first name score | Match threshold for comparing alternate phone and first name fields. |
Alt phone / last name score | Match threshold for comparing alternate phone and last name fields. |
Alt email / first name score | Match threshold for comparing alternate email and first name fields. |
Alt email / last name score | Match threshold for comparing alternate email and last name fields. |
Matching as business contacts
Parameter | Description |
---|---|
Match contacts only within matched businesses | If selected, attempts to match business contact names only if they are in the same business. |
AO Data Quality Assistant Name Parsing Options tab
Parsing and standardization options
Parameter | Description |
---|---|
Select casing option | Specifies how standardized names are capitalized. Options are Original, UPPERCASE, lowercase, Titlecase, and Intelligent Casing. |
Treat "/" as AND | If the name field might contain two names separated by a slash ("/"), select this option to ensure that the names are parsed correctly. |
Treat LastName FirstName as First Name Last Name | For ambiguous two-name cases such as "Scott Davis" and "Davis Scott", prefer the Last/First interpretation over the First/Last interpretation. |
If two last names put both in last name field | If the name field might contain names with two last names, select this option to put both in a single last name field. For the name "Mary Andrews Smith", selecting this option writes "Andrews Smith" to the OUT_LNAME1 field. If this option isn't selected, "Andrews" is written to the OUT_MIDNAME1 field and "Smith" to the OUT_LNAME1 field. |
Store hyphenated last names in separate fields | If the name field might contain hyphenated last names, you can select this option to store hyphenated last names in separate fields. If you have a last name of "Watson-Jones", selecting this option will write "Watson" to the |
Separate JR/SR/III from honorary title and job title | Select this option to distinguish between generational suffixes such as Jr and III, honorary suffixes such as DR and PhD, and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the |
Remove punctuation from honorary titles | Select to remove punctuation from honorary titles. This strips the periods in titles such as "M.D." |
Treat "President" as a title or prefix | You can treat the word "President" as either a Title or a Prefix. If you select Title, "President" is put in the OUT_PROFTITLE1 field. |
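The effect of If two last names put both in last name field on a three-word name such as "Mary Andrews Smith" can be sketched as follows. This toy function handles only the three-word case; `OUT_FNAME1` is a hypothetical name for the first-name output field, while `OUT_MIDNAME1` and `OUT_LNAME1` are the fields named above.

```python
def parse_three_word_name(name: str, two_last_names: bool) -> dict:
    """Toy parser for 'First Word2 Word3' names; real parsing uses dictionaries and rules."""
    first, second, third = name.split()
    if two_last_names:
        # Both trailing words go to the last name field.
        return {"OUT_FNAME1": first, "OUT_MIDNAME1": "", "OUT_LNAME1": f"{second} {third}"}
    # Default: the second word is treated as a middle name.
    return {"OUT_FNAME1": first, "OUT_MIDNAME1": second, "OUT_LNAME1": third}


print(parse_three_word_name("Mary Andrews Smith", two_last_names=True))
print(parse_three_word_name("Mary Andrews Smith", two_last_names=False))
```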
Prefix / last name options
Parameter | Description |
---|---|
Default male prefix | Select a default male prefix from the list. |
Default female prefix | Select a default female prefix from the list. |
Use alt female prefix if multi-name/gender | Select to assign the alternate female prefix to the female name when a pair of parsed names includes a female. |
Alt female prefix | If you selected Use alt female prefix if multi-name/gender, select a default female prefix. |
Add prefix if it does not exist | Select to add a prefix such as "Mr" or "Mrs" to names that don't have one. Use the default male and female prefix options to specify the default prefix. |
Parsing dictionary
Parameter | Description |
---|---|
Add prefix if it does not exist | You may specify an optional Classifier data source. This is a table in DLD format containing either two or three columns: |
Advanced name options (if already parsed)
Parameter | Description |
---|---|
Genderize if missing gender | Select to attempt to assign gender by analyzing name data in this order: Prefix, Suffix, First Name, Middle Name. |
Reparse all input | Select to combine parsed input into a string and parse again. |
AO Data Quality Assistant Address tab
Address
Parameter | Description |
---|---|
Address type | Specify whether the input address is Full address or Parsed. |
Address1 | Required if Address type is Full. First line of the unparsed address. |
Address2 | Optional if Address type is Full. Second line of the unparsed address. |
Street number | Required if Address type is Parsed. Street number (for example, 123 in 123 E Main Street NW Apt 101). |
Street predir | Required if Address type is Parsed. Street predirectional (for example, E in 123 E Main Street NW Apt 101). |
Street name | Required if Address type is Parsed. Street name (for example, Main in 123 E Main Street NW Apt 101). |
Street suffix | Required if Address type is Parsed. Street suffix (for example, Street in 123 E Main Street NW Apt 101). |
Street postdir | Required if Address type is Parsed. Street postdirectional (for example, NW in 123 E Main Street NW Apt 101). |
Sec range | Required if Address type is Parsed. Secondary range format (for example, Apt in 123 E Main Street NW Apt 101). |
Apt/Suite # | Required if Address type is Parsed. Suite/apartment number (for example, 101 in 123 E Main Street NW Apt 101). |
Lastline
Parameter | Description |
---|---|
Lastline | Select Parsed if city, state, and ZIP Code are separate fields, or Lastline if city, state, and ZIP Code are contained in a single field. |
Last Line | If Lastline is set to Lastline, the field containing city, state, and ZIP Code. |
City | If Lastline is set to Parsed, the city. |
State | If Lastline is set to Parsed, the state. |
ZIP | If Lastline is set to Parsed, the ZIP Code. |
Match score
Parameter | Description |
---|---|
Address score | Match threshold for address fields, set globally for all address components. |
Optional address settings
Parameter | Description |
---|---|
Alt street # score | Match threshold for street Number field. |
Alt street name score | Match threshold for street Name field. |
Alt street predir score | Match threshold for Pre-directional field. |
Alt street postdir score | Match threshold for Post-directional field. |
Alt street suffix score | Match threshold for Suffix field. |
Alt suite/apt # score | Match threshold for Suite/Apartment field. |
Optional address options
Parameter | Description |
---|---|
Ignore letters in street number | If selected, 101A Main Street and 101 Main Street are treated as identical. |
Enhance match substring on suite number | If selected, increases flexibility in matching of apt/suite # by allowing substring matches such as 10 versus 101. |
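These two options relax how street numbers and suite numbers are compared. A minimal Python sketch follows; the function names are hypothetical, and plain substring containment is an assumption, since the exact substring rule Data Management applies is not specified here.

```python
import re


def normalize_street_number(number: str, ignore_letters: bool) -> str:
    """Drop letters (101A -> 101) when Ignore letters in street number is on."""
    return re.sub(r"[A-Za-z]", "", number) if ignore_letters else number


def suite_matches(a: str, b: str, substring: bool) -> bool:
    """Exact suite match, or (assumed) containment match when enhanced matching is on."""
    if a == b:
        return True
    if substring and a and b:
        return a in b or b in a
    return False


print(normalize_street_number("101A", ignore_letters=True))  # 101
print(suite_matches("10", "101", substring=True))            # True
```

Plain containment is deliberately loose (for example, "1" is contained in "101"), which is why this kind of option trades precision for recall.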
AO Data Quality Assistant Address Options tab
Formatting options
Parameter | Description |
---|---|
Casing | Controls the capitalization of CASS output data. |
Directionals format | Controls the standardization of street directionals in CASS output data. |
Suffix format | Controls the standardization of street suffixes in CASS output data. |
Secondary range format | Controls the standardization of unit designators in CASS output data. |
Address standardization options | Controls the output street type when an "alias" street is input. |
City standardization options | Controls the output city type when an "alias" or abbreviated city is input. |
Keep non-mailing cities | Controls whether non-mailing city names are kept in CASS output data. |
Select standardization option | Specifies whether to Parse address only, Standardize address only, or Parse and standardize the address. |
Geocode | Specifies whether to geocode the address, and if so, which geocode to use: Geocoder, or the CASS ZIP4 Centroid (output by the Standardize Address tool). |
Optional address check
Parameter | Description |
---|---|
Check for street / PO Box in one line | If selected, checks address field for both PO Box and street information. |
Location for CASS report | Specifies path and file name for CASS address standardization report. |
Advanced address options
Parameter | Description |
---|---|
Address correct all input | If selected, performs address correction on all input, including previously processed data. |
Do not reparse flag address field | If specified, field that flags previously parsed records that should not be parsed again. |
AO Data Quality Assistant Additional Data tab
Additional match fields
Parameter | Description |
---|---|
Phone | Field containing telephone number. |
Email | Field containing consumer email address. |
URL | Field containing business URL. |
Match scores
Parameter | Description |
---|---|
Phone score | Match threshold for Phone field. |
Email score | Match threshold for Email field. |
URL score | Match threshold for URL field. |
Phone standardization options
Parameter | Description |
---|---|
Standardize format | Select the format for valid standardized phone number. Defined formats are: |
Convert letters to numbers | Select to convert letters included as part of the phone number into their numeric equivalent. |
Add state based on area code? | Select to use the area code (if exists) to append a state code to the data. |
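Convert letters to numbers corresponds to the letter groups on a standard telephone keypad (ABC=2 through WXYZ=9). A minimal Python sketch of that mapping; the function name is hypothetical, and characters other than letters are passed through unchanged.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    **dict.fromkeys("ABC", "2"), **dict.fromkeys("DEF", "3"),
    **dict.fromkeys("GHI", "4"), **dict.fromkeys("JKL", "5"),
    **dict.fromkeys("MNO", "6"), **dict.fromkeys("PQRS", "7"),
    **dict.fromkeys("TUV", "8"), **dict.fromkeys("WXYZ", "9"),
}


def letters_to_digits(phone: str) -> str:
    """Replace each letter with its keypad digit; keep digits and punctuation."""
    return "".join(KEYPAD.get(ch, ch) for ch in phone.upper())


print(letters_to_digits("1-800-FLOWERS"))  # 1-800-3569377
```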
AO Data Quality Assistant Match Info tab
Additional match information
Parameter | Description |
---|---|
IDs for each match criteria | If selected, output Match IDs (or Group IDs) generated by the matching process. |
Match info | Output all, some, or no match info. |
Match score | Output the match score. |
Source name | The source name, typically assigned using AO Define Source. This is used to determine how many sources are involved in a match group. Shorter source name strings are easier to read in the crosstab output. |
Priority | Field containing the Match Rank Priority value, typically assigned using AO Define Source. Determines a record's position in a match group. |
Internal | Field containing a Y/N flag indicating whether data from a particular source should be compared against itself (deduped) or only against other sources. As a general rule, master databases are not internally deduped, whereas update files are. |
Compare sources not internally deduped | If a matching process has more than one source with the Internal field set to "N", selecting this option compares those sources against each other. |
Suppression | Field containing the suppression definition for a source (value should be Y or N). |
ID priority tiebreaker fields
Values in text fields sort in alphabetical (character-by-character) order, even when the characters are digits: numbers stored as text are compared by their first digit, then their second digit, and so on, rather than by numeric value. Thus "12" sorts before "7". Use a numeric data type, or pad values with leading zeros ("07"), to ensure correct tie-breaking. Most text fields have a limit of 100MB.
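The difference between text ordering and numeric ordering, and the effect of zero-padding, can be seen in a few lines of Python:

```python
ids = ["12", "7", "3"]

print(sorted(ids))                      # ['12', '3', '7']  text order: "12" before "7"
print(sorted(ids, key=int))             # ['3', '7', '12']  numeric order
print(sorted(s.zfill(2) for s in ids))  # ['03', '07', '12']  zero-padded text sorts correctly
```

Zero-padding works only if every value is padded to the same width, so the width must cover the largest expected value.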
Parameter | Description |
---|---|
Tie-breaker field 1 | The first field used to break priority ties. |
Order 1 | If ASCENDING, lower values of the Tie-breaker field 1 field have higher priority. |
Tie-breaker field 2 | The second field used to break priority ties. |
Order 2 | If ASCENDING, lower values of the Tie-breaker field 2 field have higher priority. |
Tie-breaker field 3 | The third field used to break priority ties. |
Order 3 | If ASCENDING, lower values of the Tie-breaker field 3 field have higher priority. |
Use random number for final tie-breaker | Uses random sorting as the final tie-breaker. This option may generate different results for each run. If this option is not selected, the final tie-breaker is the input record order. |
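Within a group of records that tie on priority, the tie-breakers behave like a multi-key sort. A minimal Python sketch under these assumptions: the record fields are hypothetical, Priority is already equal within the group, and the first record in the returned list is the one with the highest priority.

```python
import random
from operator import itemgetter


def rank_ties(records, tiebreakers, use_random=False, seed=None):
    """Order records that tie on priority; the first element has the highest priority.

    tiebreakers: list of (field, "ASCENDING" | "DESCENDING"), most significant first.
    """
    # Final tie-breaker: input record order, or a random order if requested.
    ordered = list(records)
    if use_random:
        random.Random(seed).shuffle(ordered)
    # Python's sort is stable, so sorting by the least significant key first and
    # the most significant key last produces the combined multi-key order.
    for field, order in reversed(tiebreakers):
        ordered.sort(key=itemgetter(field), reverse=(order == "DESCENDING"))
    return ordered


group = [
    {"id": "A", "last_updated": "2023-01-05", "source_rank": 2},
    {"id": "B", "last_updated": "2023-03-10", "source_rank": 1},
    {"id": "C", "last_updated": "2023-03-10", "source_rank": 1},
]
ranked = rank_ties(group, [("last_updated", "DESCENDING"), ("source_rank", "ASCENDING")])
print([rec["id"] for rec in ranked])  # ['B', 'C', 'A']
```

B and C tie on both fields, so input order decides between them; with `use_random=True` their relative order can change from run to run, which is why that option may produce different results. The ISO dates sort correctly as text because every value has the same fixed width.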
AO Data Quality Assistant Segmentation tab
Parameter | Description |
---|---|
Only match within segment? | If selected, data will be compared solely within the segment as defined below. |
Address segmentation
Parameter | Description |
---|---|
Segment data by | Specifies method for defining sort and comparison minimums for address data. Options are: |
Additional consumer segmentation
Parameter | Description |
---|---|
Phone segment | Specifies which parts of the telephone number to use in data segmentation. |
Email segment | Specifies which parts of the email address to use in data segmentation. |
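Segmentation limits comparisons to records that share a segment value, which keeps the number of pairwise comparisons manageable on large files. A minimal Python sketch that generates candidate pairs only within a segment; the record fields, and the use of ZIP Code or phone area code as the segment key, are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, segment_key):
    """Yield only the record pairs that fall in the same segment."""
    segments = defaultdict(list)
    for rec in records:
        segments[segment_key(rec)].append(rec)
    for members in segments.values():
        yield from combinations(members, 2)


records = [
    {"id": 1, "zip": "02101", "phone": "6175550100"},
    {"id": 2, "zip": "02101", "phone": "6175550101"},
    {"id": 3, "zip": "02139", "phone": "6175550102"},
    {"id": 4, "zip": "02139", "phone": "7815550103"},
]

by_zip = list(candidate_pairs(records, lambda r: r["zip"]))
by_area_code = list(candidate_pairs(records, lambda r: r["phone"][:3]))
print(len(by_zip), len(by_area_code))  # 2 3  (versus 6 pairs with no segmentation)
```

The trade-off is that records that should match but fall in different segments (for example, a mistyped ZIP Code) are never compared.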
Parallel options
Parameter | Description |
---|---|
Compare processes | Set to the lesser of the number of CPU cores on the Execution Server, or the number of threads configured in the project in which the macro is embedded. |
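The Compare processes rule is a simple minimum. A Python sketch follows; `os.cpu_count()` reports cores on the machine running the script, which stands in here for the Execution Server, and the project thread count is passed in as a hypothetical argument.

```python
import os


def compare_processes(project_threads: int) -> int:
    """Lesser of available CPU cores and the project's configured thread count."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cores, project_threads))


print(compare_processes(4))
```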
Configure AO Data Quality Assistant
Select AO Data Quality Assistant.
Go to the Match Criteria tab on the Properties pane.
Select the desired matching options, and then choose the Business tab, Name tab, Name Parsing Options tab, and Address tab to specify input fields and additional matching and parsing options.
Optionally, select the Address Options tab, Additional Data tab, Match Info tab, and Segmentation tab to specify additional matching fields and options.
Optionally, go to the Execution tab, and then set report options and web service options.