AO Data Quality Assistant
Advanced Object (AO) Data Quality Assistant is an all-in-one macro designed to parse, correct, and match common data elements.
This macro requires that the CASS address standardization module be licensed and installed in order to run.
AO Data Quality Assistant configuration parameters
AO Data Quality Assistant has nine sets of configuration parameters: Match Criteria, Business, Name, Name Parsing Options, Address, Address Options, Additional Data, Match Info, and Segmentation.
AO Data Quality Assistant Match Criteria tab
Business matching options
Use business name in matching (if exists) | Use Business name field in matching, if available. Default: Yes. |
Use name as contact in matching (if exists) | Use Name field as Contact in matching, if available. Default: Yes. |
Consumer matching options
Individual match (Consumer) | Perform Individual match (match full name). Default: Yes. |
Family match (Consumer) | Perform Family match (match surname). Default: No. |
Resident match (Consumer) | Perform Resident match (match address). Default: No. |
Additional field options
Use URL (Business) (if exists) | Use business web address in matching, if available. Default: Yes. |
Use Phone (if exists) | Use telephone number in matching, if available. Default: Yes. |
Use Email (Consumer) (if exists) | Use consumer email in matching, if available. Default: Yes. |
AO Data Quality Assistant Business tab
Business fields
Business name | Company name used for matching. Default: Blank. |
Business name 2 | Alternate company name that can also be used for cross-field matching (for example, Lotus & IBM). Allows for different companies to be matched to either field. Default: Blank. |
Business keyword | Firm keyword that allows matching to be qualified by a special field (for example, BCBS of MA vs. BCBS of ME). The ME/MA would be in its own field. Default: Blank. |
Business match options
Initials override | Recover match points if Initial matches a name (for example, J Robin Smith Inc versus Jonathon Robert Smith, Inc). Default: Treat as Similar (75). |
Abbreviations adjustment | Recover match points due to an abbreviation identified by pattern rather than known value (for example, Halbert Construction Contractors versus Hlbrt Construction Contractors). Default: Treat as Similar (75). |
Missing words adjustment | Recover match points due to missing or disjoint words (for example, Halston Construction Contractors versus Halston Contractors). Default: Treat as Similar (75). |
Acronym override | Assign acronym matches a specific score for their part of the match (for example, International Machine Parts versus IMP). Default: Treat as Similar (75). |
Same word if matches | Set minimum similarity threshold to consider any pair of words "the same." Default: 61. |
Standardize names in firms | If selected, use AO Business Standardize to standardize business names. Default: No. |
Match PO Box/Street | If selected, allows for records where one address is a PO Box and the other is a street address to be considered a match (Y/N) when the ZIP Codes are the same. If enabled, this has an independent segmentation (ZIP Code only) rather than the one set by the Segment address data by parameter. Default: No. |
Alt firm score (PO Box match) | Alternate match threshold for business field if Match PO Box/Street is set to "Y". You may want a higher match threshold for PO Box/Street matching than when matching the same address format. This ensures tighter matches to reduce false positives when PO Box and Street are alike. Default: Blank. |
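To make the effect of a threshold such as Same word if matches more concrete, the following sketch scores word pairs on a 0 to 100 scale and compares the result against the documented default of 61. It uses Python's `difflib.SequenceMatcher` purely as a stand-in metric. The macro's actual scoring algorithm is not described here, and `similarity` and `same_word` are hypothetical helper names.

```python
from difflib import SequenceMatcher

SAME_WORD_THRESHOLD = 61  # documented default for "Same word if matches"


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (stand-in metric, not the macro's own)."""
    return 100 * SequenceMatcher(None, a.upper(), b.upper()).ratio()


def same_word(a: str, b: str, threshold: float = SAME_WORD_THRESHOLD) -> bool:
    """Treat two words as 'the same' when their score meets the threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    # Word pair taken from the abbreviation example in the table above.
    print(same_word("Halbert", "Hlbrt"))
    print(same_word("Halbert", "Smith"))
```

Raising the threshold makes fewer word pairs count as the same, which tightens matching at the cost of missing more abbreviations and misspellings.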
Options for business keyword matching
Blank field options for keyword qualifier | Specifies method for matching blank business keyword input field. Options are:
Default: Blank vs. Blank Only |
Consider abbreviations a match | Allows for variations in the business keyword field (for example, MISS vs. MISSISSIPPI to be considered an exact match as a case of abbreviation). Default: No. |
AO Data Quality Assistant Name tab
Name
Name type | Select input name type, either Full name or Parsed name. Default: Full name. |
Name | If Name type is Full name, the field containing the full name. Default: Blank. |
First name | If Name type is Parsed name, the given name ("John" in "John A Smith Jr"). Default: Blank. |
Middle name | If Name type is Parsed name, the middle name ("A" in "John A Smith Jr"). Default: Blank. |
Last name | If Name type is Parsed name, the surname ("Smith" in "John A Smith Jr"). Default: Blank. |
Post name | If Name type is Parsed name, the generational suffix ("Jr" in "John A Smith Jr"). Default: Blank. |
Gender | Gender. Must be Male, Female, or blank (unknown or indeterminate). Default: Blank. |
Name scores
First name score | Match threshold for first name. Default: Medium (74). |
Last name score | Match threshold for last name. Default: Medium (74). |
Name options
Include gender | If selected, records with two different genders (no matter how close) will never match (for example, Alexander versus Alexandra). If a full name is selected or no gender is included, Data Management will attempt to generate one. Default: No. |
Compare first name to middle name | If selected, enables cross-comparison of the first name against the middle name. Default: No. |
Consumer matching without family matching enabled
Compare first name & address — female records only | Select to compare female records using first name (ignoring last name) and address. Default: No. |
Compare first name & phone — female records only | Select to compare female records using first name (ignoring last name) and phone. Default: No. |
Compare first name & email — female records only | Select to compare female records using first name (ignoring last name) and email. Default: No. |
Compare first name only — female record for hierarchy matching | Select to compare female records using first name (ignoring last name) only. Default: No. |
Optional name settings
Alt phone / first name score | Match threshold for comparing alternate phone and first name fields. Default: Blank. |
Alt phone / last name score | Match threshold for comparing alternate phone and last name fields. Default: Blank. |
Alt email / first name score | Match threshold for comparing alternate email and first name fields. Default: Blank. |
Alt email / last name score | Match threshold for comparing alternate email and last name fields. Default: Blank. |
Matching as business contacts
Match contacts only within matched businesses | If selected, attempts to match business contact names only if they are in the same business. Default: No. |
AO Data Quality Assistant Name Parsing Options tab
Parsing and standardization options
Select casing option | Specifies how standardized names are capitalized. Options are Original, UPPERCASE, lowercase, Titlecase, and Intelligent Casing. Default: UPPERCASE. |
Treat "/" as AND | If the name field might contain two names separated by a slash ("/"), select this option to ensure that the name is parsed correctly. Default: No. |
Treat LastName FirstName as First Name Last Name | For ambiguous two-name cases like "Scott Davis" and "Davis Scott", prefer Last/First interpretation over First/Last interpretation. Default: No. |
If two last names put both in last name field | If the name field might contain names with two last names, you can select this option to put both in a single last name field. If you have the name "Mary Andrews Smith", selecting this option will write "Andrews Smith" to the OUT_LNAME1 field. If this option isn't selected, "Andrews" will be written to the OUT_MIDNAME1 field and "Smith" will be written to the OUT_LNAME1 field. Default: No. |
Store hyphenated last names in separate fields | If the name field might contain names with hyphenated last names, you can select this option to store hyphenated last names in separate fields. If you have a last name of "Watson-Jones", selecting this option will write "Watson" to the OUT_LNAME1 field and "Jones" to the OUT_LNAME1_2 field. If this option isn't selected, then "Watson-Jones" will be written to the OUT_LNAME1 field. Default: No. |
Separate JR/SR/III from honorary title and job title | Select this option to distinguish between generational suffixes such as Jr and III, suffixes such as DR and PhD and professional titles such as Finance Manager. An input name of the form "James Smith III, MD" will be output with "III" in the OUT_POSTNAME1 field and "MD" in the OUT_SUFFIX1 field. The name "Janice Jones, PhD, VP of Development" will be output with "PhD" in the OUT_SUFFIX1 field and "VP of Development" in the OUT_PROFTITLE1 field. Without this option checked, "PhD" and "VP of Development" would both go to the OUT_SUFFIX1 field. Default: Yes. |
Remove punctuation from honorary titles | Select to remove the punctuation from honorary titles. This will strip the periods in titles like "M.D." Default: No. |
Treat "President" as a title or prefix | You can treat the word "President" as either Title or Prefix. If you select Title, then "President" will be put in the OUT_PROFTITLE1 field. Default: Title. |
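The behavior described for Store hyphenated last names in separate fields and Remove punctuation from honorary titles can be sketched as follows. The output field names OUT_LNAME1 and OUT_LNAME1_2 come from the table above. The function names, the empty string used for an unused second field, and the choice to strip only periods are assumptions made for illustration, not the macro's implementation.

```python
def store_last_name(last_name: str, separate_hyphenated: bool = False) -> dict:
    """Place a last name into the documented output fields.

    Assumption: OUT_LNAME1_2 is left empty when it is not needed.
    """
    if separate_hyphenated and "-" in last_name:
        first, second = last_name.split("-", 1)
        return {"OUT_LNAME1": first, "OUT_LNAME1_2": second}
    return {"OUT_LNAME1": last_name, "OUT_LNAME1_2": ""}


def strip_title_punctuation(title: str) -> str:
    """Remove periods from an honorary title such as 'M.D.' (assumed rule)."""
    return title.replace(".", "")


if __name__ == "__main__":
    print(store_last_name("Watson-Jones", separate_hyphenated=True))
    print(store_last_name("Watson-Jones"))
    print(strip_title_punctuation("M.D."))
```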
Prefix / last name options
Default male prefix | Select a default male prefix from the list. Default: MR. |
Default female prefix | Select a default female prefix from the list. Default: MS. |
Use alt female prefix if multi-name/gender | Select to assign a prefix to the female name if a pair of parsed names includes a female. Default: Yes. |
Alt female prefix | If you selected Use alt female prefix if multi-name/gender, select the alternate female prefix to use. Default: MRS. |
Add prefix if it does not exist | Select to add a prefix such as "Mr" or "Mrs" to names that don't have one. Use the other prefix options in this section to specify the default prefix. Default: No. |
Parsing dictionary
Classifier data source | You may specify an optional Classifier data source. This is a table in DLD format containing either two or three columns: TOKEN, SYMBOL, and (optionally) GENDER. Default: Blank. |
Advanced name options (if already parsed)
Genderize if missing gender | Select to attempt to assign gender by analyzing name data in this order: Prefix, Suffix, First Name, Middle Name. Default: No. |
Reparse all input | Select to combine parsed input into a string and parse again. Default: No. |
AO Data Quality Assistant Address tab
Address
Address type | Specify whether input address is Full address or Parsed. Default: Full address. |
Address1 | Required if Address type is Full. First line of unparsed address. Default: Blank. |
Address2 | Optional if Address type is Full. Second line of unparsed address. Default: Blank. |
Street number | Required if Address type is Parsed. Street number ("123" in "123 E Main Street NW Apt 101"). Default: Blank. |
Street predir | Required if Address type is Parsed. Street predirectional ("E" in "123 E Main Street NW Apt 101"). Default: Blank. |
Street name | Required if Address type is Parsed. Street name ("Main" in "123 E Main Street NW Apt 101"). Default: Blank. |
Street suffix | Required if Address type is Parsed. Street suffix ("Street" in "123 E Main Street NW Apt 101"). Default: Blank. |
Street postdir | Required if Address type is Parsed. Street postdirectional ("NW" in "123 E Main Street NW Apt 101"). Default: Blank. |
Sec range | Required if Address type is Parsed. Secondary range format ("Apt" in "123 E Main Street NW Apt 101"). Default: Blank. |
Apt/Suite # | Required if Address type is Parsed. Suite/apartment number ("101" in "123 E Main Street NW Apt 101"). Default: Blank. |
Lastline
Lastline | Select Parsed if city, state, and ZIP Code are separate fields, or Lastline if city, state, and ZIP Code are contained in a single field. Default: Parsed. |
Last Line | If Lastline is Lastline, field containing city, state, and ZIP Code. Default: Lastline. |
City | If Lastline is Parsed, City. Default: Blank. |
State | If Lastline is Parsed, State. Default: Blank. |
ZIP | If Lastline is Parsed, ZIP Code. Default: Blank. |
Match score
Address score | Match threshold for address fields, set globally for all address components. Default: Tight (88). |
Optional address settings
Alt street # score | Match threshold for street Number field. Default: Blank. |
Alt street name score | Match threshold for street Name field. Default: Blank. |
Alt street predir score | Match threshold for Pre-directional field. Default: Blank. |
Alt street postdir score | Match threshold for Post-directional field. Default: Blank. |
Alt street suffix score | Match threshold for Suffix field. Default: Blank. |
Alt suite/apt # score | Match threshold for Suite/Apartment field. Default: Blank. |
Optional address options
Ignore letters in street number | If selected, 101A and 101 Main Street are treated as identical. Default: No. |
Enhance match substring on suite number | If selected, increases flexibility in matching of apt/suite # by allowing substring matches such as 10 vs. 101. Default: No. |
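The two optional address options above can be pictured with a short sketch. It assumes that "ignoring letters" means dropping alphabetic characters before comparison and that a "substring match" means one suite value is contained in the other. Both are illustrative readings of the descriptions, and the function names are hypothetical.

```python
import re


def normalize_street_number(number: str, ignore_letters: bool = False) -> str:
    """Optionally drop letters so that '101A' compares equal to '101'."""
    number = number.strip().upper()
    return re.sub(r"[A-Z]", "", number) if ignore_letters else number


def suite_matches(a: str, b: str, allow_substring: bool = False) -> bool:
    """Compare suite/apartment values, optionally accepting substring matches."""
    a, b = a.strip().upper(), b.strip().upper()
    if a == b:
        return True
    # Empty values never count as a substring match in this sketch.
    return allow_substring and bool(a) and bool(b) and (a in b or b in a)


if __name__ == "__main__":
    same = normalize_street_number("101A", True) == normalize_street_number("101", True)
    print(same)
    print(suite_matches("10", "101", allow_substring=True))
```

Both options loosen matching, so they are best enabled only when the input is known to contain inconsistent unit or house-number formatting.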
AO Data Quality Assistant Address Options tab
Formatting options
Casing | Controls the capitalization of CASS output data. Default: UPPER. |
Directionals format | Controls the standardization of street directionals in CASS output data. Default: Short. |
Suffix format | Controls the standardization of street suffixes in CASS output data. Default: Short. |
Secondary range format | Controls the standardization of unit designators in CASS output data. Default: Short. |
Address standardization options | Controls the output street type when an "alias" street is input. Default: Standardize. |
City standardization options | Controls the output city type when an "alias" or abbreviated city is input. Default: Standardize. |
Keep non-mailing cities | Controls the output street type when an "alias" street is input. Default: Standardize. |
Select standardization option | Specifies whether to Parse address only, Standardize address only, or Parse and standardize the address. Default: Parse and standardize. |
Geocode | Specifies whether to geocode the address, and if so, which geocode to use: Geocoder, or the CASS ZIP4 Centroid (output by the Standardize Address tool). Default: None. |
Optional address check
Check for street / PO Box in one line | If selected, checks address field for both PO Box and street information. Default: No. |
Location for CASS report | Specifies path and file name for CASS address standardization report. Default: None. |
Advanced address options
Address correct all input | If selected, performs address correction on all input, including previously processed data. Default: No. |
Do not reparse flag address field | If specified, field that flags previously parsed records that should not be parsed again. Default: No. |
AO Data Quality Assistant Additional Data tab
Additional match fields
Phone | Field containing telephone number. Default: None. |
Email | Field containing consumer email address. Default: None. |
URL | Field containing business URL. Default: None. |
Match scores
Phone score | Match threshold for Phone field. Default: Exact (100). |
Email score | Match threshold for Email field. Default: Exact (100). |
URL score | Match threshold for URL field. Default: Exact (100). |
Phone standardization options
Standardize format | Select the format for valid standardized phone number. Defined formats are:
Default: (XXX) XXX-XXXX. |
Convert letters to numbers | Select to convert letters included as part of the phone number into their numeric equivalent. Default: No. |
Add state based on area code? | Select to use the area code (if exists) to append a state code to the data. Default: No. |
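As an illustration of Convert letters to numbers combined with the default (XXX) XXX-XXXX format, the sketch below maps letters to digits using the standard telephone keypad layout. Returning `None` for input that does not yield exactly ten digits is an assumption made for the example. How the macro treats invalid numbers is not stated here, and `standardize_phone` is a hypothetical name.

```python
from typing import Optional

# Standard telephone keypad layout: each letter maps to one digit.
KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}


def standardize_phone(raw: str, convert_letters: bool = False) -> Optional[str]:
    """Format a 10-digit phone number as (XXX) XXX-XXXX, or return None."""
    digits = []
    for ch in raw.upper():
        if ch in "0123456789":
            digits.append(ch)
        elif convert_letters and ch in KEYPAD:
            digits.append(KEYPAD[ch])
        # Any other character (punctuation, spaces) is ignored.
    if len(digits) != 10:
        return None  # assumption: anything but 10 digits is not standardized
    d = "".join(digits)
    return f"({d[:3]}) {d[3:6]}-{d[6:]}"


if __name__ == "__main__":
    print(standardize_phone("617.555.0123"))
    print(standardize_phone("800-FLOWERS", convert_letters=True))
```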
AO Data Quality Assistant Match Info tab
Additional match information
IDs for each match criteria | If selected, output Match IDs (or Group IDs) generated by the matching process. Default: No. |
Match info | Output all, some, or no match info. Default: No. |
Match score | Output the match score. Default: No. |
Source name | The source name, typically assigned using AO Define Source. This is used to determine how many sources are involved in a match group. Note that shorter source name strings are more easily readable in the crosstab output. Default: Blank. |
Priority | Field containing Match Rank Priority value, typically assigned using AO Define Source. Determines a record's position in a match group. Default: Blank. |
Internal | Field containing a Y/N flag indicating whether or not data from a particular source should be compared against itself (deduped) or solely against other sources. As a general rule, master databases are not internally deduped whereas update files are. Default: Blank. |
Compare sources not internally deduped | If a matching process has more than one source with the internal dedupe field set to "N", selecting this will compare the two sources. Default: Yes. |
Suppression | Field containing suppression definition for a source (value should be Y or N). Default: Blank. |
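One way to read the Internal and Compare sources not internally deduped settings is as a rule that decides whether two records are eligible for comparison at all. The sketch below encodes that reading. It is an interpretation of the descriptions above rather than the macro's actual logic, and `should_compare` and its parameters are hypothetical.

```python
def should_compare(
    source_a: str,
    internal_a: str,
    source_b: str,
    internal_b: str,
    compare_non_deduped_sources: bool = True,
) -> bool:
    """Decide whether two records are candidates for comparison.

    internal_a / internal_b are the Y/N Internal flags of each record's source.
    """
    if source_a == source_b:
        # Same source: compare only if that source is internally deduped.
        return internal_a == "Y"
    if internal_a == "N" and internal_b == "N":
        # Two different sources that are both not internally deduped.
        return compare_non_deduped_sources
    return True


if __name__ == "__main__":
    # A master database (Internal = N) is not compared against itself...
    print(should_compare("MASTER", "N", "MASTER", "N"))
    # ...but an update file (Internal = Y) is compared against the master.
    print(should_compare("MASTER", "N", "UPDATE", "Y"))
```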
ID priority tiebreaker fields
Note that values in text fields sort in alphabetical order, even if the characters are numbers. Numbers are sorted by the first digit, then by the second digit, and so on, instead of by the numeric values. Thus "12" will appear before "7". Check data types or use leading zeros ("07") to ensure correct tie-breaking. Most text fields have a limit of 100MB.
Tie-breaker field 1 | The first field used to break priority ties. Default: Blank. |
Order 1 | If ASCENDING, then lower values of Tie-breaker 1 field will have higher priority. Default: ASCENDING. |
Tie-breaker field 2 | The second field used to break priority ties. Default: Blank. |
Order 2 | If ASCENDING, then lower values of Tie-breaker 2 field will have higher priority. Default: ASCENDING. |
Tie-breaker field 3 | The third field used to break priority ties. Default: Blank. |
Order 3 | If ASCENDING, then lower values of Tie-breaker 3 field will have higher priority. Default: ASCENDING. |
Use random number for final tie-breaker | Uses random sorting as final tie-breaker. This option may generate different results for each run. If this option is not selected, the final tie-breaker is the input record order. Default: No. |
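The note on text-field sorting at the start of this section can be reproduced with ordinary string sorting. The short Python example below is generic and not specific to the macro. It shows why "12" sorts before "7" as text, and how a numeric data type or leading zeros restores the expected order.

```python
values = ["7", "12", "100"]

# Text sort compares character by character, so "1..." sorts before "7".
text_order = sorted(values)

# Sorting by numeric value gives the order most people expect.
numeric_order = sorted(values, key=int)

# Leading zeros make the text order agree with the numeric order.
padded_order = sorted(v.zfill(3) for v in values)

if __name__ == "__main__":
    print(text_order)     # ['100', '12', '7']
    print(numeric_order)  # ['7', '12', '100']
    print(padded_order)   # ['007', '012', '100']
```

When padding, use a width at least as large as the longest value in the tie-breaker field, otherwise longer values will still sort out of order.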
AO Data Quality Assistant Segmentation tab
Only match within segment? | If selected, data will be compared solely within the segment as defined below. Default: Yes. |
Address segmentation
Segment data by | Specifies method for defining sort and comparison minimums for address data. Options are:
Default: ZIP. |
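Segmentation limits comparisons to records that share a segment value, which is commonly called blocking. The sketch below illustrates the idea with ZIP as the segment key: records are grouped by ZIP, and candidate pairs are generated only within each group. The record layout (`id` and `zip` keys) and the function name are assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, segment_key="zip"):
    """Return pairs of record IDs that share the same segment value."""
    segments = defaultdict(list)
    for record in records:
        segments[record[segment_key]].append(record["id"])
    pairs = []
    for ids in segments.values():
        # Only records inside the same segment are ever compared.
        pairs.extend(combinations(ids, 2))
    return pairs


if __name__ == "__main__":
    records = [
        {"id": 1, "zip": "02139"},
        {"id": 2, "zip": "02139"},
        {"id": 3, "zip": "10001"},
    ]
    print(candidate_pairs(records))
```

With three records and no segmentation there would be three possible pairs. Here only the two records in the same ZIP are paired, which is why segmentation reduces run time but can miss matches whose segment values differ.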
Additional consumer segmentation
Phone segment | Specifies which parts of the telephone number to use in data segmentation. Default: All. |
Email segment | Specifies which parts of the email address to use in data segmentation. Default: All. |
Parallel options
Compare processes | Set to the lesser of the number of CPU cores on the Execution Server, or the number of threads configured in the project in which the macro is embedded. Default: 1. |
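The guidance for Compare processes amounts to taking a minimum of two numbers. As a small sketch, the helper below computes that value. It would need to run on the Execution Server itself, `project_threads` is a value you supply from your project settings, and `os.cpu_count()` reports logical CPUs, which may differ from the physical core count. The function name is hypothetical.

```python
import os


def recommended_compare_processes(project_threads: int) -> int:
    """Return the lesser of available CPUs and configured project threads."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(cores, project_threads))


if __name__ == "__main__":
    # Example: a project configured with 4 threads.
    print(recommended_compare_processes(4))
```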
Configure AO Data Quality Assistant
Select AO Data Quality Assistant, and then select the Match Criteria tab on the Properties pane.
Select the desired matching options, and then select the Business tab, Name tab, Name Parsing Options tab, and Address tab to specify input fields and additional matching and parsing options.
Optionally, select the Address Options tab, Additional Data tab, Match Info tab, and Segmentation tab to specify additional matching fields and options.
Optionally, go to the Execution tab, and then set report options and web service options.