Profiling
Overview
The Profiling tools perform simple profiling of single columns or entire tables.
AO Column Profiler
AO Column Profiler tests whether a specified field conforms to a user-defined set of rules. It accepts a single stream as input and produces one output.
AO Column Profiler configuration parameters
AO Column Profiler has a single set of configuration parameters in addition to the standard execution options.
Input
Parameter | Description |
---|---|
Input field | Input field to be profiled.
|
Suppress null values | If selected, null values will be ignored in processing the data.
|
Specify test
Parameter | Description |
---|---|
Test type | Select the type of analysis to be performed on the selected field. The options are:
The default is Blank. |
Invert test logic | If selected, the profiled field should deviate from the test data.
|
Numeric profile type | When delivering the results of a numeric profile, report:
The default is TOTAL. |
Verbose test description | If selected, a full text description of the profile in addition to test metadata fields will be applied to the data.
|
If test type is Range
Parameter | Description |
---|---|
First operator | Select test to apply to first field:
The default is Blank. |
First column/value | Either type in a specific number or select a column name from the list.
|
Connector (optional, for two tests) | If a second test is desired, a connector is required to join the two tests.
The default is Blank. |
Second operator | Select optional test to apply to first field:
The default is Blank. |
Second column/value | Either type in a specific number or select a column name from the list.
|
If test type is Specific Value
Parameter | Description |
---|---|
Comma delimited values | Enter a list of comma-delimited values for the test.
|
If test type is Formula
Parameter | Description |
---|---|
Operator | Select test to apply to the input field. If you select any option other than Freehand, the test takes the following format: InputField (Operator) Function. If using a freehand function, the test as typed in will be used.
|
Function | Enter a Boolean function to obtain a true or false answer. For all tests, enter the test to the right of the operator (for example, FIELD>FIELD2+"A"). For the function, enter the entire formula (for example, Field+"A">Field2+"B").
|
If test type is Pattern
Parameter | Description |
---|---|
RegExp | Enter a regular expression pattern to use for the test.
|
Configure AO Column Profiler
Select the AO Column Profiler icon.
Go to the Configuration tab on the Properties pane.
Select Input field and choose the column to be profiled.
Optionally, select Suppress null values to ignore null values in the input data.
Select Test type and choose the type of analysis to be performed on the selected column.
Range (Defined Value)
Range (Uses Field(s))
Specific Value(s)
Formula
Pattern
Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.
Specify how Numeric profile type results should be reported: by
VALUE
,TOTAL
, orALL
.Optionally, select Verbose test description to append a full text description of the profile to the output.
Configure the test according to the selected Test type: Range, Specific Value, Formula, or Pattern.
Optionally, go to the Execution tab, and then set Web service options.
AO Column Profiler Cross Table
AO Column Profiler Cross Table compares two sources of data to look for identical or disjoint values based on the selected field(s). It accepts two streams as input and produces one output.
AO Column Profiler Cross Table configuration parameters
AO Column Profiler Cross Table has a single set of configuration parameters in addition to the standard execution options.
Data table input
Parameter | Description |
---|---|
Field 1 (alpha) | Alpha field from main table used for comparison against secondary table.
|
Field 2 (numeric) | Numeric field from main table used for comparison against secondary table.
|
Field 3 (alpha) | Second alpha field from main table used for comparison against secondary table.
|
Support table input
Parameter | Description |
---|---|
Field 1 (alpha) | Alpha field from secondary table used for comparison against primary table.
|
Field 2 (numeric) | Numeric field from secondary table used for comparison against primary table.
|
Field 3 (alpha) | Second alpha field from secondary table used for comparison against primary table.
|
Options
Parameter | Description |
---|---|
Numeric profile type | When delivering the results of a numeric profile, report:
The default is |
Invert test logic | If selected, the profiled Data field should deviate from the Support data.
|
Verbose test description | If selected, a full text description of the profile is appended to the data.
|
Configure AO Column Profiler Cross Table
Select the AO Column Profiler Cross Table icon.
Go to the Configuration tab on the Properties pane.
In the Data table input section, select one or more Field controls, and choose the fields to be profiled.
In the Support table input section, select one or more Field controls, and choose the fields used for comparison against data tables.
Specify how Numeric profile type results should be reported: by
VALUE
,TOTAL
, orALL
.Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.
Optionally, select Verbose test description to append a full text description of the profile to the output.
Optionally, go to the Execution tab, and then set Web service options.
AO Table Profiler
This macro has been deprecated. You should use the Table Profiler tool instead.
AO Table Profiler macro to performs a quick analysis of fields. It accepts a single stream as input and produces two outputs:
Counts ("C") displays statistics on all fields passed into AO Table Profiler.
Frequency ("F") lists the most/least frequent values and/or patterns by field.
AO Table Profiler configuration parameters
AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: the Configuration tab and the Options tab.
AO Table Profiler Configuration tab
Count "C" output:
Parameter | Description |
---|---|
Null count? | Select to display number of null fields.
|
Unique count? | Select to display number of unique values.
|
Longest size? | Select to display longest field size.
|
Shortest size? | Select to display shortest field size.
|
Min value? | Select to display minimum field value.
|
Max value? | Select to display maximum field value.
|
Display template pattern? | Select to display a pattern template of field contents.
|
Frequency "F" output:
Parameter | Description |
---|---|
Output top N values | Select to display top "N" values.
|
Top N values | Number. Select 1-999 to display the top "N" values. To display all values, type in 0 (Zero).
|
Output bottom N values | Select to display bottom "N" values.
|
Bottom N values | Number. Select 1-999 to display the bottom "N" values. To display all values, type in 0 (Zero).
|
Output top 10 patterns | Select to display data patterns.
|
Top 10 patterns | Number. Select 1-999 to display the top "N" patterns. To display all patterns, type in 0 (Zero).
|
AO Table Profiler Options tab
Data:
Parameter | Description |
---|---|
Use data type | Process Date, Date/Time and Numeric field types as types rather than text.
|
Include these types in patterns:
If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Other | All other data types.
|
Include these types in min/max values:
If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Other | All other data types.
|
Include these types in longest/shortest:
If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Other | All other data types.
|
Configure AO Table Profiler
This macro has been deprecated. You should use the Table Profiler tool instead.
Select the AO Table Profiler icon.
Go to the Configuration tab on the Properties pane.
Select or clear the Count and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.
Select the Options tab, and configure profiling options.
Optionally, select the Execution tab, and then set Report options and web service options.
Table Profiler
The Table Profiler tool performs a quick analysis of fields. It accepts a single stream as input and produces up to three outputs:
Summary ("S") lists statistics on all fields passed into the Table Profiler.
Frequency ("F") lists the most/least frequent values and/or patterns by field.
If you select the Output numeric ranges option, a Range ("R") output contains an analysis of power-of-ten ranges that are occupied, and the counts within each range.
Table Profiler configuration parameters
AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: Configuration and Options.
Table Profiler Configuration tab
Summary "S" output:
Parameter | Description |
---|---|
Null count | Select to display number of null fields.
|
Unique count | Select to display number of unique values.
|
Longest | Select to display longest field size.
|
Shortest | Select to display shortest field size.
|
Min | Select to display minimum field value.
|
Max | Select to display maximum field value.
|
Unique pattern count | Select to output the number of unique data patterns describing field contents.
|
Output template pattern? | Select to output a pattern template of field contents.
|
Average | Select to output average of numeric-type fields.
|
Mean | Select to output mean of average of numeric-type fields.
|
Standard deviation | Select to output standard deviation of average of numeric-type fields.
|
Frequency "F" output:
Parameter | Description |
---|---|
Output top N values | Select to display top "N" values.
|
Top N values | Number. Select 1-1000 to output the top "N" values.
|
Output bottom N values | Select to display bottom "N" values.
|
Bottom N values | Number. Select 1-1000 to output the bottom "N" values.
|
Output top 10 patterns | Select to display data patterns.
|
Top 10 patterns | Number. Select 1-1000 to output the top "N" values.
|
Output numeric ranges | Select to output an analysis of power-of-ten ranges that are populated, and the counts within each range. Only the ranges containing values are reported. The output includes
|
Table Profiler Options tab
Include these types in min/max:
If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Text | Text data types.
|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Include these types in longest/shortest:
If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Text | Text data types.
|
Binary | Binary data types.
|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Include these types in patterns:
If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.
Parameter | Description |
---|---|
Text | Text data types.
|
Numbers | Integers, Decimals, Floating points.
|
Date/time | Dates, Date/Times, Times.
|
Memory management:
Parameter | Description |
---|---|
High cardinality data | If input data is all high-cardinality (such as names and addresses), the default memory-estimation algorithms may allocate too much memory and slow your project. If you notice the project server using too much memory, and your data is all high-cardinality, enable this option.
|
Configure the Table Profiler tool
Select the Table Profiler icon.
Go to the Configuration tab on the Properties pane.
Select or clear the Summary and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.
Optionally, select Output numeric ranges to output an analysis of power-of-ten ranges that are populated, and the counts within each range.
Select the Options tab, and configure profiling options.
Optionally, go to the Execution tab, and then set Web service options.
The Table Profiler tool doesn’t automatically scale when run on hardware with multiple CPU cores. However, you can achieve significant parallelism by using it in conjunction with the Column Splitter tool.