Skip to main content
Skip table of contents

Profiling

Overview

The Profiling tools perform simple profiling of single columns or entire tables.

AO Column Profiler

AO Column Profiler tests whether a specified field conforms to a user-defined set of rules. It accepts a single stream as input and produces one output.

AO Column Profiler configuration parameters

AO Column Profiler has a single set of configuration parameters in addition to the standard execution options.

Input

Parameter

Description

Input field

Input field to be profiled.

  • Default: blank

Suppress null values

If selected, null values will be ignored in processing the data.

  • Default: no

Specify test

Parameter

Description

Test type

Select the type of analysis to be performed on the selected field. The options are:

  • Range (Defined Value)

  • Range (Uses Field(s))

  • Specific Value(s)

  • Formula

  • Pattern

The default is Blank.

Invert test logic

If selected, the profiled field should deviate from the test data.

  • Default: no

Numeric profile type

When delivering the results of a numeric profile, report:

  • VALUE

  • TOTAL

  • ALL

The default is TOTAL.

Verbose test description

If selected, a full text description of the profile in addition to test metadata fields will be applied to the data.

  • Default: yes

If test type is Range

Parameter

Description

First operator

Select test to apply to first field:

  • =

  • >

  • <

  • >=

  • <=

  • !=

The default is Blank.

First column/value

Either type in a specific number or select a column name from the list.

  • Default: blank

Connector (optional, for two tests)

If a second test is desired, a connector is required to join the two tests.

  • AND

  • OR

The default is Blank.

Second operator

Select optional test to apply to first field:

  • =

  • >

  • <

  • >=

  • <=

  • !=

The default is Blank.

Second column/value

Either type in a specific number or select a column name from the list.

  • Default: blank

If test type is Specific Value

Parameter

Description

Comma delimited values

Enter a list of comma-delimited values for the test.

  • Default: blank

If test type is Formula

Parameter

Description

Operator

Select test to apply to the input field. If you select any option other than Freehand, the test takes the following format: InputField (Operator) Function. If using a freehand function, the test as typed in will be used.

  • =

  • >

  • <

  • >=

  • <=

  • !=

FREEHAND FUNCTION DEFINITION

  • Default: blank

Function

Enter a Boolean function to obtain a true or false answer. For all tests, enter the test to the right of the operator (for example, FIELD>FIELD2+"A"). For the function, enter the entire formula (for example, Field+"A">Field2+"B").

  • Default: blank

If test type is Pattern

Parameter

Description

RegExp

Enter a regular expression pattern to use for the test.

  • Default: blank

Configure AO Column Profiler

  1. Select the AO Column Profiler icon.

  2. Go to the Configuration tab on the Properties pane.

  3. Select Input field and choose the column to be profiled.

  4. Optionally, select Suppress null values to ignore null values in the input data.

  5. Select Test type and choose the type of analysis to be performed on the selected column.

    • Range (Defined Value)

    • Range (Uses Field(s))

    • Specific Value(s)

    • Formula

    • Pattern

  6. Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.

  7. Specify how Numeric profile type results should be reported: by VALUE, TOTAL, or ALL.

  8. Optionally, select Verbose test description to append a full text description of the profile to the output.

  9. Configure the test according to the selected Test type: Range, Specific Value, Formula, or Pattern.

  10. Optionally, go to the Execution tab, and then set Web service options.

AO Column Profiler Cross Table

AO Column Profiler Cross Table compares two sources of data to look for identical or disjoint values based on the selected field(s). It accepts two streams as input and produces one output.

AO Column Profiler Cross Table configuration parameters

AO Column Profiler Cross Table has a single set of configuration parameters in addition to the standard execution options.

Data table input

Parameter

Description

Field 1 (alpha)

Alpha field from main table used for comparison against secondary table.

  • Default: blank

Field 2 (numeric)

Numeric field from main table used for comparison against secondary table.

  • Default: blank

Field 3 (alpha)

Second alpha field from main table used for comparison against secondary table.

  • Default: blank

Support table input

Parameter

Description

Field 1 (alpha)

Alpha field from secondary table used for comparison against primary table.

  • Default: blank

Field 2 (numeric)

Numeric field from secondary table used for comparison against primary table.

  • Default: blank

Field 3 (alpha)

Second alpha field from secondary table used for comparison against primary table.

  • Default: blank

Options

Parameter

Description

Numeric profile type

When delivering the results of a numeric profile, report:

  • VALUE

  • TOTAL

  • ALL

The default is TOTAL.

Invert test logic

If selected, the profiled Data field should deviate from the Support data.

  • Default: no

Verbose test description

If selected, a full text description of the profile is appended to the data.

  • Default: no

Configure AO Column Profiler Cross Table

  1. Select the AO Column Profiler Cross Table icon.

  2. Go to the Configuration tab on the Properties pane.

  3. In the Data table input section, select one or more Field controls, and choose the fields to be profiled.

  4. In the Support table input section, select one or more Field controls, and choose the fields used for comparison against data tables.

  5. Specify how Numeric profile type results should be reported: by VALUE, TOTAL, or ALL.

  6. Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.

  7. Optionally, select Verbose test description to append a full text description of the profile to the output.

  8. Optionally, go to the Execution tab, and then set Web service options.

AO Table Profiler

This macro has been deprecated. You should use the Table Profiler tool instead.

AO Table Profiler macro to performs a quick analysis of fields. It accepts a single stream as input and produces two outputs:

  • Counts ("C") displays statistics on all fields passed into AO Table Profiler.

  • Frequency ("F") lists the most/least frequent values and/or patterns by field.

AO Table Profiler configuration parameters

AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: the Configuration tab and the Options tab.

AO Table Profiler Configuration tab

Count "C" output:

Parameter

Description

Null count?

Select to display number of null fields.

  • Default: yes

Unique count?

Select to display number of unique values.

  • Default: yes

Longest size?

Select to display longest field size.

  • Default: yes

Shortest size?

Select to display shortest field size.

  • Default: yes

Min value?

Select to display minimum field value.

  • Default: yes

Max value?

Select to display maximum field value.

  • Default: yes

Display template pattern?

Select to display a pattern template of field contents.

  • Default: yes

Frequency "F" output:

Parameter

Description

Output top N values

Select to display top "N" values.

  • Default: yes

Top N values

Number. Select 1-999 to display the top "N" values. To display all values, type in 0 (Zero).

  • Default: none

Output bottom N values

Select to display bottom "N" values.

  • Default: yes

Bottom N values

Number. Select 1-999 to display the bottom "N" values. To display all values, type in 0 (Zero).

  • Default: none

Output top 10 patterns

Select to display data patterns.

  • Default: yes

Top 10 patterns

Number. Select 1-999 to display the top "N" patterns. To display all patterns, type in 0 (Zero).

  • Default: none

AO Table Profiler Options tab

Data:

Parameter

Description

Use data type

Process Date, Date/Time and Numeric field types as types rather than text.

  • Default: no

Include these types in patterns:

If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.

Parameter

Description

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Other

All other data types.

  • Default: yes

Include these types in min/max values:

If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.

Parameter

Description

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Other

All other data types.

  • Default: yes

Include these types in longest/shortest:

If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.

Parameter

Description

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Other

All other data types.

  • Default: yes

Configure AO Table Profiler

This macro has been deprecated. You should use the Table Profiler tool instead.

  1. Select the AO Table Profiler icon.

  2. Go to the Configuration tab on the Properties pane.

  3. Select or clear the Count and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.

  4. Select the Options tab, and configure profiling options.

  5. Optionally, select the Execution tab, and then set Report options and web service options.

Table Profiler

The Table Profiler tool performs a quick analysis of fields. It accepts a single stream as input and produces up to three outputs:

  • Summary ("S") lists statistics on all fields passed into the Table Profiler.

  • Frequency ("F") lists the most/least frequent values and/or patterns by field.

  • If you select the Output numeric ranges option, a Range ("R") output contains an analysis of power-of-ten ranges that are occupied, and the counts within each range.

Table Profiler configuration parameters

AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: Configuration and Options.

Table Profiler Configuration tab

Summary "S" output:

Parameter

Description

Null count

Select to display number of null fields.

  • Default: yes

Unique count

Select to display number of unique values.

  • Default: Yes.

Longest

Select to display longest field size.

  • Default: yes

Shortest

Select to display shortest field size.

  • Default: yes

Min

Select to display minimum field value.

  • Default: yes

Max

Select to display maximum field value.

  • Default: yes

Unique pattern count

Select to output the number of unique data patterns describing field contents.

  • Default: yes

Output template pattern?

Select to output a pattern template of field contents.

  • Default: yes

Average

Select to output average of numeric-type fields.

  • Default: yes

Mean

Select to output mean of average of numeric-type fields.

  • Default: yes

Standard deviation

Select to output standard deviation of average of numeric-type fields.

  • Default: yes

Frequency "F" output:

Parameter

Description

Output top N values

Select to display top "N" values.

  • Default: yes

Top N values

Number. Select 1-1000 to output the top "N" values.

  • Default: 10

Output bottom N values

Select to display bottom "N" values.

  • Default: yes

Bottom N values

Number. Select 1-1000 to output the bottom "N" values.

  • Default: 10

Output top 10 patterns

Select to display data patterns.

  • Default: yes

Top 10 patterns

Number. Select 1-1000 to output the top "N" values.

  • Default: 10

Output numeric ranges

Select to output an analysis of power-of-ten ranges that are populated, and the counts within each range. Only the ranges containing values are reported. The output includes MIN and MAX values for each range, as well as MIN_INCLUSIVE and MAX_INCLUSIVE flags that are true if the range includes those values. Generally, MIN is inclusive and MAX is not for positive numbers, and the reverse is true for negative numbers (except for the special cases around zero).

  • Default: no

Table Profiler Options tab

Include these types in min/max:

If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.

Parameter

Description

Text

Text data types.

  • Default: yes

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Include these types in longest/shortest:

If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.

Parameter

Description

Text

Text data types.

  • Default: yes

Binary

Binary data types.

  • Default: yes

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Include these types in patterns:

If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.

Parameter

Description

Text

Text data types.

  • Default: yes

Numbers

Integers, Decimals, Floating points.

  • Default: yes

Date/time

Dates, Date/Times, Times.

  • Default: yes

Memory management:

Parameter

Description

High cardinality data

If input data is all high-cardinality (such as names and addresses), the default memory-estimation algorithms may allocate too much memory and slow your project. If you notice the project server using too much memory, and your data is all high-cardinality, enable this option. 

  • Default: yes

Configure the Table Profiler tool

  1. Select the Table Profiler icon.

  2. Go to the Configuration tab on the Properties pane.

  3. Select or clear the Summary and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.

  4. Optionally, select Output numeric ranges to output an analysis of power-of-ten ranges that are populated, and the counts within each range.

  5. Select the Options tab, and configure profiling options.

  6. Optionally, go to the Execution tab, and then set Web service options.

The Table Profiler tool doesn’t automatically scale when run on hardware with multiple CPU cores. However, you can achieve significant parallelism by using it in conjunction with the Column Splitter tool.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.