Skip to main content
Skip table of contents

Profiling

The Profiling tools perform simple profiling of single columns or entire tables.

AO Column Profiler

AO Column Profiler tests whether a specified field conforms to a user-defined set of rules. It accepts a single stream as input and produces one output.

AO Column Profiler configuration parameters

AO Column Profiler has a single set of configuration parameters in addition to the standard execution options:

Input

Input field

Input field to be profiled. Default: Blank.

Suppress null values

If selected, null values will be ignored in processing the data. Default: No.

Specify test

Test type

Select the type of analysis to be performed on the selected field. The options are:

  • Range (Defined Value)

  • Range (Uses Field(s))

  • Specific Value(s)

  • Formula

  • Pattern

Default: Blank.

Invert test logic

If selected, the profiled field should deviate from the test data. Default: No.

Numeric profile type

When delivering the results of a numeric profile, report:

  • VALUE

  • TOTAL

  • ALL

Default: TOTAL.

Verbose test description

If selected, a full text description of the profile in addition to test metadata fields will be applied to the data. Yes.

If test type is Range

First operator

Select test to apply to first field:

=

>

<

>=

<=

!=

Default: Blank.

First column/value

Either type in a specific number or select a column name from the list. Default: Blank.

Connector (optional, for two tests)

If a second test is desired, a connector is required to join the two tests.

AND

OR

Default: Blank.

Second operator

Select optional test to apply to first field:

=

>

<

>=

<=

!=

Default: Blank.

Second column/value

.Either type in a specific number or select a column name from the list. Default: Blank.

If test type is Specific Value

Comma delimited values

Enter a list of comma-delimited values for the test. Default: Blank.

If test type is Formula

Operator

Select test to apply to the input field. If you select any option other than Freehand, the test takes the following format: InputField (Operator) Function. If using a freehand function, the test as typed in will be used.

=

>

<

>=

<=

!=

FREEHAND FUNCTION DEFINITION

Default: Blank.

Function

Enter a Boolean function to obtain a true or false answer. For all tests, enter the test to the right of the operator (for example, FIELD>FIELD2+"A"). For the function, enter the entire formula (for example, Field+"A">Field2+"B"). Default: Blank.

If test type is Pattern

RegExp

Enter a regular expression pattern to use for the test. Default: Blank.

Configure AO Column Profiler

  1. Select the AO Column Profiler icon, and then go to the Configuration tab on the Properties pane.

  2. Select Input field and select the column to be profiled.

  3. Optionally, select Suppress null values to ignore null values in the input data.

  4. Select Test type and select the type of analysis to be performed on the selected column.

    • Range (Defined Value)

    • Range (Uses Field(s))

    • Specific Value(s)

    • Formula

    • Pattern

  5. Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.

  6. Specify how Numeric profile type results should be reported: by VALUE, TOTAL, or ALL.

  7. Optionally, select Verbose test description to append a full text description of the profile to the output.

  8. Configure the test according to the selected Test type: Range, Specific Value, Formula, or Pattern.

  9. Optionally, go to the Execution tab, and then set Web service options.

AO Column Profiler Cross Table

AO Column Profiler Cross Table compares two sources of data to look for identical or disjoint values based on the selected field(s). It accepts two streams as input and produces one output.

AO Column Profiler Cross Table configuration parameters

AO Column Profiler Cross Table has a single set of configuration parameters in addition to the standard execution options:

Data table input

Field 1 (alpha)

Alpha field from main table used for comparison against secondary table. Default: Blank.

Field 2 (numeric)

Numeric field from main table used for comparison against secondary table. Default: Blank.

Field 3 (alpha)

Second alpha field from main table used for comparison against secondary table. Default: Blank.

Support table input

Field 1 (alpha)

Alpha field from secondary table used for comparison against primary table. Default: Blank.

Field 2 (numeric)

Numeric field from secondary table used for comparison against primary table. Default: Blank.

Field 3 (alpha)

Second alpha field from secondary table used for comparison against primary table. Default: Blank .

Options

Numeric profile type

When delivering the results of a numeric profile, report:

  • VALUE

  • TOTAL

  • ALL

Default: TOTAL.

Invert test logic

If selected, the profiled Data field should deviate from the Support data. Default: No.

Verbose test description

If selected, a full text description of the profile is appended to the data. Default: No.

Configure AO Column Profiler Cross Table

  1. Select the AO Column Profiler Cross Table icon, and then go to the Configuration tab on the Properties pane.

  2. In the Data table input section, select one or more Field controls, and select the fields to be profiled.

  3. In the Support table input section, select one or more Field controls, and select the fields used for comparison against data tables.

  4. Specify how Numeric profile type results should be reported: by VALUE, TOTAL, or ALL.

  5. Optionally, select Invert test logic to analyze the column's deviation from the test logic, rather than its conformance.

  6. Optionally, select Verbose test description to append a full text description of the profile to the output.

  7. Optionally, go to the Execution tab, and then set Web service options.

AO Table Profiler

This macro has been deprecated. You should use the Table Profiler tool instead.

AO Table Profiler macro to performs a quick analysis of fields. It accepts a single stream as input and produces two outputs:

  • Counts ("C") displays statistics on all fields passed into AO Table Profiler.

  • Frequency ("F") lists the most/least frequent values and/or patterns by field.

AO Table Profiler configuration parameters

AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: the Configuration tab and the Options tab.

AO Table Profiler Configuration tab

Count "C" output

Null count?

Select to display number of null fields. Default: Yes.

Unique count?

Select to display number of unique values. Default: Yes.

Longest size?

Select to display longest field size. Default: Yes.

Shortest size?

Select to display shortest field size. Default: Yes.

Min value?

Select to display minimum field value. Default: Yes.

Max value?

Select to display maximum field value. Default: Yes.

Display template pattern?

Select to display a pattern template of field contents. Default: Yes.

Frequency "F" output

Output top N values

Select to display top "N" values. Default: Yes.

Top N values

Number. Select 1-999 to display the top "N" values. To display all values, type in 0 (Zero). Default: None.

Output bottom N values

Select to display bottom "N" values. Default: Yes.

Bottom N values

Number. Select 1-999 to display the bottom "N" values. To display all values, type in 0 (Zero). Default: None .

Output top 10 patterns

Select to display data patterns. Default: Yes.

Top 10 patterns

Number. Select 1-999 to display the top "N" patterns. To display all patterns, type in 0 (Zero). Default: None.

AO Table Profiler Options tab

Data

Use data type

Process Date, Date/Time and Numeric field types as types rather than text. Default: No.

Include these types in patterns

If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Other

All other data types. Default: Yes.

Include these types in min/max values

If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Other

All other data types. Default: Yes.

Include these types in longest/shortest

If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Other

All other data types. Default: Yes.

Configure AO Table Profiler

This macro has been deprecated. You should use the Table Profiler tool instead.

  1. Select the AO Table Profiler icon, and then go to the Configuration tab on the Properties pane.

  2. Select or clear the Count and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.

  3. Select the Options tab, and configure profiling options.

  4. Optionally, select the Execution tab, and then set Report options and web service options.

Table Profiler

The Table Profiler tool performs a quick analysis of fields. It accepts a single stream as input and produces up to three outputs:

  • Summary ("S") lists statistics on all fields passed into the Table Profiler.

  • Frequency ("F") lists the most/least frequent values and/or patterns by field.

  • If you select the Output numeric ranges option, a Range ("R") output contains an analysis of power-of-ten ranges that are occupied, and the counts within each range.

Table Profiler configuration parameters

AO Table Profiler has two sets of configuration parameters in addition to the standard execution options: Configuration and Options.

Table Profiler Configuration tab

Summary "S" output

Null count

Select to display number of null fields. Default: Yes.

Unique count

Select to display number of unique values. Default: Yes.

Longest

Select to display longest field size. Default: Yes.

Shortest

Select to display shortest field size. Default: Yes.

Min

Select to display minimum field value. Default: Yes.

Max

Select to display maximum field value. Default: Yes.

Unique pattern count

Select to output the number of unique data patterns describing field contents. Default: Yes.

Output template pattern?

Select to output a pattern template of field contents. Default: Yes.

Average

Select to output average of numeric-type fields. Default: Yes.

Mean

Select to output mean of average of numeric-type fields. Default: Yes.

Standard deviation

Select to output standard deviation of average of numeric-type fields. Default: Yes.

Frequency "F" output

Output top N values

Select to display top "N" values. Default: Yes.

Top N values

Number. Select 1-1000 to output the top "N" values. Default: 10.

Output bottom N values

Select to display bottom "N" values. Default: Yes.

Bottom N values

Number. Select 1-1000 to output the bottom "N" values. Default: 10.

Output top 10 patterns

Select to display data patterns. Default: Yes.

Top 10 patterns

Number. Select 1-1000 to output the top "N" values. Default: 10.

Output numeric ranges

Select to output an analysis of power-of-ten ranges that are populated, and the counts within each range. Only the ranges containing values are reported. The output includes MIN and MAX values for each range, as well as MIN_INCLUSIVE and MAX_INCLUSIVE flags that are true if the range includes those values.  Generally, MIN is inclusive and MAX is not for positive numbers, and the reverse is true for negative numbers (except for the special cases around zero). Default: No.

Table Profiler Options tab

Include these types in min/max

If selected, include fields containing these data types in Count min value and Count max value outputs. You must select at least one of these options.

Text

Text data types. Default: Yes.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Include these types in longest/shortest

If selected, include fields containing these data types in Count longest size and Count shortest size outputs. You must select at least one of these options.

Text

Text data types. Default: Yes.

Binary

Binary data types. Default: Yes.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Include these types in patterns

If selected, include fields containing these data types in Template pattern and Top N pattern outputs. You must select at least one of these options.

Text

Text data types. Default: Yes.

Numbers

Integers, Decimals, Floating points. Default: Yes.

Date/time

Dates, Date/Times, Times. Default: Yes.

Memory management

High cardinality data

If input data is all high-cardinality (such as names and addresses), the default memory-estimation algorithms may allocate too much memory and slow your project. If you notice the project server using too much memory, and your data is all high-cardinality, enable this option.  Default: Yes.

Configure the Table Profiler tool

  1. Select the Table Profiler icon, and then go to the Configuration tab on the Properties pane.

  2. Select or clear the Summary and Frequency Output options. You may optionally edit the Top N and Bottom N settings to increase or decrease the number of values or patterns that are output.

  3. Optionally, select Output numeric ranges to output an analysis of power-of-ten ranges that are populated, and the counts within each range.

  4. Select the Options tab, and configure profiling options.

  5. Optionally, go to the Execution tab, and then set Web service options.

The Table Profiler tool doesn’t automatically scale when run on hardware with multiple CPU cores. However, you can achieve significant parallelism by using it in conjunction with the Column Splitter tool.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.