Skip to main content
Skip table of contents

Optimize based on record order

Data Management automatically applies sort optimization rules, so usually you need do nothing to optimize performance based on record order. But if you are after the highest performance, there are some cases where you can make a difference by changing your project.

The Summarize, Join, Rank, and Unique tools sort the input records as a part of their functions. If the input record order matches their sort keys, they run faster.

When processing a large volume of data with these tools, compare the record order displayed on the input connection's schema to the tool's order-sensitive properties:

Tool

Order-sensitive properties

Sort

Sort specification

Join

Join keys (ascending)

Unique

Unique-by specification

Rank

Rank field name (ascending)

Summarize

Group field names (ascending)

If tool properties don't match the input record order but you think that they should, there are several things that you can do to make them match, and thus run faster:

First, realize that most tools output newly-created or edited fields in no particular order. The following tools are the exceptions:

Tool

Ordered output

Generate Sequence

ID field, if Repeat is not selected

Number Records

Record ID field and Group/Member ID fields

Sort

Sort specification fields

Unique

Unique-by fields

Join

Join key fields, as altered by selection and renaming

Summarize

Group field names

Unique

Unique-by fields

Rank

Rank field name

Token Creation

ID field, if defined

Table to Stream

ID field, if defined

Pair to Group

Order output (key or group)

In all of the cases below, Data Management loses record order information.

  • Input data: Data Management cannot infer sort order of records being read from files and databases, except for DLD files.

  • Calculate tool: Fields assigned in the Calculate tool lose all record order information.

  • Change Field Types tool: Fields changed in the Change Field Types tool lose all record order information except for some special cases.

If you know that your data is sorted in a specific record order, and you want to rely on that order to speed things up, you can use a Validate Order tool to signal downstream tools that the data is in the specified order. In addition, the following techniques may help to keep or create record orders:

  • Order of operations: If you are using multiple sorting tools in sequence (for example, a Join followed by another Join, or a Join followed by a Summarize), you may be able to reorder these tools to match the record order present on input while retaining the processing logic.

  • Generate Sequence tool: The ID field created by the Generate Sequence tool will not be ordered if the Repeat after end value is reached option is selected. Do not enable this option unless you really need it.

  • Merge tool: Where possible, preserve a known record order by using a Sequence or Sorted merge type.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.