Use the Validate Order tool to optimize performance
Data Management tracks which data flows have a known sort order and uses that information to avoid re-sorting data that is already sorted. For example, if you sort data by a field named ID and then connect the output of the Sort tool to an input of a Join tool that joins on the ID field, the Join tool “knows” not to sort the data again. The same applies to every tool that must sort its input. However, in some cases data is sorted but Data Management doesn’t “know” about it. For example:
Data read by a CSV Input tool is already sorted on one or more fields.
The Calculate tool changes the field on which data is sorted in a way that doesn’t affect the order (such as adding 1 to all values).
You change the type of a field in a way that maintains order (for example, changing Date to DateTime, or Integer to Decimal).
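What these cases have in common is that the change is order-preserving, so the data stays sorted even though the sort metadata is lost. The following is a minimal sketch in plain Python, not Data Management's own API; the `is_sorted` helper and the sample values are hypothetical and serve only to illustrate the idea.

```python
from datetime import date, datetime, time
from decimal import Decimal


def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical sample data that is already sorted.
ids = [3, 8, 15, 15, 42]
dates = [date(2020, 1, 5), date(2020, 3, 1), date(2021, 7, 9)]

# Adding 1 to every value shifts the values but keeps their relative order.
plus_one = [v + 1 for v in ids]

# Integer to Decimal: a type change that maintains order.
as_decimal = [Decimal(v) for v in ids]

# Date to DateTime (midnight of the same day): also maintains order.
as_datetime = [datetime.combine(d, time.min) for d in dates]

print(is_sorted(plus_one), is_sorted(as_decimal), is_sorted(as_datetime))
```

In each case the output is still sorted, but a system that tracks sort order only through metadata has no way to tell without being informed.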
View the Schema tab for a connection to see whether Data Management shows the data as sorted. If you believe that your data is already in a specific order, but the schema viewer does not show that order, use the Validate Order tool. This tool lets you assert an ordering on the data without sorting it. It verifies that the order you specify holds, and adds the sort metadata to the output connector so that downstream tools can perform optimizations. For example, consider this project.
The Number Records tool sorts the data by ID. However, after the Calculate tool divides all ID values by 10, Data Management no longer “knows” that the data is sorted, even though it still is, so the Summarize tool re-sorts the data.
Adding a Validate Order tool after the Calculate tool and configuring it with the ID field verifies that the data is sorted and adds the sort metadata back, so the downstream Summarize tool does not re-sort the data.
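The idea behind this project can be sketched in plain Python. This is an illustration only, not how Data Management implements the tool: `validate_order` is a hypothetical helper, the 30 sample records are invented, and the division by 10 is assumed to be integer division. The point is that checking an existing order takes a single pass over the data, whereas sorting again does not.

```python
from itertools import groupby


def validate_order(records, key):
    """Yield records unchanged; raise ValueError if the key ever decreases."""
    previous = None
    for index, record in enumerate(records):
        current = record[key]
        if index > 0 and current < previous:
            raise ValueError(f"record {index} is out of order on {key!r}")
        previous = current
        yield record


# Stand-in for the Number Records output: IDs 1..30, already sorted.
records = [{"ID": i} for i in range(1, 31)]

# Stand-in for the Calculate step: divide every ID by 10 (integer division assumed).
calculated = [{"ID": r["ID"] // 10} for r in records]

# Stand-in for the Summarize step: groupby needs sorted input, and the
# validated stream supplies it without an extra sort.
validated = validate_order(calculated, "ID")
counts = {k: sum(1 for _ in grp) for k, grp in groupby(validated, key=lambda r: r["ID"])}

print(counts)
```

If the asserted order turns out to be wrong, the sketch fails with an error instead of silently producing incorrect groups, which mirrors the verification step described above.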