Avoid split-merge bottlenecks
In Data Management, a common strategy is to split the record flow using the Filter tool, perform different processing steps on the split record streams, and then merge them back together using the Merge tool. By default, Merge uses a "greedy" merge technique, which is very fast but disturbs the sort order of the data. If you merge using the "greedy" setting, you will need to re-sort the records:
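The effect of an order-disturbing merge can be pictured outside the product with a small Python sketch. This is a conceptual illustration only, not how the Merge tool is implemented. The records, the `amount` threshold, and the `tier` field are hypothetical, and the greedy merge is modeled as one stream followed by the other, which is an assumption.

```python
from operator import itemgetter

# Hypothetical records, already sorted by ID.
amounts = [5, 120, 40, 300, 75]
records = [{"ID": i, "amount": a} for i, a in enumerate(amounts, start=1)]

# "Filter": split the flow into two streams on a made-up condition.
high = [r for r in records if r["amount"] >= 100]
low = [r for r in records if r["amount"] < 100]

# Different processing on each stream (here, just tagging a tier).
high = [{**r, "tier": "high"} for r in high]
low = [{**r, "tier": "low"} for r in low]

# Stand-in for a greedy merge: records are passed through as they arrive,
# modeled here (as an assumption) by emitting one stream after the other.
merged = high + low
merged_ids = [r["ID"] for r in merged]  # ID order is no longer ascending

# The extra step a greedy merge forces: re-sort on ID.
resorted = sorted(merged, key=itemgetter("ID"))
resorted_ids = [r["ID"] for r in resorted]
```

The final `sorted` call stands in for the extra Sort tool that an order-disturbing merge makes necessary.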
The Merge tool has settings that are better suited to this kind of processing. If your records are already sorted on some field (ID in this example), you can set the Merge type to Sorted:
Select ID as the Field, and choose the sort Order:
With a sorted merge, you can eliminate the final Sort tool:
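As a rough analogy, a sorted merge behaves like a streaming merge of inputs that are each already in order. The Python sketch below uses the standard library's `heapq.merge` to show the idea. It is not the product's implementation, and the two streams and their `tier` field are hypothetical.

```python
import heapq
from operator import itemgetter

# Two hypothetical streams produced by a split; each is still sorted by ID.
high = [{"ID": 2, "tier": "high"}, {"ID": 4, "tier": "high"}]
low = [{"ID": 1, "tier": "low"}, {"ID": 3, "tier": "low"}, {"ID": 5, "tier": "low"}]

# heapq.merge repeatedly takes the smallest head record among the inputs,
# so the output is sorted without a separate full sort afterwards.
merged = list(heapq.merge(high, low, key=itemgetter("ID")))
sorted_merge_ids = [r["ID"] for r in merged]
```

Because each input is already ordered, the merge only compares the current head of each stream, which is why no downstream sort is needed.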
Sometimes your records aren't sorted by any fields, but you want to preserve the original record order. In this case, the Filter and Merge tools can track the record order using a sequence field.
To preserve record order using a sequence field:
Configure the Filter tool's Sequence tab to generate sequence values and append them to the records:
Next, configure the Merge tool for a sequence merge:
Optionally, insert a Select tool before the output to remove the SEQUENCE field:
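The sequence-field steps above can be sketched conceptually in Python. This is a minimal illustration under stated assumptions, not the tools' actual behavior. The records, the `amount` threshold, and the `tier` field are hypothetical, and `SEQUENCE` simply mirrors the field name used in this section.

```python
import heapq
from operator import itemgetter

# Hypothetical records that are not sorted on any field.
records = [
    {"name": "delta", "amount": 300},
    {"name": "alpha", "amount": 5},
    {"name": "charlie", "amount": 120},
    {"name": "bravo", "amount": 40},
]

# Filter step: append a sequence value recording the original position,
# then split the flow on a made-up condition.
tagged = [{**r, "SEQUENCE": n} for n, r in enumerate(records)]
high = [{**r, "tier": "high"} for r in tagged if r["amount"] >= 100]
low = [{**r, "tier": "low"} for r in tagged if r["amount"] < 100]

# Merge step: each stream is still ascending in SEQUENCE, so a streaming
# merge on that field restores the original record order.
merged = heapq.merge(high, low, key=itemgetter("SEQUENCE"))

# Select step (optional): drop the SEQUENCE field before output.
output = [{k: v for k, v in r.items() if k != "SEQUENCE"} for r in merged]
output_names = [r["name"] for r in output]
```

The sequence value plays the same role as a sort key, but it is generated during the split, so the records never need to be sorted on a data field.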
Despite requiring an additional Select tool, the Sequence merge type is usually faster than the Sorted merge type. See the sample project filter_merge_sequence for another example of this technique.