Splitter

Overview

The Splitter tool accepts a single input and splits the records from that input into one or more outputs, using Data Management's multi-threading capabilities to distribute CPU-intensive tasks. The Split type setting determines how input records are distributed to the outputs. Each split type works slightly differently.

The counterpart of the Splitter tool is the Merge tool. After the Splitter tool splits one record stream into many, the Merge tool combines multiple streams into one. For each Splitter type, there is a companion Merge type that precisely reverses the splitting to recreate the original record order.
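
As a rough illustration of how a split type and its companion merge type reverse each other, the following Python sketch models a round-robin split followed by a round-robin merge. It is not how Data Management implements these tools; the function names `round_robin_split` and `round_robin_merge` are hypothetical and records are modeled as plain strings.

```python
from itertools import zip_longest

# Sentinel marking "no record" when one output is shorter than the others.
_MISSING = object()

def round_robin_split(records, n_outputs):
    """Deal records to n_outputs lists in circular order (hypothetical model)."""
    outputs = [[] for _ in range(n_outputs)]
    for i, record in enumerate(records):
        outputs[i % n_outputs].append(record)
    return outputs

def round_robin_merge(streams):
    """Take one record from each stream in turn, skipping exhausted streams."""
    merged = []
    for row in zip_longest(*streams, fillvalue=_MISSING):
        merged.extend(r for r in row if r is not _MISSING)
    return merged

records = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
outputs = round_robin_split(records, 3)
# outputs is [["r1", "r4", "r7"], ["r2", "r5"], ["r3", "r6"]]
restored = round_robin_merge(outputs)
# restored has the same order as records
```

The merge recovers the original order only because it visits the streams in the same order in which the split dealt to them.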

In this example you will only see improved performance if the computer has sufficient memory and you clear the Share CASS library option in both Standardize Address tools.

Splitter tool configuration parameters

The Splitter tool has one set of configuration parameters in addition to the standard execution options.

Parameter	Description
Split type	Specifies how input records are distributed to the outputs: Round robin, Greedy, or Grouped. This is optional and defaults to Round robin.
Field	Required for Grouped Split types. Specifies the fields that define the group.
Output buffer size	Applies to Grouped Split types. Controls the size of the output buffer. Must be an integer between 256 and 10240. This is optional and defaults to 256K.
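
To make the role of the Field parameter concrete, here is a minimal Python sketch of a grouped split, in which runs of adjacent records that share the same group field values stay together on one output. It is an illustration only: the function name `grouped_split`, the sample field names, and the choice to deal successive groups to outputs in circular order are all assumptions, since how groups are assigned to outputs is not stated here.

```python
from itertools import groupby
from operator import itemgetter

def grouped_split(records, group_fields, n_outputs):
    """Send each run of adjacent records with equal group field values
    to a single output (hypothetical model, groups dealt in circular order)."""
    key = itemgetter(*group_fields)
    outputs = [[] for _ in range(n_outputs)]
    # groupby only groups ADJACENT records with equal keys.
    for i, (_, run) in enumerate(groupby(records, key=key)):
        outputs[i % n_outputs].extend(run)
    return outputs

# Sample data with a made-up "household" group field.
records = [
    {"household": "A", "name": "Ann"},
    {"household": "A", "name": "Al"},
    {"household": "B", "name": "Bo"},
    {"household": "C", "name": "Cy"},
    {"household": "C", "name": "Cat"},
]
outputs = grouped_split(records, ["household"], 2)
names = [[r["name"] for r in out] for out in outputs]
# names is [["Ann", "Al", "Cy", "Cat"], ["Bo"]]
```

Because only adjacent records are grouped, input that is not already ordered by the group fields could place records with the same values on different outputs in this model.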

Configure the Splitter tool

Select the Splitter tool, and then go to the Configuration tab on the Properties pane.
Choose the Split type from the list.
- Round-robin: records are sent to successive outputs in a circular fashion, much like a card dealer distributes cards to the players. This is a typical setting, and the original record order can be reassembled easily using the Merge tool with its merge type set to Round robin.
- Greedy: records are sent to the next available output. Use this for parallel record processing where you want optimal performance, but don't care how the records are split.
- Grouped: records are split according to one or more group fields, so that all adjacent records with the same group field values are sent to the same output. Use this for parallel record processing where elements of a group must be kept together.
If you selected the split type Grouped:
- Select the group fields from the Field list.
- Specify the Output buffer size (an integer between 256 and 10240) or accept the default setting of 256K.
Optionally, select the Connection Order tab and adjust the order of output connections.

Records split with the Round-robin split type can be reassembled in their original order by a downstream Merge tool whose merge type is also set to Round robin. However, the connections must be ordered identically in the Splitter and Merge tools; otherwise the data stream is reassembled in a different record order.
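
The effect of mismatched connection order can be seen in a small Python model. This is a sketch under stated assumptions, not the product's implementation: `split_round_robin` and `merge_round_robin` are hypothetical helpers, and the list order passed to the merge stands in for the connection order on the Merge tool.

```python
def split_round_robin(records, n):
    """Output i receives records i, i+n, i+2n, ... (hypothetical model)."""
    return [records[i::n] for i in range(n)]

def merge_round_robin(streams):
    """Read one record from each stream in connection order, repeatedly."""
    merged = []
    longest = max((len(s) for s in streams), default=0)
    for i in range(longest):
        for stream in streams:
            if i < len(stream):
                merged.append(stream[i])
    return merged

records = [1, 2, 3, 4, 5, 6]
a, b, c = split_round_robin(records, 3)

same_order = merge_round_robin([a, b, c])  # connections match the split
swapped = merge_round_robin([b, a, c])     # first two connections swapped
# same_order is [1, 2, 3, 4, 5, 6]
# swapped is [2, 1, 3, 5, 4, 6]
```

No records are lost when the connections are swapped; only their order changes.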

Optionally, go to the Execution tab, and then set Report options and Web service options.