Column Splitter

Overview

The Column Splitter tool accepts an input and splits it vertically (along field boundaries) into two or more outputs. Apart from an optional ID field that can be copied to every output, no output has any field in common with another output. Use the tool to parallelize workloads that cannot be divided along record boundaries, such as profiling or aggregating.
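
The idea can be illustrated outside the tool with a minimal Python sketch. It is not the tool's implementation: the sample records and the helper names project and profile are hypothetical, and a thread pool stands in for whatever parallel execution the downstream tools provide. Each output keeps every record but only some fields, so a per-field profile can run on each output independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample input: every record has the same four fields.
records = [
    {"name": "Ann", "age": 34, "city": "Oslo", "score": 7.5},
    {"name": "Bob", "age": 28, "city": "Lima", "score": 9.0},
]

def project(rows, fields):
    """Keep only the named fields from each record (a vertical slice)."""
    return [{f: row[f] for f in fields} for row in rows]

def profile(rows):
    """Count distinct values per field; needs all records but only some fields."""
    return {f: len({row[f] for row in rows}) for f in rows[0]}

# Two outputs with no fields in common.
outputs = [
    project(records, ["name", "age"]),
    project(records, ["city", "score"]),
]

# Profile each output independently, then merge the per-field results.
with ThreadPoolExecutor() as pool:
    partial = list(pool.map(profile, outputs))
merged = {field: count for part in partial for field, count in part.items()}
```

Because the outputs share no fields, the partial results can be merged without resolving conflicts.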

Column Splitter configuration parameters

The Column Splitter tool has one set of configuration parameters in addition to the standard execution options:

Option	Description
N	The number of outputs to split the columns into. This must be an integer between 1 and 10.
Split mode	The method used to split the columns. One of: Chunk: adjacent columns are kept together in their original order. This makes it easy to put results back together in their original order. Fixed chunks: outputs get groups of adjacent columns. The number of columns per output is defined by Chunk size. Note that some outputs may get fewer columns than requested (if the input runs out of columns), and some columns may be omitted (if the tool runs out of outputs). Interleave: columns are allocated to each output round-robin style. This can give a better balance between outputs if large fields are clustered together.
Number of splits	Number of data streams to be output.
Chunk size	If Split mode is Fixed chunks, the number of columns per output.
ID field	Optional. If there is one field (such as a record identifier) that you want to have on all outputs, select that field.
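
The three split modes can be compared with a minimal Python sketch that works on column names only. This is an illustration under stated assumptions, not the tool's actual logic: the function name split_columns and its parameters are hypothetical, the even division used for Chunk mode (earlier outputs take any extra columns) is an assumption, and so is excluding the ID field from the split before copying it to the front of every output.

```python
def split_columns(columns, n, mode="chunk", chunk_size=None, id_field=None):
    """Return a list of n column-name lists, one per output (hypothetical helper)."""
    if not 1 <= n <= 10:
        raise ValueError("n must be an integer between 1 and 10")

    # Assumption: the ID field is left out of the split and copied to every output.
    pool = [c for c in columns if c != id_field]

    if mode == "chunk":
        # Assumption: adjacent columns are divided as evenly as possible,
        # with earlier outputs receiving any extra columns.
        base, extra = divmod(len(pool), n)
        groups, start = [], 0
        for i in range(n):
            size = base + (1 if i < extra else 0)
            groups.append(pool[start:start + size])
            start += size
    elif mode == "fixed_chunks":
        if chunk_size is None or chunk_size < 1:
            raise ValueError("chunk_size must be a positive integer")
        # Later outputs may get fewer columns; columns beyond n * chunk_size are omitted.
        groups = [pool[i * chunk_size:(i + 1) * chunk_size] for i in range(n)]
    elif mode == "interleave":
        # Round-robin: column k goes to output k % n.
        groups = [pool[i::n] for i in range(n)]
    else:
        raise ValueError(f"unknown mode: {mode}")

    prefix = [id_field] if id_field in columns else []
    return [prefix + group for group in groups]


cols = ["id", "a", "b", "c", "d", "e"]
chunked = split_columns(cols, 2, mode="chunk", id_field="id")
fixed = split_columns(cols, 2, mode="fixed_chunks", chunk_size=2, id_field="id")
interleaved = split_columns(cols, 2, mode="interleave", id_field="id")
```

With five non-ID columns and two outputs, the fixed-chunk call with a chunk size of 2 drops column e, which matches the note that columns may be omitted when the outputs run out.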

Configure the Column Splitter tool

Select the Column Splitter tool, and then go to the Configuration tab on the Properties pane.
Select N, and specify the number of outputs to split the columns into.
Select Split mode, and select the method for splitting columns:
- Chunk: adjacent columns are kept together in their original order.
- Fixed chunks: outputs get groups of adjacent columns. The number of columns per output is defined by Chunk size.
- Interleave: columns are allocated to each output round-robin style.
If Split mode is Fixed chunks, optionally select Chunk size and specify the number of columns per output.
Optionally, select Number of splits and increase the number of output data streams from the default of 2.
Optionally, select ID field and select a field (such as a record identifier) to copy to all outputs.
Optionally, go to the Execution tab, and then set Web service options.