Column Splitter
Overview
The Column Splitter tool accepts an input and splits it vertically (along field boundaries) into two or more outputs. Each output therefore has no fields in common with any other output, apart from the optional ID field, which is copied to all outputs. Use it to parallelize workloads that cannot be divided along record boundaries (such as profiling or aggregating).
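The idea of a vertical split can be illustrated with a minimal Python sketch. This is not the tool's implementation: the sample records and the helper names `project` and `distinct_counts` are hypothetical, chosen only to show why a column-wise profile can run on each output independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample input: three records with four fields.
records = [
    {"id": 1, "age": 34, "city": "Oslo", "score": 7.5},
    {"id": 2, "age": 28, "city": "Lima", "score": 9.0},
    {"id": 3, "age": 34, "city": "Oslo", "score": 6.0},
]


def project(rows, fields):
    """Keep only the given fields of every record (a vertical slice)."""
    return [{f: row[f] for f in fields} for row in rows]


def distinct_counts(rows):
    """A simple per-column profile: number of distinct values per field."""
    return {f: len({row[f] for row in rows}) for f in rows[0]}


# Two outputs with no fields in common; every record appears in both.
outputs = [
    project(records, ["id", "age"]),
    project(records, ["city", "score"]),
]

# Each output can be profiled independently, so the work runs in parallel.
with ThreadPoolExecutor() as pool:
    profiles = list(pool.map(distinct_counts, outputs))

# Combine the per-output profiles into one result.
merged = {}
for profile in profiles:
    merged.update(profile)
```

Because a column profile needs every record but only one field at a time, splitting by field keeps each worker's result complete, which a split along record boundaries would not.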
Column Splitter configuration parameters
The Column Splitter tool has one set of configuration parameters in addition to the standard execution options:
Option | Description |
---|---|
N | The number of outputs to split the columns into. This must be an integer between 1 and 10. |
Split mode | The method used to split the columns. One of: Chunk, Fixed chunk, or Interleave. |
Number of splits | Number of data streams to be output. |
Chunk size | If Split mode is Fixed chunk, the number of columns per output. |
ID field | Optional. If there is one field (such as a record identifier) that you want to have on all outputs, select that field. |
Configure the Column Splitter tool
Select the Column Splitter tool, and then go to the Configuration tab on the Properties pane.
Select N, and specify the number of outputs to split the columns into.
Select Split mode, and select the method for splitting columns:
Chunk: Adjacent columns are kept together in their original order.
Fixed chunk: Each output receives a group of adjacent columns. The number of columns per output is set by Chunk size.
Interleave: Columns are allocated to each output round-robin style.
If Split mode is Fixed chunk, you may optionally select Chunk size and specify the number of columns per output.
Optionally, select Number of splits and increase the number of output data streams from the default of 2.
Optionally, select ID field and select a field (such as a record identifier) to copy to all outputs.
Optionally, go to the Execution tab, and then set Web service options.
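The three split modes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual logic: the function `split_columns` and its parameter names are hypothetical, Chunk mode is assumed to produce contiguous groups of near-equal size, and Fixed chunk mode is assumed to place any leftover columns in the last output.

```python
from typing import List, Optional


def split_columns(
    columns: List[str],
    n: int = 2,
    mode: str = "chunk",
    chunk_size: Optional[int] = None,
    id_field: Optional[str] = None,
) -> List[List[str]]:
    """Return one list of column names per output (illustrative only)."""
    if not 1 <= n <= 10:
        raise ValueError("n must be an integer between 1 and 10")
    if id_field is not None and id_field not in columns:
        raise ValueError("id_field must be one of the input columns")

    # The ID field, if any, is set aside and copied to every output.
    rest = [c for c in columns if c != id_field]

    if mode == "interleave":
        # Round-robin: column i goes to output i % n.
        groups = [rest[i::n] for i in range(n)]
    elif mode == "chunk":
        # Assumption: contiguous groups of near-equal size, original order.
        base, extra = divmod(len(rest), n)
        groups, start = [], 0
        for i in range(n):
            size = base + (1 if i < extra else 0)
            groups.append(rest[start:start + size])
            start += size
    elif mode == "fixed_chunk":
        if chunk_size is None or chunk_size < 1:
            raise ValueError("fixed_chunk mode needs a positive chunk_size")
        # Assumption: each output takes chunk_size adjacent columns and
        # the last output takes whatever remains.
        groups = [rest[i * chunk_size:(i + 1) * chunk_size] for i in range(n - 1)]
        groups.append(rest[(n - 1) * chunk_size:])
    else:
        raise ValueError(f"unknown mode: {mode}")

    prefix = [id_field] if id_field is not None else []
    return [prefix + group for group in groups]


cols = ["id", "a", "b", "c", "d", "e"]
chunked = split_columns(cols, n=2, mode="chunk", id_field="id")
fixed = split_columns(cols, n=3, mode="fixed_chunk", chunk_size=2, id_field="id")
interleaved = split_columns(cols, n=2, mode="interleave", id_field="id")
```

With these inputs, Chunk mode keeps `a, b, c` together in the first output, while Interleave mode sends `a, c, e` to the first output and `b, d` to the second. In every mode the `id` column appears on all outputs.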