Stream to Table and Table to Stream
Stream to Table
The Stream to Table (StreamToTable) tool is the logical complement of Table to Stream (TableToStream). It reads data in "cell value" form where each datum has its own record, and produces a structured and typed output record. It processes an input stream containing three fields:
ID: identifies each logical record.
FIELD: is the name of the field that the data belongs to.
VALUE: is the field value expressed as text.
For example, it can accept the following input data.
ID | FIELD | VALUE |
---|---|---|
1 | NAME | FRED SMITH |
1 | ADDRESS | 123 MAIN ST |
1 | PHONE | (101)111-2222 |
2 | NAME | MARY ROGERS |
2 | ADDRESS | PO BOX 888 |
2 | PHONE | 123-456-7890 |
And produce this output data.
ID | NAME | ADDRESS | PHONE |
---|---|---|---|
1 | FRED SMITH | 123 MAIN ST | (101)111-2222 |
2 | MARY ROGERS | PO BOX 888 | 123-456-7890 |
To have the record definition mirror that of an existing data connection, connect the connection to the "M" (meta) input of the Stream to Table tool.
The Stream to Table tool, used in conjunction with TableToStream, is very useful for performing meta-operations, where the exact schema is not known until runtime. It allows operations to be performed across many or all fields. See the sample project table_to_stream_and_back for an example of how to use this tool.
Stream to Table tool configuration parameters
The Stream to Table tool has a single set of configuration parameters in addition to the standard execution options.
Parameter | Description |
---|---|
ID field | The input field containing the record ID. |
Field-name field | The input field containing the field name. |
Value field | The input field containing the field data. |
Include ID field on output | If selected, copies the input ID field value to an output field of the same name. There must not already be an output field of that name. |
Default numbers to zero instead of null | If selected, sets missing numeric field values to zero. |
Do not sort by ID field | If selected, disables sorting on ID field. The Stream to Table tool requires sorted input to work correctly, and by default sorts input by the record ID. If your input is grouped by record ID but not globally sorted by the record ID, this option may offer a minor speed improvement. However, you should generally leave this off. |
Schema from | Specifies how the output schema is defined. One of:
|
Fields | The Fields grid displays the schema definition:
|
Configure the Stream to Table tool
Connect the ID/FIELD/VALUE input stream to the left (data) connector.
Select the Stream to Table tool.
Go to the Configuration tab on the Properties pane.
Specify the ID field, Field-name field, and Value input fields.
Optionally, select Include ID field on output to copy the input ID field to an output field of the same name. If this option is enabled, there must not already be an output field of that name defined.
Optionally, select Default numbers to zero instead of null if some numeric field values may be missing, and you want those to be zero.
Optionally, you may select Do not sort by ID field if your input is grouped by ID field but not globally sorted by ID field. You should generally leave this turned off.
Select Schema from and select a method to define the output record Fields.
If you have... | Do this |
---|---|
An existing data connection with the desired schema. | Select Meta input and select Commit to show the M input, and then connect the data source to the M (meta) input. |
A CSV or TXT schema for this data already defined in the repository. | Select Use repository schema and choose the schema. |
A layout file. | Select Layout file and browse to the desired layout file. |
None of the above. | Select Defined below and use the grid to define Field names and Type manually. |
Optionally, go to the Execution tab, and then set Web service options.
Notes:
The ID field values need not be unique, sequential, or numeric.
Only adjacent runs of the same ID field value are counted as the same record. If you want noncontiguous sets to be treated globally, sort by ID first.
Missing values will result in either Null or zero values on the output record, depending on the setting of the Default numbers to zero instead of null option.
If there is duplicate ID/FIELD data, the last value prevails.
For example, if the input stream is:
ID | FIELD | VALUE |
---|---|---|
1 | NAME | FRED SMITH |
1 | NAME | JOHN BROWN |
1 | PHONE | (101)111-2222 |
The output record will be:
ID | NAME | PHONE |
---|---|---|
1 | JOHN BROWN | (101)111-2222 |
Table to Stream
The Table to Stream tool is the logical complement of Stream To Table tool. It converts each cell of an input table into a new record containing three columns:
ID: unique identifier of the record. This can either be an existing unique ID, or a unique number generated by the Table To Stream tool. You can specify this to be something other than ID.
FIELD: the name of the field.
VALUE: the value of the field, converted to text.
This tool is particularly useful for the following tasks.
Performing the same calculation on every field
Because all columns are represented identically, a single operation on the VALUE field can change all columns at once. However, you are limited to computations that make sense for the original data type. See the sample project table_to_stream_and_back for an example of how to use this tool in conjunction with Stream To Table to perform calculations on multiple columns at once.
Data profiling
The Table to Stream tool can be used with the Summarize tool to generate profiles (such as counts, null counts, top N most common values) of multiple columns simultaneously.
Table to Stream tool configuration parameters
The Table to Stream tool has a single configuration parameter in addition to the standard execution options.
Parameter | Description |
---|---|
ID field | If specified, the field to use as the record ID. If not specified, a sequential number will be generated. |
Configure the Table to Stream tool
Select the Table to Stream tool.
Go to the Configuration tab on the Properties pane.
Optionally, select ID field, and then select the field you wish to use as the record ID.
Optionally, go to the Execution tab, and then set Web service options.