Blob Input/Output
Blob Input
The Blob Input tool reads all files matching a path with a wildcard pattern and transforms them into a series of fields of Binary type (blobs). The tool writes two output fields: FILENAME and CONTENTS. This is useful for transporting data contained in a set of files, such as sending it to a web service.
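As a rough illustration of what the tool does, the following Python sketch reads every file matching a wildcard pattern and yields one (FILENAME, CONTENTS) pair per file, keeping the contents as raw bytes. It is not the tool's implementation: the function name `read_blobs`, the use of Python's `glob` module for wildcard matching, and the sorted file order are assumptions, and file splitting is not modeled.

```python
import glob
import os
from typing import Iterator, Tuple


def read_blobs(pattern: str) -> Iterator[Tuple[str, bytes]]:
    """Yield one (FILENAME, CONTENTS) pair per regular file matching `pattern`."""
    # glob accepts wildcards in any path component, e.g. "logs/*/app-*.txt".
    # sorted() makes the order deterministic; the tool's own order is not specified here.
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):  # skip directories that happen to match
            with open(path, "rb") as f:  # read raw bytes, as a Binary field would hold
                yield path, f.read()
```

For example, `for name, data in read_blobs("incoming/*.json"): ...` (a hypothetical path) visits each matching file once.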
Blob Input tool configuration parameters
The Blob Input tool has a single set of configuration parameters in addition to the standard execution options.
Parameter | Description |
---|---|
Input file | A full or relative path with a wildcard pattern indicating which files to read. Wildcards may appear in any position. Example: |
File splitting | Determines how the input data is split: Don't split, Byte size, or Line count. |
Max bytes per blob | The largest allowable blob, in bytes. The minimum value is 4096. The maximum value corresponds to the maximum binary field size (currently 10 MB). |
Line count | If File splitting is Line count, the maximum number of lines that may be placed into a single blob. |
Produce file name field | If selected, the file name will be output as a record field. |
Output full path | If Produce file name field is selected, optionally outputs the entire path, rather than only the file name, to the file name field. |
Output URI path | If Output full path is selected, expresses the path as a Uniform Resource Identifier (URI). |
Field name | If Produce file name field is selected, name of the column to be used for the file name. This is optional and defaults to |
Field size | If Produce file name field is selected, size of the field to be used for the file name. This is optional and defaults to |
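A minimal Python sketch of the three file-name variants described in the table: the bare file name, the full path (Output full path), and the full path expressed as a `file://` URI (Output URI path). The function `file_name_field` is hypothetical; it assumes that "full path" means the absolute path of the file, and it does not model Field name or Field size.

```python
import os
from pathlib import Path


def file_name_field(path: str, full_path: bool = False, uri: bool = False) -> str:
    """Hypothetical model of Produce file name field / Output full path / Output URI path."""
    if not full_path:
        return Path(path).name  # file name only
    absolute = Path(os.path.abspath(path))  # full path (absolute, symlinks not resolved)
    # Output URI path only applies when Output full path is selected.
    return absolute.as_uri() if uri else str(absolute)
```

On a POSIX system, `file_name_field("/data/in/a b.txt", full_path=True, uri=True)` gives `file:///data/in/a%20b.txt`; note that the space is percent-encoded in the URI form.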
Configure the Blob Input tool
Select the Blob Input tool.
Go to the Configuration tab.
Specify a full or relative path with a wildcard pattern indicating which files to read.
Optionally, specify File splitting:
Don't split: complete files are read and sent to binary fields. A file that is larger than Max bytes per blob is skipped and a warning issued.
Byte size: files are split when they reach Max bytes per blob.
Line count: files are split at end-of-line markers (which are included in the data). The split point occurs when either Max bytes per blob or Line count is reached, whichever comes first.
If File splitting is set to Line count or Byte size, you may specify Max bytes per blob or accept the default setting.
If File splitting is set to Line count, you may specify Line count, the maximum number of lines per blob.
To include the name of the input file as a new field, you may select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.
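The three File splitting modes can be modeled roughly as follows. This Python sketch is an interpretation of the descriptions above, not the tool's code: the function `split_blob` and the mode strings `"dont_split"`, `"byte_size"`, and `"line_count"` are hypothetical, the 4096-byte minimum for Max bytes per blob is not enforced, and the handling of a single line longer than Max bytes per blob is an assumption.

```python
from typing import List, Optional


def split_blob(data: bytes, mode: str, max_bytes: int = 4096,
               line_count: Optional[int] = None) -> List[bytes]:
    """Split one file's bytes into blobs (hypothetical model of File splitting)."""
    if mode == "dont_split":
        # Whole file in one blob; a file larger than max_bytes is skipped
        # (the tool also issues a warning, which this sketch omits).
        return [data] if len(data) <= max_bytes else []
    if mode == "byte_size":
        # Fixed-size chunks; the last chunk may be shorter.
        return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]
    if mode == "line_count":
        blobs: List[bytes] = []
        current = b""
        lines = 0
        # keepends=True keeps the end-of-line markers in the data.
        for line in data.splitlines(keepends=True):
            too_many_lines = line_count is not None and lines >= line_count
            too_many_bytes = len(current) + len(line) > max_bytes
            # Start a new blob when adding this line would pass either limit.
            # Assumption: a single line longer than max_bytes gets its own blob.
            if current and (too_many_lines or too_many_bytes):
                blobs.append(current)
                current, lines = b"", 0
            current += line
            lines += 1
        if current:
            blobs.append(current)
        return blobs
    raise ValueError(f"unknown mode: {mode!r}")
```

Small limits make the behavior easy to see: with `data = b"a\nbb\nccc\n"`, byte-size splitting at 4 bytes cuts through lines (`b"a\nbb"`, `b"\nccc"`, `b"\n"`), while line-count splitting with a limit of 2 lines keeps every line whole (`b"a\nbb\n"`, `b"ccc\n"`).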
The Blob Input tool produces Binary data. If the input data is text, you can use a Calculate tool downstream of the Blob Input tool and assign the function DecodeTextBytes(binary_field, "code_page_name") to a new text field to convert the data to text.
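The decoding step is conceptually the same as decoding bytes with a named encoding in Python. The sketch below uses a hypothetical `decode_text_bytes` function as a stand-in for DecodeTextBytes; the Python codec names (`"utf-8"`, `"cp1252"`) may not match the code page names the Calculate tool accepts. It also shows one caveat that follows from the Byte size mode described above: a split at an arbitrary byte offset can cut a multi-byte UTF-8 character in half, so the blob no longer decodes on its own.

```python
def decode_text_bytes(binary_field: bytes, code_page_name: str) -> str:
    """Hypothetical Python analogue of DecodeTextBytes: bytes -> text in a named encoding."""
    return binary_field.decode(code_page_name)


encoded = "café\n".encode("utf-8")  # 'é' takes 2 bytes in UTF-8: b"caf\xc3\xa9\n"
first_blob = encoded[:4]            # a byte-size split landing inside 'é'

try:
    decode_text_bytes(first_blob, "utf-8")
except UnicodeDecodeError:
    print("first blob ends mid-character and cannot be decoded on its own")
```

Line count splitting avoids this particular problem for UTF-8 text, because it splits only at end-of-line markers.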
Blob Output
The Blob Output tool accepts an input with two fields, FILENAME and CONTENTS, and writes these to a series of output files.
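A minimal Python sketch of the basic idea: each (FILENAME, CONTENTS) record is written to the file named in its first field. The function `write_blobs` is hypothetical. It assumes that text contents are encoded as UTF-8 and that a later record with the same file name simply replaces the file; the tool's actual encoding for textvar and unicode fields, and its behavior without a Combining mode, are not specified in this section.

```python
import os
from typing import Iterable, Tuple, Union


def write_blobs(records: Iterable[Tuple[str, Union[bytes, str]]]) -> None:
    """Write each (FILENAME, CONTENTS) record to its own file (hypothetical sketch)."""
    for filename, contents in records:
        # Assumption: text contents are written as UTF-8; binary contents as-is.
        data = contents.encode("utf-8") if isinstance(contents, str) else contents
        parent = os.path.dirname(filename)
        if parent:
            os.makedirs(parent, exist_ok=True)  # create missing directories
        with open(filename, "wb") as f:  # "wb" replaces any existing file
            f.write(data)
```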
Blob Output tool configuration parameters
The Blob Output tool has a single set of configuration parameters in addition to the standard execution options.
Parameter | Description |
---|---|
File name | The name of the input field containing a full or relative file path to which data will be written. |
Contents | The name of the input field containing the contents to be written to the file. The field must be of one of the following types: binary, textvar, or unicode. |
Combining mode | If specified, how to create a single file from consecutive blobs that have the same value in the File name field. Options are: |
Replication factor | Number of copies of each block that will be stored (on different nodes) in the distributed file system. The default is 1. |
Block size (MB) | The minimum size of a file division. The default is 128 MB. |
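The available Combining mode options are not listed in this section, so the following Python sketch shows only one plausible behavior, plain concatenation of consecutive blobs that share a file name, which would, for example, reassemble a file that Blob Input split into several blobs. The function `combine_consecutive` and the concatenation semantics are assumptions, not a description of any specific option.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def combine_consecutive(records: Iterable[Tuple[str, bytes]]) -> Iterator[Tuple[str, bytes]]:
    """Concatenate runs of consecutive blobs that share a file name (assumed behavior)."""
    # groupby only groups *adjacent* records with equal keys, so a file name that
    # reappears later in the stream starts a new group, matching "consecutive blobs".
    for filename, group in groupby(records, key=itemgetter(0)):
        yield filename, b"".join(contents for _, contents in group)
```

Because only consecutive blobs are combined, the order of the input records matters: in `[("a.bin", b"1"), ("a.bin", b"2"), ("b.bin", b"3"), ("a.bin", b"4")]`, the first two blobs merge into `b"12"`, but the final `a.bin` blob forms a separate group.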
Configure the Blob Output tool
Select the Blob Output tool.
Go to the Configuration tab.
Specify the File name field: the input field containing the full or relative path of the file to which each blob is written.
Specify the name of the Contents field. This field must be of type binary, textvar, or unicode.
Optionally, specify a Combining mode.
Optionally, go to the Execution tab and Enable trigger output, configure reporting options, or set Web service options.