Blob Input/Output

Blob Input

The Blob Input tool reads all files matching a path that contains a wildcard pattern and transforms their contents into a series of Binary fields (blobs). The tool writes two output fields: CONTENTS, which holds the file data, and FILENAME, which holds the file name when Produce file name field is selected. This is useful for transporting data contained in a set of files, for example to send it to a web service.

Blob Input tool configuration parameters

The Blob Input tool has a single set of configuration parameters in addition to the standard execution options.

Parameter	Description
Input file	A full or relative path with a wildcard pattern indicating which files to read. Wildcards may appear in any position. Example: `f:\data\\dbf\.*.`
File splitting	Determines how the input data is split into blobs. Don't split: complete files are read and sent to binary fields; a file larger than Max bytes per blob is skipped and a warning is issued. Byte size: files are split whenever a blob reaches Max bytes per blob. Line count: files are split at end-of-line markers (which are included in the data); the split occurs when either Max bytes per blob or Line count is reached, whichever comes first.
Max bytes per blob	The largest allowable blob, in bytes. The minimum value is 4096; the maximum value is the maximum binary field size (currently 10 MB).
Line count	If File splitting is set to Line count, the maximum number of lines that may be placed in a single blob.
Produce file name field	If selected, the name of the input file is output as a record field.
Output full path	If Produce file name field is selected, outputs the entire path, rather than only the file name, to the file name field.
Output URI path	If Output full path is selected, expresses the path as a Uniform Resource Identifier (URI).
Field name	If Produce file name field is selected, the name of the field that holds the file name. Optional; defaults to `FILENAME`.
Field size	If Produce file name field is selected, the size of the field that holds the file name. Optional; defaults to `255`.
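The three File splitting modes can be modeled in a few lines. The sketch below is a hypothetical Python approximation, not the tool's implementation: the function name `split_into_blobs` and the mode strings `"none"`, `"bytes"`, and `"lines"` are invented for illustration. It follows the rules in the table (skip-and-warn for oversized files, fixed-size chunks, and line-based splits that also respect Max bytes per blob). A single line longer than Max bytes per blob is not handled specially here, because this section does not say what the tool does in that case.

```python
import warnings

MIN_MAX_BYTES = 4096  # documented minimum for Max bytes per blob


def split_into_blobs(data: bytes, mode: str, max_bytes: int, line_count: int = 0) -> list:
    """Split one file's bytes into blobs (hypothetical model of File splitting).

    mode: "none" models Don't split, "bytes" models Byte size,
    "lines" models Line count. These mode names are illustrative only.
    """
    if max_bytes < MIN_MAX_BYTES:
        raise ValueError("max_bytes must be at least 4096")
    if mode == "none":
        # The whole file becomes one blob, or is skipped with a warning if too large.
        if len(data) > max_bytes:
            warnings.warn("file larger than max_bytes was skipped")
            return []
        return [data]
    if mode == "bytes":
        # Fixed-size chunks; only the last one may be shorter than max_bytes.
        return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]
    if mode == "lines":
        if line_count < 1:
            raise ValueError("line_count must be at least 1")
        blobs, current, lines = [], b"", 0
        # keepends=True leaves the end-of-line markers in the data.
        for line in data.splitlines(keepends=True):
            # Start a new blob when this line would exceed either limit.
            if current and (lines == line_count or len(current) + len(line) > max_bytes):
                blobs.append(current)
                current, lines = b"", 0
            current += line
            lines += 1
        if current:
            blobs.append(current)
        return blobs
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `split_into_blobs(b"x" * 10000, "bytes", 4096)` yields blobs of 4096, 4096, and 1808 bytes, while Line count mode with a limit of 3 lines turns ten short lines into blobs of 3, 3, 3, and 1 lines.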

Configure the Blob Input tool

Select the Blob Input tool.
Go to the Configuration tab.
In Input file, specify a full or relative path with a wildcard pattern indicating which files to read.
Optionally, specify File splitting:
- Don't split: complete files are read and sent to binary fields. A file that is larger than Max bytes per blob is skipped and a warning is issued.
- Byte size: files are split whenever a blob reaches Max bytes per blob.
- Line count: files are split at end-of-line markers (which are included in the data). The split occurs when either Max bytes per blob or Line count is reached, whichever comes first.
If File splitting is set to Line count or Byte size, you may specify Max bytes per blob or accept the default setting.
If File splitting is set to Line count, you may specify Line count, the maximum number of lines per blob.
To include the name of the input file as a new field, select Produce file name field and optionally specify a Field name and Field size. Select Output full path to output the complete file path rather than only the file name; this can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file path as a Uniform Resource Identifier.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.
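The file name options above can be illustrated with a small Python sketch. It is a hypothetical analogy, not the tool's behavior: `read_blob_records` is an invented name, Python's `glob` pattern syntax stands in for the tool's wildcard syntax (the two may differ), and File splitting and Field size are not modeled, so each matching file becomes exactly one record. The three naming choices map to the options as follows: file name only (default), full path (Output full path), and a `file:` URI (Output URI path).

```python
import glob
import os
from pathlib import Path


def read_blob_records(pattern, produce_file_name=True, full_path=False,
                      uri_path=False, field_name="FILENAME"):
    """Read every file matching a glob pattern into one record per file.

    Hypothetical sketch: whole files only; File splitting and Field size
    are not modeled.
    """
    records = []
    for match in sorted(glob.glob(pattern)):
        if not os.path.isfile(match):
            continue  # skip directories that happen to match the pattern
        with open(match, "rb") as f:
            record = {"CONTENTS": f.read()}
        if produce_file_name:
            path = Path(match).resolve()  # absolute path, required by as_uri()
            if not full_path:
                name = path.name  # file name only
            elif uri_path:
                name = path.as_uri()  # e.g. file:///data/in/a.txt
            else:
                name = str(path)  # complete file path
            record[field_name] = name
        records.append(record)
    return records
```

Calling `read_blob_records("data/*.txt", full_path=True, uri_path=True)` would return one dictionary per matching file, with the binary contents under `CONTENTS` and a `file:` URI under `FILENAME`.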

The Blob Input tool produces Binary data. If the input data is text, you can convert it by adding a Calculate tool downstream of the Blob Input tool and assigning the function DecodeTextBytes(binary_field, "code_page_name") to a new text field.
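Choosing the right code page matters because the same bytes decode differently under different encodings. The Python sketch below uses the built-in `bytes.decode` as an analogy for DecodeTextBytes; it is not that function, and the code page names DecodeTextBytes accepts may not match Python's codec names such as `"cp1252"`.

```python
# Bytes as they might arrive in a Binary CONTENTS field (encoded with Windows-1252).
contents = "Café costs 3 €\n".encode("cp1252")

# Decoding with the matching code page recovers the original text.
text = contents.decode("cp1252")

# Decoding with the wrong code page garbles non-ASCII characters
# (errors="replace" substitutes U+FFFD instead of raising an exception).
wrong = contents.decode("utf-8", errors="replace")
```

Here `text` equals the original string, while `wrong` contains replacement characters where `é` and `€` were.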

Blob Output

The Blob Output tool accepts input records with two fields, a file name and the file contents (such as the FILENAME and CONTENTS fields produced by Blob Input), and writes the contents to a series of output files.

Blob Output tool configuration parameters

The Blob Output tool has a single set of configuration parameters in addition to the standard execution options.

Parameter	Description
File name	The name of the input field containing a full or relative file path to which data will be written.
Contents	The name of the input field containing the data to be written to the file. The field must be of one of the following types: binary: the contents of the binary field are written to the file. textvar: the contents of the text field are written to the file. unicode: the text is converted to UTF-8 and written to the file; if another encoding is required, first use the function BinaryRecastFromText to create a binary field.
Combining mode	Optional. How to create a single file from consecutive blobs that have the same value in the File name field. Options are: Don't combine: blobs are not combined, and each blob is written to its own file. Duplicate file names cause the file to be repeatedly overwritten, so after execution it contains only the last blob. Binary (default): blobs with the same file name are combined into a single file by writing them to the file in order. Text: like Binary, but an end-of-line marker is inserted between blobs when a blob does not already end with one.
Replication factor	Number of copies of each block that will be stored (on different nodes) in the distributed file system. The default is 1.
Block size (MB)	The size of the blocks into which each output file is divided in the distributed file system. The default is 128 MB.
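The effect of each Combining mode on the final file contents can be sketched as follows. This is a hypothetical Python model, not the tool's code: `combine_blobs` and the mode strings are invented names, `str` blobs stand in for the unicode field type (encoded as UTF-8, as the table states), `"\n"` is assumed to be the end-of-line marker that Text mode inserts, and Replication factor and Block size are not modeled. Following the table, only consecutive blobs with the same file name are combined; how the tool treats a later, non-consecutive repeat of a name is not stated, so the sketch assumes it rewrites the file.

```python
from itertools import groupby


def combine_blobs(records, mode="binary"):
    """Return {file_name: final_file_bytes} for (file_name, blob) pairs in input order.

    mode: "none" models Don't combine, "binary" models Binary, "text" models Text.
    """
    # Assumption: a non-consecutive repeat of a file name rewrites the file,
    # so its entry replaces the earlier one in this dictionary.
    files = {}
    for name, group in groupby(records, key=lambda rec: rec[0]):
        # str blobs model the unicode field type and are encoded as UTF-8.
        blobs = [b.encode("utf-8") if isinstance(b, str) else b for _, b in group]
        if mode == "none":
            files[name] = blobs[-1]  # each write overwrites the file; the last blob wins
        elif mode == "binary":
            files[name] = b"".join(blobs)  # blobs written in order, nothing inserted
        elif mode == "text":
            out = b""
            for blob in blobs:
                # Assumption: "\n" is the end-of-line marker inserted between blobs.
                if out and not out.endswith(b"\n"):
                    out += b"\n"
                out += blob
            files[name] = out
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return files
```

With the blobs `b"one"`, `b"two\n"`, and `b"three"` all destined for `a.txt`, Binary mode produces `b"onetwo\nthree"`, Text mode produces `b"one\ntwo\nthree"`, and Don't combine leaves only `b"three"`.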

Configure the Blob Output tool

Select the Blob Output tool.
Go to the Configuration tab.
In File name, specify the input field that contains the path of the file to write.
Specify the name of the Contents field. This field must be of type binary, textvar, or unicode.
Optionally, specify a Combining mode.
Optionally, go to the Execution tab and Enable trigger output, configure reporting options, or set Web service options.