
Blob Input/Output

Blob Input

The Blob Input tool reads all files that match a path containing a wildcard pattern and transforms their contents into a series of Binary fields (blobs). The tool writes two output fields: FILENAME and CONTENTS. This is useful for transporting the data contained in a set of files, for example to send it to a web service.

Blob Input tool configuration parameters

The Blob Input tool has a single set of configuration parameters in addition to the standard execution options.

Parameter

Description

Input file

A full or relative path with a wildcard pattern indicating which files to read. Wildcards may appear in any position. For example: f:\data\*\dbf\*.*

File splitting

Determines how the input data is split:

  • Don't split: complete files are read and sent to binary fields. A file that is larger than Max bytes per blob is skipped and a warning is issued.

  • Byte size: files are split when they reach Max bytes per blob.

  • Line count: files are split at end-of-line markers (which are included in the data). The split point occurs when either Max bytes per blob or Line count is reached, whichever comes first.

Max bytes per blob

The largest allowable blob, in bytes. The minimum value is 4096. The maximum value corresponds to the maximum binary field size (currently 10 MB).

Line count

If File splitting is Line count, the maximum number of lines that may be placed into a single blob.

Produce file name field

If selected, the file name will be output as a record field.

Output full path

If Produce file name field is selected, optionally outputs the entire path, rather than only the file name, in the file name field.

Output URI path

If Output full path is selected, expresses the path as a Uniform Resource Identifier (URI).

Field name

If Produce file name field is selected, the name of the field that holds the file name. This is optional and defaults to FILENAME.

Field size

If Produce file name field is selected, size of the field to be used for the file name. This is optional and defaults to 255.
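To make the three File splitting modes concrete, here is a minimal Python sketch that mimics the documented behavior for a single file's bytes. It is not the tool's implementation: the function name split_blobs and the mode strings "none", "bytes", and "lines" are hypothetical, a single line longer than Max bytes per blob is assumed to be cut at the byte limit (the documentation does not say what happens in that case), and the 10 MB upper limit is not enforced.

```python
import warnings
from typing import Iterator, Optional

MIN_MAX_BYTES = 4096  # documented minimum for Max bytes per blob


def split_blobs(data: bytes, mode: str, max_bytes: int,
                line_count: Optional[int] = None) -> Iterator[bytes]:
    """Split one file's bytes into blobs.

    mode is "none" (Don't split), "bytes" (Byte size) or "lines"
    (Line count); these names are hypothetical.
    """
    if max_bytes < MIN_MAX_BYTES:
        raise ValueError("Max bytes per blob must be at least 4096")
    if mode == "none":
        # Don't split: the whole file is one blob, or it is skipped.
        if len(data) > max_bytes:
            warnings.warn(f"skipping file of {len(data)} bytes")
            return
        yield data
    elif mode == "bytes":
        # Byte size: fixed-size chunks; the last chunk may be shorter.
        for start in range(0, len(data), max_bytes):
            yield data[start:start + max_bytes]
    elif mode == "lines":
        if line_count is None or line_count < 1:
            raise ValueError("line_count must be a positive integer")
        blob = bytearray()
        lines_in_blob = 0
        # keepends=True keeps the end-of-line markers in the data.
        for line in data.splitlines(keepends=True):
            # Assumption: a single line longer than max_bytes is cut.
            while len(line) > max_bytes:
                if blob:
                    yield bytes(blob)
                    blob.clear()
                    lines_in_blob = 0
                yield line[:max_bytes]
                line = line[max_bytes:]
            # Split before this line if adding it would exceed max_bytes...
            if len(blob) + len(line) > max_bytes:
                yield bytes(blob)
                blob.clear()
                lines_in_blob = 0
            blob += line
            lines_in_blob += 1
            # ...or after it if the line limit has been reached.
            if lines_in_blob == line_count:
                yield bytes(blob)
                blob.clear()
                lines_in_blob = 0
        if blob:
            yield bytes(blob)  # remaining lines form the last blob
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

For example, list(split_blobs(b"line1\nline2\nline3\n", "lines", 4096, line_count=2)) returns two blobs, b"line1\nline2\n" and b"line3\n", whichever limit (bytes or lines) is reached first triggers the split.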

Configure the Blob Input tool

  1. Select the Blob Input tool.

  2. Go to the Configuration tab.

  3. Specify a full or relative path with a wildcard pattern indicating which files to read.

  4. Optionally, specify File splitting:

    • Don't split: complete files are read and sent to binary fields. A file that is larger than Max bytes per blob is skipped and a warning is issued.

    • Byte size: files are split when they reach Max bytes per blob.

    • Line count: files are split at end-of-line markers (which are included in the data). The split point occurs when either Max bytes per blob or Line count is reached, whichever comes first.

  5. If File splitting is set to Line count or Byte size, you may specify Max bytes per blob or accept the default setting.

  6. If File splitting is set to Line count, you may specify Line count, the maximum number of lines per blob.

  7. To include the name of the input file as a new field, you may select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.

  8. Optionally, go to the Execution tab and select Enable trigger input, configure reporting options, or set Web service options.
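The file selection and file name options in steps 3 and 7 can be approximated with Python's standard glob and pathlib modules; glob also accepts wildcards in directory components, as in f:\data\*\dbf\*.*. This is a hedged sketch: list_input_files is a hypothetical helper, resolving relative paths to absolute ones for Output full path is an assumption, and truncation to Field size is not modeled.

```python
import glob
import os
from pathlib import Path
from typing import Iterator, Tuple


def list_input_files(pattern: str, full_path: bool = False,
                     uri_path: bool = False) -> Iterator[Tuple[str, bytes]]:
    """Yield (file name, contents) for every file matching a wildcard pattern.

    Hypothetical helper modeling Input file, Output full path and Output URI path.
    """
    # glob expands wildcards in any path component.
    for match in sorted(glob.glob(pattern)):
        if not os.path.isfile(match):
            continue  # skip directories that happen to match the pattern
        if full_path:
            # Assumption: relative paths are resolved to absolute paths.
            resolved = Path(match).resolve()
            # as_uri() requires an absolute path and yields a file:// URI.
            name = resolved.as_uri() if uri_path else str(resolved)
        else:
            name = os.path.basename(match)
        with open(match, "rb") as f:
            yield name, f.read()
```

As in the tool, the URI option here only takes effect when the full path option is also selected.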

The Blob Input tool produces Binary data. If the input data is text, you can use a Calculate tool downstream of the Blob Input tool and assign the function DecodeTextBytes(binary_field, "code_page_name") to a new text field.
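DecodeTextBytes is the product's own function; the Python sketch below is only an analogy built on the standard codecs module. It illustrates one pitfall worth checking: with Byte size splitting, a blob boundary can fall inside a multi-byte character (for example in UTF-8), so decoding each blob independently may fail. An incremental decoder carries the incomplete bytes over to the next blob. The helper name decode_blobs is hypothetical.

```python
import codecs
from typing import Iterable, List


def decode_blobs(blobs: Iterable[bytes], encoding: str) -> List[str]:
    """Decode consecutive blobs of one file into text, one string per blob."""
    # An incremental decoder buffers a trailing partial character until
    # the next blob supplies the rest of its bytes.
    decoder = codecs.getincrementaldecoder(encoding)()
    texts = [decoder.decode(blob) for blob in blobs]
    # final=True raises UnicodeDecodeError if the file ended mid-character;
    # any text flushed at the end is kept as a final entry.
    tail = decoder.decode(b"", final=True)
    if tail:
        texts.append(tail)
    return texts
```

For example, decode_blobs([b"caf\xc3", b"\xa9"], "utf-8") returns ["caf", "é"], whereas b"caf\xc3".decode("utf-8") on its own raises UnicodeDecodeError. Choosing the right code page matters too: the single byte b"\xe9" decodes to "é" under cp1252.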

Blob Output

The Blob Output tool accepts an input with two fields, FILENAME and CONTENTS, and writes these to a series of output files.

Blob Output tool configuration parameters

The Blob Output tool has a single set of configuration parameters in addition to the standard execution options.

Parameter

Description

File name

The name of the input field containing a full or relative file path to which data will be written.

Contents

The name of the input field containing contents to be written to the file. The field must be of one of the following types:

  • binary: the contents of the binary field are written to the file.

  • textvar: the contents of the text field are written to the file.

  • unicode: the text is converted to UTF-8 and written to the file. If another encoding is needed, first use the function BinaryRecastFromText to create a binary field.

Combining mode

If specified, how to create a single file from consecutive blobs that have the same value in the File name field. Options are:

  • Don't combine: do not combine blobs. Each blob will be written to its own file. Duplicate file names will result in a file being repeatedly overwritten. After execution, the file will contain the last blob.

  • Binary: (default) combine blobs, writing the combined data to each unique file name. Duplicate file names are combined into a single file by writing the blobs in order to the file.

  • Text: combine blobs of text, ensuring that each blob is separated by a newline if one is not present in the data. Duplicate file names are combined into a single file by writing the blobs in order to the file, inserting an end-of-line marker between blobs when necessary.

Replication factor

Number of copies of each block that will be stored (on different nodes) in the distributed file system. The default is 1.

Block size (MB)

The minimum size of a file division. The default is 128 MB.
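The three Combining mode options can be summarized with the following Python sketch, which computes the final contents of each output file from a sequence of (file name, blob) rows. It is a hedged model, not the tool's code: the function and mode names are hypothetical, the end-of-line marker is assumed to be "\n", and because the documentation describes combining consecutive blobs, rows are grouped with itertools.groupby; a later, non-consecutive run with the same file name simply replaces the earlier contents here, which is an assumption.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple


def join_text(blobs: List[bytes]) -> bytes:
    """Concatenate blobs, inserting a newline where the data lacks one."""
    out = bytearray()
    for blob in blobs:
        if out and not out.endswith(b"\n"):
            out += b"\n"  # assumed end-of-line marker
        out += blob
    return bytes(out)


def combine(rows: Iterable[Tuple[str, bytes]], mode: str) -> Dict[str, bytes]:
    """Return the final contents of each output file.

    mode is "none" (Don't combine), "binary" or "text"; names are hypothetical.
    """
    files: Dict[str, bytes] = {}
    # groupby collects runs of consecutive rows with the same file name.
    for name, group in groupby(rows, key=lambda row: row[0]):
        blobs = [blob for _, blob in group]
        if mode == "none":
            files[name] = blobs[-1]  # each write overwrites; the last blob remains
        elif mode == "binary":
            files[name] = b"".join(blobs)  # blobs written in order
        elif mode == "text":
            files[name] = join_text(blobs)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Assumption: a later, non-consecutive run with the same name
        # replaces the earlier file contents.
    return files
```

For instance, the rows ("a.txt", b"one"), ("a.txt", b"two\n"), ("a.txt", b"three") produce b"three" with Don't combine, b"onetwo\nthree" with Binary, and b"one\ntwo\nthree" with Text.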

Configure the Blob Output tool

  1. Select the Blob Output tool.

  2. Go to the Configuration tab.

  3. Specify the File name field, which contains the path of the file to which each blob is written.

  4. Specify the name of the Contents field. This field must be of type binary, textvar, or unicode.

  5. Optionally, specify a Combining mode.

  6. Optionally, go to the Execution tab and select Enable trigger output, configure reporting options, or set Web service options.
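The Contents field rules can be sketched in Python as follows: binary values are written unchanged, and text values are encoded as UTF-8 before writing. The helper names are hypothetical, and because Python's single str type cannot distinguish textvar from unicode, the sketch models only the binary and unicode cases. Encoding text to bytes yourself before writing is analogous to creating a binary field with BinaryRecastFromText in the tool.

```python
from typing import Union

Contents = Union[bytes, bytearray, str]


def to_file_bytes(contents: Contents) -> bytes:
    """Return the bytes that would be written for one Contents value."""
    if isinstance(contents, (bytes, bytearray)):
        # binary: written to the file unchanged
        return bytes(contents)
    if isinstance(contents, str):
        # unicode: converted to UTF-8 before writing
        return contents.encode("utf-8")
    raise TypeError(f"unsupported contents type: {type(contents).__name__}")


def write_blob(path: str, contents: Contents) -> None:
    """Write one blob to path, replacing any existing file (as in Don't combine)."""
    with open(path, "wb") as f:
        f.write(to_file_bytes(contents))
```

For example, to_file_bytes("é") returns b"\xc3\xa9", whereas to_file_bytes("é".encode("cp1252")) returns b"\xe9", because the text was already converted to binary in the desired code page.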
