Pattern Assembler
Overview
The Pattern Assembler tool is the final step in textual parsing. After the Pattern Match tool has matched patterns, you can reassemble the records in different ways to convert unstructured data into a structured form. The Pattern Assembler tool lets you combine multiple tokens within a record group into one or more fields, or distribute tokens across wide record fields (multiple similar fields within a single record).
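As a rough illustration of what combining tokens within a record group means, the following Python sketch joins the tokens of each class into one field per group. It is not the tool's implementation: the row layout `(group, token, class)`, the class names, and the function `assemble_by_class` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical token rows: (group id, token, class).
# The class names NUMBER and STREET are invented for this example.
rows = [
    (1, "12", "NUMBER"),
    (1, "Main", "STREET"),
    (1, "Street", "STREET"),
    (2, "7", "NUMBER"),
    (2, "Elm", "STREET"),
    (2, "Road", "STREET"),
]


def assemble_by_class(rows, separator=" "):
    """Combine the tokens of each class into one field per record group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for group, token, token_class in rows:
        grouped[group][token_class].append(token)
    # Join the collected tokens so each class becomes a single field value.
    return {
        group: {cls: separator.join(tokens) for cls, tokens in classes.items()}
        for group, classes in grouped.items()
    }


records = assemble_by_class(rows)
print(records[1])
print(records[2])
```

Each record group ends up as one structured record, with the tokens of a class concatenated in their original order.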
Pattern Assembler tool parameters
The Pattern Assembler tool has one set of configuration parameters in addition to the standard execution options.
Parameter | Description |
---|---|
Input group | Field that uniquely identifies each original record, as specified in the upstream Token Creation tool. |
Input token | Field containing the tokens output by the Token Creation tool. |
Input class | Field containing the classes output by the Pattern Match tool. |
Output all available fields | If selected, includes all input fields in output. |
Specify token reassembly: Class | Name of the class. If no Input class is specified, this must be blank. |
Specify token reassembly: Prefix | If defined, text to add to the output field before the first token of this Class. |
Specify token reassembly: Always Prefix | If selected, always add the Prefix. By default, the Prefix is only added to non-empty fields. |
Specify token reassembly: Separator | Text to insert between tokens of this Class. |
Specify token reassembly: Output size | If the Output field is new, specifies its size. |
Specify token reassembly: Output1 through OutputN | Base name of the Output field to receive the assembled tokens. If more than one output field is specified, tokens of the class for this row are distributed sequentially to all the named output fields. This is optional and defaults to Output1 through OutputN. |
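The sequential distribution of tokens to several named output fields can be pictured with a short Python sketch. The function `distribute` and the field names are hypothetical. The wrap-around behavior when tokens outnumber fields is an assumption made for this example, because this document does not say what the tool does in that case.

```python
def distribute(tokens, output_fields, separator=" "):
    """Hand tokens out to the named fields in order, one token per field.

    Assumption: when tokens outnumber fields, distribution wraps around
    and the extra token is appended using the separator. The tool's
    actual behavior in that case is not stated in this document.
    """
    record = {name: "" for name in output_fields}
    for index, token in enumerate(tokens):
        name = output_fields[index % len(output_fields)]
        if record[name]:
            record[name] += separator + token
        else:
            record[name] = token
    return record


# One field per month, as in a wide record.
months = ["Jan", "Feb", "Mar"]
wide = distribute(["10.5", "11.0", "9.8"], months)
print(wide)
```

With three tokens and three fields, each month receives exactly one value, which is the typical case for separating a field into multiple homogeneous fields.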
Configure the Pattern Assembler tool
Select the Pattern Assembler tool.
Go to the Configuration tab on the Properties pane.
Select Input group and choose the unique ID field you specified in the Pattern Match tool.
Select Input token and choose the field containing the tokens output by the Token Creation tool.
Select Input class and choose the field containing the Pattern Match tool's Output Class.
Optionally, select Output all available fields to send all input fields to output.
Using the Specify token reassembly grid, create a sequence of "assembly instructions." For each row, specify some or all of the following items:
Class: the token class to process in this row.
Prefix: a string to add to the output field before the first token of this class. This is typically used to separate sections of the output field.
Always Prefix: check this column if the Prefix should always be added. By default, the Prefix is only added when the field already contains something.
Separator: a string to place between contiguous tokens of the same class.
Output size: if you are creating a new output field, this specifies the size. Because the same output field can be named in several rows, the first Output size specified for that field is used.
Output1: the output field to create.
Output2 through N: additional output fields. If more than one output field is specified, tokens of the class for this row are distributed sequentially to all the named output fields. This is useful when separating the contents of a field into multiple homogeneous fields (for example, one field per month, or one field per household member).
Optionally, go to the Execution tab, and then set Web service options.
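To make the interaction of Class, Prefix, Always Prefix, and Separator in the reassembly grid more concrete, here is a minimal Python sketch of how a sequence of assembly instructions might be applied to one record group. The `Instruction` class, the `assemble` function, and the class names are hypothetical. The sketch makes two simplifying assumptions: it joins all tokens of a class rather than only contiguous ones, and it adds a Prefix only when the row's class has at least one token.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    """One row of a hypothetical reassembly grid."""
    token_class: str
    output: str
    prefix: str = ""
    always_prefix: bool = False
    separator: str = ""


def assemble(tokens, instructions):
    """Apply instruction rows in order to the (token, class) pairs of one group."""
    record = {}
    for ins in instructions:
        matching = [tok for tok, cls in tokens if cls == ins.token_class]
        field = record.get(ins.output, "")
        # By default the prefix is added only if the field already has content;
        # always_prefix lifts that condition (assumed semantics).
        if ins.prefix and matching and (ins.always_prefix or field):
            field += ins.prefix
        field += ins.separator.join(matching)
        record[ins.output] = field
    return record


tokens = [("Smith", "LAST"), ("John", "FIRST"), ("Q", "FIRST")]
instructions = [
    Instruction("LAST", "Name"),
    Instruction("FIRST", "Name", prefix=", ", separator=" "),
]
print(assemble(tokens, instructions))
print(assemble([("John", "FIRST")], instructions))
```

In the second call there is no LAST token, so the Name field is still empty when the FIRST row runs and the prefix is skipped. Setting `always_prefix=True` on that row would add the prefix regardless.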