Regex

Overview

The Regular Expression (Regex) tool splits and matches text strings using regular expressions. It is useful for finding and extracting portions of strings that match (or don't match) a regular expression or a sub-group in a regular expression. Some common uses of the Regex tool are shown below.

Finding numbers

| Input | Regex | Output1 | Output2 | Output3 |
| --- | --- | --- | --- | --- |
| 123 Main St #45 | \d+ | 123 | 45 | |
| Acme Software:(303)541-1515 | \d+ | 303 | 541 | 1515 |

Matching patterns

| Input | Regex | Output1 | Output2 |
| --- | --- | --- | --- |
| Acme Software:(303)541-1515 | \(\d{3}\)\d{3}-\d{4} | (303)541-1515 | |
| 111-22-3333/444-55-6666 | \d{3}-\d{2}-\d{4} | 111-22-3333 | 444-55-6666 |

Splitting words

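| Input | Regex | Out1 | Out2 | Out3 | Out4 | Out5 | Out6 | Out7 | Out8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Now, is the... time for all good men? | \w+ | Now | is | the | time | for | all | good | men |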
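
As a rough illustration outside the tool, the examples above can be reproduced with Python's standard `re` module. This is only a sketch: the regex dialect used by Data Management may differ from Python's in details, and the variable names are chosen here for illustration.

```python
import re

# Finding numbers: each run of digits is one match.
numbers_a = re.findall(r"\d+", "123 Main St #45")
numbers_b = re.findall(r"\d+", "Acme Software:(303)541-1515")

# Matching patterns: a phone number, then two hyphenated digit groups.
phones = re.findall(r"\(\d{3}\)\d{3}-\d{4}", "Acme Software:(303)541-1515")
ids = re.findall(r"\d{3}-\d{2}-\d{4}", "111-22-3333/444-55-6666")

# Splitting words: \w+ matches runs of letters, digits, and underscores,
# so punctuation and spaces act as separators.
words = re.findall(r"\w+", "Now, is the... time for all good men?")

print(numbers_a)  # ['123', '45']
print(numbers_b)  # ['303', '541', '1515']
print(phones)     # ['(303)541-1515']
print(ids)        # ['111-22-3333', '444-55-6666']
print(words)      # ['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'men']
```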

The Regex tool operates in several different modes. Each has its own uses, depending on what you want to achieve.

Splitting

Use splitting when you have a string, and you want to split out substrings that match the regex. Splitting can produce either "wide" output (many columns in one record) or "tall" output (one column and many records). For example, a "wide" split on the regex \d+ can produce the following.

| Input | Output1 | Output2 | Output3 |
| --- | --- | --- | --- |
| (303)541-1515 | 303 | 541 | 1515 |

A "tall" split on the same input would instead produce the following.

| Input | Output |
| --- | --- |
| (303)541-1515 | 303 |
| (303)541-1515 | 541 |
| (303)541-1515 | 1515 |
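
The difference between "wide" and "tall" output can be sketched in Python. The function names `split_wide` and `split_tall`, the dictionary-based record layout, and the padding of unused columns with `None` are assumptions made for this illustration, not a description of how Data Management is implemented.

```python
import re

def split_wide(text, pattern, num_outputs=10, base_name="OUTPUT"):
    """One record, many columns: OUTPUT1..OUTPUTn (unused columns are None)."""
    matches = [m.group(0) for m in re.finditer(pattern, text)][:num_outputs]
    matches += [None] * (num_outputs - len(matches))
    return {f"{base_name}{i}": value for i, value in enumerate(matches, start=1)}

def split_tall(text, pattern):
    """Many records, one column: the input value is repeated for each match."""
    return [{"Input": text, "Output": m.group(0)} for m in re.finditer(pattern, text)]

wide = split_wide("(303)541-1515", r"\d+", num_outputs=3)
tall = split_tall("(303)541-1515", r"\d+")

print(wide)  # {'OUTPUT1': '303', 'OUTPUT2': '541', 'OUTPUT3': '1515'}
for record in tall:
    print(record)
```

Wide output suits inputs with a small, predictable number of matches; tall output avoids choosing a column count in advance.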

Splitting can also be configured to produce the unmatched parts of the string.

| Input | Output1 | Output2 | Output3 |
| --- | --- | --- | --- |
| (303)541-1515 | ( | ) | - |

Or it can produce both the matched and unmatched parts.

| Input | Out1 | Out2 | Out3 | Out4 | Out5 | Out6 |
| --- | --- | --- | --- | --- | --- | --- |
| (303)541-1515 | ( | 303 | ) | 541 | - | 1515 |
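
One way to picture the matched, unmatched, and combined outputs is to walk through the matches and collect the text between them. The sketch below does this in Python. The function `split_parts` and its `capture` values (`"matched"`, `"unmatched"`, `"both"`) are hypothetical names for this illustration, and dropping empty unmatched pieces is an assumption.

```python
import re

def split_parts(text, pattern, capture="matched"):
    """Return matched pieces, unmatched pieces, or both, in input order."""
    parts = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        # Text between the previous match and this one is "unmatched".
        if m.start() > pos and capture in ("unmatched", "both"):
            parts.append(text[pos:m.start()])
        if capture in ("matched", "both"):
            parts.append(m.group(0))
        pos = m.end()
    # Any trailing text after the last match is also "unmatched".
    if pos < len(text) and capture in ("unmatched", "both"):
        parts.append(text[pos:])
    return parts

print(split_parts("(303)541-1515", r"\d+", "matched"))    # ['303', '541', '1515']
print(split_parts("(303)541-1515", r"\d+", "unmatched"))  # ['(', ')', '-']
print(split_parts("(303)541-1515", r"\d+", "both"))       # ['(', '303', ')', '541', '-', '1515']
```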

Extracting

Use extracting when you have a string, and you want to extract portions of the string that match sub-parts of the regex. These sub-parts are known as the "capturing groups" of the regex. Data Management treats the entire regex as the first "capturing group". Subsequent capturing groups are those portions of the regex enclosed in parentheses. For example, in the regex (ab(cd)(ef)) there are four groups: the entire regex, (ab(cd)(ef)), (cd), and (ef). It is possible to write regexes where some groups do not capture; see the external references for more details.
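
The group numbering can be illustrated with Python's `re` module, used here only as a stand-in. Python calls the whole match group 0, which corresponds to what this page describes as the first capturing group, so the list below has four entries for the regex (ab(cd)(ef)). The sample input string and the `(?:...)` non-capturing syntax are Python-side assumptions and may not match the tool's dialect exactly.

```python
import re

m = re.search(r"(ab(cd)(ef))", "xxabcdefxx")

# Group 0 is the entire match; groups 1..3 are the parenthesized parts,
# numbered by the position of their opening parenthesis.
groups = [m.group(i) for i in range(m.re.groups + 1)]
print(groups)  # ['abcdef', 'abcdef', 'cd', 'ef']

# A non-capturing group, written (?:...), groups without capturing.
nc = re.search(r"(?:ab)(cd)", "abcd")
print(nc.groups())  # ('cd',)
```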

Regex tool configuration parameters

The Regex tool has one set of configuration parameters in addition to the standard execution options.

| Parameter | Description |
| --- | --- |
| Input field | The input field to process. |
| Regular expression | The regular expression to compare to Input field. |
| Case insensitive | If selected, perform case-insensitive matching. |
| Operation | The operation to perform upon the input field. This is optional and defaults to Split (wide output). |
| Output field | If Operation is Split (wide output), the base name of the output field. This is optional and defaults to OUTPUT. |
| Number of outputs | If Operation is Split (wide output), the number of output fields to generate. This is optional and defaults to 10. |
| Capture | If Operation is Split (tall output) or Extract (repeating), determines which data to output. This is optional and defaults to Matched. |
| Include empty matches when splitting | If selected, includes empty strings in output when Operation is Split (wide output) or Split (tall output). |
| Generate ID | If selected, sequentially numbers the output records according to the input record they came from when Operation is Split (tall output) or Extract (repeating). |
| ID field | If Generate ID is selected, the ID field. This is optional and defaults to ID. |
| Generate sequence | If selected, generates a sequence number containing the position of the output record within its group when Operation is Split (tall output) or Extract (repeating). |
| Sequence field | If Generate sequence is selected, the sequence field. This is optional and defaults to SEQUENCE. |
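
The effect of the Case insensitive and Include empty matches options can be approximated in Python. This is a sketch under assumptions: the sample strings and the pattern `\d*` (which can match an empty string) are made up for illustration, and Python's handling of empty matches may not be identical to the tool's.

```python
import re

text = "Main St, MAIN ST, main st"

# Without case-insensitive matching only the exact-case occurrence is found.
case_sensitive = re.findall(r"main", text)
# With it, all three spellings match.
case_insensitive = re.findall(r"main", text, re.IGNORECASE)
print(case_sensitive)    # ['main']
print(case_insensitive)  # ['Main', 'MAIN', 'main']

# A pattern such as \d* can match the empty string between characters.
all_matches = re.findall(r"\d*", "a1b")
# Omitting empty strings leaves only the real match.
non_empty = [m for m in all_matches if m]
print(non_empty)  # ['1']
```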

Configure the Regex tool

  1. Select the Regex tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Choose an Input field.

  4. Enter a regular expression in the Regular Expression box.

  5. Optionally, select Case insensitive to ignore case when matching.

  6. Choose the Operation:

    • Split (wide output)

    • Split (tall output)

    • Extract (first)

    • Extract (repeating)

  7. Optionally, specify an output field name other than the default OUTPUT.

  8. If Operation is Split (wide output), specify the Number of outputs. This will be the size of your "wide" output. Data Management will generate up to 10 sequentially numbered fields based on your output field name.

  9. Optionally, select Include empty matches when splitting. Normally, empty strings are omitted from the output. This option is rarely needed.

  10. For the Split (tall output) and Extract (repeating) operations, you can optionally generate IDs to help match the multiple output records to each input record:

    • Select Generate ID and specify a field name to output the input record number, starting at one.

    • Select Generate sequence and supply a field name to number records within each ID, starting at one.

  11. Optionally, go to the Execution tab, and then set Web service options.
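
The Generate ID and Generate sequence options from step 10 can be pictured with the following Python sketch of a tall split. The function name `split_tall_with_ids` and the dictionary record layout are assumptions for illustration. The default field names ID, SEQUENCE, and OUTPUT are taken from the parameter descriptions above.

```python
import re

def split_tall_with_ids(values, pattern, id_field="ID", seq_field="SEQUENCE"):
    """Tall split that tags each output record with its source record and position."""
    output = []
    # ID: the input record number, starting at one.
    for record_id, text in enumerate(values, start=1):
        # SEQUENCE: position of the match within that input record, starting at one.
        for seq, m in enumerate(re.finditer(pattern, text), start=1):
            output.append({id_field: record_id, seq_field: seq, "OUTPUT": m.group(0)})
    return output

rows = split_tall_with_ids(["123 Main St #45", "(303)541-1515"], r"\d+")
for row in rows:
    print(row)
```

With the ID field in place, the output records can be joined back to the input records they came from, and the sequence field preserves the order of the matches within each input.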
