
Pair-to-Group

Pair-to-Group is a highly specific tool designed to facilitate merge/purge and similar processes. It accepts a single input containing pairs of values that are considered "equal" in some sense, and generates a single output in which each distinct value is assigned to a group. The group identifier is one of the values in the group. The tool is typically used to:

  • Aggregate the results of multiple comparison passes in a merge/purge or householding project.

  • Process complex grouping operations. For example, you could run "what-if" experiments to study the effect of grouping departments into logical business units. The input to the tool would be pairs of departments to group together, and the result would assign each department to a new representative "department group."

The Pair-to-Group tool has a central role in merge/purge processes. To understand why, consider the "record comparison" phase of a merge/purge process. Typically, the entire set of records is sorted several different ways, and for each sort order the Window Compare tool moves down the records, comparing each record with the records near it in that order. When two records match, a pair of identifying values (one for each record) is written to the output. This table of pairs becomes the input to the Pair-to-Group tool.
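The pair-generation step described above can be sketched as a sorted-neighborhood pass. This is a minimal illustration under stated assumptions, not the Window Compare tool's implementation: the function name window_compare, the record fields, the match rule, and the window size are all hypothetical.

```python
def window_compare(records, sort_key, is_match, window=2):
    """Hypothetical sorted-neighborhood pass: sort the records, then compare
    each record with the next (window - 1) records in that order and emit an
    (id, id) pair for every match."""
    ordered = sorted(records, key=sort_key)
    pairs = []
    for i, left in enumerate(ordered):
        # Only records close to `left` in the sort order are compared.
        for right in ordered[i + 1 : i + window]:
            if is_match(left, right):
                pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical records and match rule, for illustration only.
records = [
    {"id": "A", "last": "Smith", "zip": "02139"},
    {"id": "B", "last": "Smyth", "zip": "02139"},
    {"id": "C", "last": "Jones", "zip": "10001"},
    {"id": "D", "last": "Jones", "zip": "10002"},
]


def same_last_or_zip(a, b):
    return a["last"] == b["last"] or a["zip"] == b["zip"]


# One pass, sorted by last name; a real project would run several sort
# orders and concatenate the resulting pairs.
pairs = window_compare(records, lambda r: r["last"], same_last_or_zip)
print(pairs)  # [('C', 'D'), ('A', 'B')]
```

The pairs from every pass, taken together, are what Pair-to-Group would then merge into groups.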

The task performed by Pair-to-Group is similar to the task of figuring out who all your friends are, no matter how distantly connected. If you first consider all of your friends, then your friends' friends, and so on until no new people turn up, you will eventually arrive at a group of people who are all connected, however distantly, through chains of friendship.
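The "friends of friends" expansion is a breadth-first search over the pairs. The following sketch, with hypothetical names and data, collects everyone reachable from one starting person. It illustrates the idea only and is not a description of how Pair-to-Group is implemented.

```python
from collections import defaultdict, deque


def friend_circle(pairs, start):
    """Collect everyone reachable from `start` by following pairs,
    one 'friends of friends' layer at a time (breadth-first search)."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)  # friendship (equality) is symmetric
    seen = {start}
    frontier = deque([start])
    while frontier:  # stop once a layer adds no new people
        person = frontier.popleft()
        for friend in neighbors[person]:
            if friend not in seen:
                seen.add(friend)
                frontier.append(friend)
    return seen


# Hypothetical friendships.
pairs = [("Ann", "Bob"), ("Bob", "Cid"), ("Cid", "Dee"), ("Eve", "Fay")]
print(sorted(friend_circle(pairs, "Ann")))  # ['Ann', 'Bob', 'Cid', 'Dee']
```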

Pair-to-Group does the same thing with records: if record A is equal to record B, and B is equal to C, then A, B, and C all belong to the same group. In merge/purge, all records belonging to the same group are considered duplicates of each other, and should be reconciled into a single record. For example, if the input is:

ID_1    ID_2
A       B
C       D
E       F
D       A

A, B, C, and D are all assigned to group A because A=B, C=D, and D=A, so each of them is linked to A through a chain of pairs. E and F are equal to each other but not to any of A, B, C, or D, so they form a separate group, E. Thus, the Pair-to-Group output will be:

ID_KEY    ID_GROUP
A         A
B         A
C         A
D         A
E         E
F         E
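One standard way to compute this grouping is a disjoint-set (union-find) structure. The documentation does not say which algorithm Pair-to-Group uses internally, so the following is a minimal sketch under that assumption. The function pair_to_group and its use_smallest argument are hypothetical stand-ins for the tool and its Use smallest identifier option. "Smallest" here means Python's ordinary comparison (lexicographic for strings), which may differ from how the tool compares identifiers.

```python
def pair_to_group(pairs, use_smallest=True):
    """Assign every value in `pairs` to a group using union-find.
    The group ID is the smallest member of the group, or the largest
    if use_smallest is False."""
    parent = {}

    def find(x):
        # Register unseen values as their own group, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right in pairs:
        root_l, root_r = find(left), find(right)
        if root_l != root_r:
            parent[root_r] = root_l  # merge the two groups

    # Gather the members of each group under its root.
    members = {}
    for value in list(parent):
        members.setdefault(find(value), []).append(value)

    # Pick one representative per group: its min or max member.
    pick = min if use_smallest else max
    group_id = {}
    for group in members.values():
        rep = pick(group)
        for value in group:
            group_id[value] = rep
    return group_id


pairs = [("A", "B"), ("C", "D"), ("E", "F"), ("D", "A")]
groups = pair_to_group(pairs)
for key in sorted(groups):
    print(key, groups[key])
# A A
# B A
# C A
# D A
# E E
# F E
```

Passing use_smallest=False instead assigns A through D to group D and E and F to group F.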

Pair-to-Group tool configuration parameters

The Pair-to-Group tool has a single set of configuration parameters in addition to the standard execution options:

Left field

Input field containing the left value of the pair.

Right field

Input field containing the right value of the pair.

Use smallest identifier

If selected (the default), the smallest value within each group is used as the group identifier and becomes the "master" record in merge/purge processes. If cleared, the largest value is used instead.

Order output

Order of the output data, either By group or By key. This is optional and defaults to By group.
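The two Order output choices can be pictured as two sort orders over the same (key, group) rows. In this sketch the exact semantics are an assumption: By key is taken to sort rows by key alone, and By group to cluster rows by group identifier and then sort by key within each group. The function order_output and the sample group assignment (as if the input pairs were A=C and B=D) are hypothetical.

```python
def order_output(groups, order="group"):
    """Return (key, group) rows in one of two assumed orders:
    'key'   - sorted by key alone;
    'group' - clustered by group ID, then sorted by key within a group."""
    if order == "key":
        return sorted(groups.items())
    if order == "group":
        return sorted(groups.items(), key=lambda row: (row[1], row[0]))
    raise ValueError(f"unknown order: {order!r}")


# Hypothetical assignment: groups A = {A, C} and B = {B, D}.
groups = {"A": "A", "B": "B", "C": "A", "D": "B"}
print(order_output(groups, "key"))    # [('A', 'A'), ('B', 'B'), ('C', 'A'), ('D', 'B')]
print(order_output(groups, "group"))  # [('A', 'A'), ('C', 'A'), ('B', 'B'), ('D', 'B')]
```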

Configure the Pair-to-Group tool

  1. Select the Pair-to-Group tool, and then go to the Configuration tab on the Properties pane.

  2. In the Left field list, select the first field that defines the pair.

  3. In the Right field list, select the second field that defines the pair.

  4. Optionally, clear the Use smallest identifier box to select the largest ID as the "master" record in merge/purge processes.

Note that output field names are generated automatically. If the Left and Right input field names have a common prefix, then the output field names will take the form "CommonPrefix_KEY" and "CommonPrefix_GROUP."

  5. Specify an Order output, defining the order of the output data. The tool runs fastest when output is ordered By key.

  6. Optionally, go to the Execution tab, and then set Web service options.
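The naming rule in the note above can be sketched as follows. The helper output_field_names is hypothetical, and two details are assumptions: a trailing underscore on the common prefix is dropped so that ID_1 and ID_2 produce ID_KEY and ID_GROUP, as in the example earlier, and the case where the fields share no prefix is left unhandled because the documentation does not describe it.

```python
import os


def output_field_names(left, right):
    """Derive output field names from the Left and Right field names using
    the documented "CommonPrefix_KEY" / "CommonPrefix_GROUP" pattern."""
    # os.path.commonprefix compares character by character, so it works on
    # plain strings as well as paths.
    prefix = os.path.commonprefix([left, right]).rstrip("_")  # assumption
    if not prefix:
        # Assumption: naming without a common prefix is not covered here.
        raise ValueError("fields share no common prefix")
    return f"{prefix}_KEY", f"{prefix}_GROUP"


print(output_field_names("ID_1", "ID_2"))  # ('ID_KEY', 'ID_GROUP')
```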

