Pair-to-Group
Overview
Pair-to-Group is a highly specific tool designed to facilitate merge/purge and similar processes. It accepts a single input containing pairs of values that are considered "equal" in some sense, and generates a single output in which each distinct value is assigned to a group. The group identifier is one of the values in the group. The tool is typically used to:
Aggregate the results of multiple comparison passes in a merge/purge or householding project.
Process complex grouping operations. For example, you could conduct "What if" experiments to study the effect of grouping departments into logical business units. The input to the tool would be pairs of departments to group together, and the result would be departments assigned to a new representative "department group."
The Pair-to-Group tool has a central role in merge/purge processes. To understand why this is so, consider the "record comparison" phase of a merge/purge process. Typically, the entire set of records is sorted multiple ways, and then the Window Compare tool marches down the records and compares records that are close to each other. When two records match, a pair of identifying values (one for each record) is written to the output. This generated table of pairs becomes the input to the Pair-to-Group tool.
The task performed by Pair-to-Group is similar to the task of figuring out who all your friends are, no matter how distantly connected. If you first consider all of your friends, then your friends' friends, and so on indefinitely, you will eventually arrive at a group of people who are all connected to you through some chain of friendships.
Pair-to-Group does the same thing with records: if record A is equal to record B, and B is equal to C, then A, B, and C all belong to the same group. In merge/purge, all records belonging to the same group are considered duplicates of each other, and should be reconciled into a single record.
For example, if the input is:
ID_1 | ID_2 |
---|---|
A | B |
C | D |
E | F |
D | A |
A, B, C, and D are all assigned to group A, because A=B, C=D, and D=A. Records E and F form their own group, E, because they are equal to each other but not to any other record.
Thus, the Pair-to-Group output will be:
ID_KEY | ID_GROUP |
---|---|
A | A |
B | A |
C | A |
D | A |
E | E |
F | E |
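The grouping described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration, not the tool's actual implementation: the function name `pair_to_group` and the `use_smallest` argument are hypothetical, chosen to mirror the Use smallest identifier option, and values are assumed to be comparable with `<`.

```python
def pair_to_group(pairs, use_smallest=True):
    """Map each distinct value in `pairs` to its group identifier.

    Hypothetical sketch: the group identifier is the smallest value in the
    group (or the largest when use_smallest is False).
    """
    parent = {}  # value -> parent value; a root is its own parent

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return  # already in the same group
        # Keep the preferred identifier (smallest or largest) as the root.
        keep, drop = (ra, rb) if (ra < rb) == use_smallest else (rb, ra)
        parent[drop] = keep

    for left, right in pairs:
        union(left, right)

    return {value: find(value) for value in sorted(parent)}


if __name__ == "__main__":
    example = [("A", "B"), ("C", "D"), ("E", "F"), ("D", "A")]
    for key, group in pair_to_group(example).items():
        print(key, group)
```

Run on the example pairs, the sketch yields the same key-to-group assignments as the output table above. Because each merge keeps the preferred value as the root, the group identifier does not depend on the order in which the pairs arrive.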
Pair-to-Group tool configuration parameters
The Pair-to-Group tool has a single set of configuration parameters in addition to the standard execution options.
Parameter | Description |
---|---|
Left field | Input field containing the left value of the pair. |
Right field | Input field containing the right value of the pair. |
Use smallest identifier | If selected, the first (smallest) value within a group becomes the "master" record in merge/purge processes. |
Order output | Order of the output data. This is optional and defaults to By group. |
Configure the Pair-to-Group tool
Select the Pair-to-Group tool.
Go to the Configuration tab on the Properties pane.
In the Left field list, select the first field that defines the pair.
In the Right field list, select the second field that defines the pair.
Optionally, clear the Use smallest identifier box to select the largest ID as the "master" record in merge/purge processes.
Output field names are generated automatically. If the Left and Right input field names have a common prefix, then the output field names will take the form CommonPrefix_KEY and CommonPrefix_GROUP.
Specify an Order output to define the order of the output data. If you do not specify one, the output is ordered By group. The tool runs fastest when output is ordered By key.
Optionally, go to the Execution tab, and then set Web service options.
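The automatic naming of output fields mentioned in the steps above can be illustrated with a short sketch. The helper `output_field_names` is hypothetical, and the prefix rule it uses (longest common leading characters, with a trailing underscore dropped) is an assumption rather than documented behavior. The documentation does not say what happens when the field names share no prefix, so the sketch returns `None` in that case.

```python
import os


def output_field_names(left_field, right_field):
    """Return (key_name, group_name) derived from a shared field-name prefix.

    Hypothetical helper: assumes the prefix is the longest run of leading
    characters common to both names, minus any trailing underscore.
    """
    prefix = os.path.commonprefix([left_field, right_field]).rstrip("_")
    if not prefix:
        # Behavior without a common prefix is not described in the source.
        return None
    return f"{prefix}_KEY", f"{prefix}_GROUP"


if __name__ == "__main__":
    print(output_field_names("ID_1", "ID_2"))
```

With the field names from the earlier example, ID_1 and ID_2, this produces ID_KEY and ID_GROUP, matching the headers of the example output table.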