Working with the tools

Overview

To take full advantage of Data Management's powerful features, you should become familiar with the way Data Management works.

Data Management tools operate on entire tables

Entire tables originate with the input tools, flow through the transformation tools, and into output tools.

Links between tools channel a table of data from a tool output to a tool input.
Every Data Management tool operates on one or more complete tables.

Branch as many ways as you like

You can connect any tool output to the inputs of multiple tools. This branches the data-flow.

Branching sends the same table through multiple "downstream" processing paths.
Branching lets you send the same table to different outputs, process the same table in different ways, or attach a Data Viewer tool for troubleshooting. The sample project name_frequency.dlp is an example of this technique.
Branching usually (but not always) creates temporary files to hold a copy of the table until all branches have a chance to process it.
Use the Data View tool often. It is your best troubleshooting tool.

Splitting and recombining

Splitting data creates two copies of it, which can be processed independently. You can split the output of any tool by connecting more than one downstream tool to it.

You can solve some problems by splitting the data, processing the two groups separately, and then recombining them. For example, the sample project sql_equivalent4.dlp splits the data, summarizes the left side of the split, and then joins the left and right sides together.
Think about splitting a set of records into two groups based on some criteria. You can perform different operations on different subsets of records and then recombine them into a single table. This is especially useful when some records need special treatment but others do not. The basic technique is:
1. Filter, Join, or Unique tools to split the records into two groups. (Cascade multiple tools to create more than two groups).
2. Process the tables for each subset separately.
3. Recombine them using Merge.

The sample project deduplicate.dlp uses this technique.

Filter, Join, and Unique tools split sets of records in different ways

Filter: splits records based on a filter expression that you write. For example, your records might have a NAME field. The expression IsNull(NAME) would split records based on whether the NAME field is blank.
Join: splits records based on whether or not they match records in another table. For example, suppose you have a table that should link to another table on some key. But your data is messy, and the relationship doesn't always hold. You can join the two tables, process the inner and outer join outputs separately, and then recombine them. The sample project cross_assign.dlp demonstrates this technique.
Unique: splits records based on whether they are unique (according to a set of keys you select) or not. For each set of key values encountered, the first record (the unique one) is sent to the "U" output, and all of the non-unique (duplicate) records are sent to the "D" output. Using unique, you can remove duplicate records from a table, keeping the unique records while aggregating some information from the duplicates back to the uniques. The sample project aggregate_duplicates.dlp demonstrates this technique.