Tips to optimize performance
General performance tips
License multiple CPU cores.
Configure tool parallelism, sort parallelism, and memory use.
Use the AO macros for matching and parsing; they are optimized for performance.
If you know data from an external file is already sorted, use a Validate Order tool so Data Management can take advantage of the existing sort order.
Don't use a Regex function when a simple Find or Replace function will do.
The Substitute function is faster than ReplaceAllText.
Use the Calculate tool's local variables to avoid inserting temporary results into the record stream.
Smaller fields are faster than larger ones.
Store data that must be re-read as DLD files, which are much faster to read than any other format.
Databases typically load data faster in larger blocks than in smaller ones.
You can often increase RDBMS load speeds by running two RDBMS Output tools in parallel downstream from a Splitter tool.
Database tables without indexes load faster.
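The last three tips are general database-loading principles. As a minimal sketch of two of them (large load blocks, and no index during the load), the example below uses Python's built-in `sqlite3` module as a stand-in database; it is not Data Management's RDBMS Output tool. The table name `customers`, the index name, and the `batch_size` value are hypothetical, illustrative choices, not tuned recommendations.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_load(conn, rows, batch_size=10_000):
    """Load (id, name) rows in large blocks, then build the index.

    Assumption: table/index names and batch_size are hypothetical.
    """
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    loaded = 0
    for batch in batched(rows, batch_size):
        # One executemany call per large block instead of one INSERT per row.
        conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
        loaded += len(batch)
    conn.commit()
    # Create the index only after the load, so inserts don't have to maintain it.
    conn.execute("CREATE INDEX idx_customers_id ON customers (id)")
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
n = bulk_load(conn, ((i, f"name{i}") for i in range(25_000)))
print(n)  # 25000
```

If a table must keep its indexes for other users, the same idea applies in reverse: drop or disable them before a large load and rebuild them afterward, if your database allows it.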
Tips for speeding up processing
Hardware
Data Management uses a lot of temporary disk space. See Selecting temporary disk space for tips on setting up your temp space.
Replace your rotating temp space disks with Solid State Drives (SSDs).
More memory is better. While 256MB of project memory might be adequate for processing moderate amounts of data without CASS or Geocoding, you should use 4GB or more, depending on data size and processing needs. 64-bit Data Management can make good use of more than 16GB of memory.
When processing huge amounts of data, use multiple independent temp spaces, each configured with independent disks.
Projects
If Data Management runs on a "busy" machine (for example, one also running SQL Server), configure it to use less memory than the default.
Data Management can extract and process data many times faster than your database server. Let your database server do the basic extraction to minimize the number of records pulled from the database, but let Data Management perform any complex operations. You'll save time.
If you will be reading the same database many times (such as when developing a Data Management project), build and run a simple Data Management project to extract the database to a file (DLD is a good format). Then configure your main project to read from the new file.
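The two tips above can be sketched together: let the database do the basic extraction (only the rows and columns you need), save that extract to a local file once, and have later runs read the file. This is a minimal sketch that uses Python's `sqlite3` as a stand-in database and CSV as a stand-in for DLD, which is Data Management's own format and is not written here. The `orders` table, its columns, and the `"west"` filter are hypothetical examples.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_to_file(conn, path):
    """One-time extract: the database filters rows and selects columns."""
    cur = conn.execute(
        "SELECT id, amount FROM orders WHERE region = ? ORDER BY id", ("west",)
    )
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)


def read_extract(path):
    """Later runs read the local file instead of querying the database again."""
    with open(path, newline="") as f:
        return [(int(r["id"]), float(r["amount"])) for r in csv.DictReader(f)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "west", 10.0, "a"), (2, "east", 25.0, "b"), (3, "west", 7.5, "c")],
)

with tempfile.TemporaryDirectory() as tmp:
    extract_path = Path(tmp) / "orders_west.csv"
    extract_to_file(conn, extract_path)
    rows = read_extract(extract_path)

# Placeholder for the "complex operations" done outside the database.
total = sum(amount for _, amount in rows)
print(rows)   # [(1, 10.0), (3, 7.5)]
print(total)  # 17.5
```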
Select only the fields you need for your project. Extra data slows processing and uses more temp disk space! If you are processing very large records but the project operates on only a small number of fields, you can tag the input records with a unique ID, select the "core" fields to operate on, and then rejoin the results of the operation with the original data.
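The tag, select, and rejoin pattern above can be sketched in plain Python. This is an illustration of the dataflow only, under the assumption that records are dicts; the function name `process_core_fields` and the `_id` tag field are hypothetical. In an in-memory Python script this saves nothing, but in a project the narrow records are what flow through the expensive sorts and tools, while the wide originals wait to be rejoined.

```python
def process_core_fields(records, core_fields, transform):
    """Tag, narrow, process, and rejoin.

    records: list of dicts (wide records).
    core_fields: names of the fields the operation actually needs.
    transform: called on each narrow record; returns a dict of result fields.
    """
    # 1. Tag each input record with a unique ID.
    tagged = {i: rec for i, rec in enumerate(records)}
    # 2. Keep only the ID and the core fields for processing.
    narrow = [{"_id": i, **{f: rec[f] for f in core_fields}} for i, rec in tagged.items()]
    # 3. Run the operation on the narrow records only.
    results = {row["_id"]: transform(row) for row in narrow}
    # 4. Rejoin the results with the original wide records by ID.
    return [{**tagged[i], **results[i]} for i in tagged]


records = [
    {"name": " ada ", "zip": "02139", "history": "x" * 1000},
    {"name": "BOB", "zip": "10001", "history": "y" * 1000},
]
out = process_core_fields(
    records, ["name"], lambda r: {"name_clean": r["name"].strip().title()}
)
print([r["name_clean"] for r in out])  # ['Ada', 'Bob']
```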
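Watch out for exploding Joins! When the join key has duplicate values in both inputs, the join produces one output record for every matching pair, so the output can be far larger than either input.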
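Before running a join, you can estimate its output size from the key counts on each side: every key value contributes (left matches) × (right matches) records to an inner join. This is a minimal sketch in Python, assuming the join keys are available as plain lists; `join_output_size` is a hypothetical helper, not a Data Management function.

```python
from collections import Counter


def join_output_size(left_keys, right_keys):
    """Number of records an inner join on these keys would produce."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each shared key value yields one output record per matching pair.
    return sum(n * right[k] for k, n in left.items() if k in right)


# Unique keys: the output is no larger than the smaller input.
print(join_output_size([1, 2, 3], [2, 3, 4]))  # 2

# 1,000 duplicate "A" keys on each side explode into a million records.
print(join_output_size(["A"] * 1000 + ["B"], ["A"] * 1000 + ["C"]))  # 1000000
```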