Analyzing your data

Overview

The projects in the repository folder Samples\Basic\Analyzing your data demonstrate the techniques described below.

Finding the top ten

The sample project best sellers finds the ten best-selling books in the PUBS database, according to dollar volume rather than the more common "units" measure. The general strategy is:

Calculate the field of interest. In this example, that is the year-to-date dollar sales for each title: price multiplied by year-to-date unit sales.
Sort the resulting field (year-to-date dollar sales) descending.
Choose the first ten records.
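
The three steps above can be sketched outside the tool in plain Python. This is only an illustration of the calculate, sort, and take-first-N pattern, not the sample project itself; the titles, prices, and sales figures are made-up placeholders rather than values from the PUBS database, and the function name is hypothetical.

```python
# Hypothetical stand-in data: (title, price, year-to-date unit sales).
# These are NOT real PUBS records.
titles = [
    ("Title A", 19.99, 4095),
    ("Title B", 11.95, 3876),
    ("Title C", 2.99, 18722),
    ("Title D", 22.95, 8780),
]


def top_n_by_dollar_sales(rows, n=10):
    """Return (title, dollar_sales) pairs for the n largest dollar sales."""
    # Step 1: calculate the field of interest for every record.
    with_dollars = [(title, price * ytd) for title, price, ytd in rows]
    # Step 2: sort descending on the calculated field.
    with_dollars.sort(key=lambda row: row[1], reverse=True)
    # Step 3: keep the first n records.
    return with_dollars[:n]


top = top_n_by_dollar_sales(titles, n=2)
top_titles = [title for title, _ in top]
```

Note that "Title C" has the most units sold but does not make the top two by dollar volume, which is the distinction the sample project is drawing.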

Finding the maximum or minimum record

The sample project min max demonstrates finding, for each unique value of one field, the records containing the maximum and minimum values of another field. The project finds, for each SALESREP, the records containing the maximum and minimum values of the field AMOUNT.

This is different from finding only the minimum and maximum AMOUNT values, which could be done much more easily using the Summarize tool. This technique returns the entire record that contains the min or max value, which is a bit more complex.

The general strategy is:

Sort on the ID field (SALESREP in this example) in ascending order, with a secondary sort on the AMOUNT field.
To find the max record, make the secondary sort on AMOUNT descending. To find the min record, make it ascending.
Use the Unique tool to remove duplicate records, specifying the ID field as the group field. Because the record you want is now first within each group, this leaves the maximum (or minimum) record for each ID value.

The result? The records containing the largest and smallest sales (in dollar amounts) are displayed for each of five sales representatives.
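
As a rough illustration of the sort-then-deduplicate strategy above, here is a minimal Python sketch. It assumes the Unique step keeps the first record it sees in each group; the sample records, the ORDER_ID field, and the function name are all hypothetical and are not taken from the sample project.

```python
# Hypothetical sample records; only SALESREP and AMOUNT come from the text.
sales = [
    {"SALESREP": "Baker", "AMOUNT": 300.0, "ORDER_ID": 1},
    {"SALESREP": "Adams", "AMOUNT": 120.0, "ORDER_ID": 2},
    {"SALESREP": "Baker", "AMOUNT": 75.0, "ORDER_ID": 3},
    {"SALESREP": "Adams", "AMOUNT": 980.0, "ORDER_ID": 4},
    {"SALESREP": "Adams", "AMOUNT": 45.0, "ORDER_ID": 5},
]


def extreme_record_per_group(records, group_field, value_field, find_max=True):
    """Return the whole record holding the max (or min) value in each group."""
    # Secondary sort first: descending for the max record, ascending for min.
    ordered = sorted(records, key=lambda r: r[value_field], reverse=find_max)
    # Primary sort on the group field; Python's sort is stable, so the
    # value ordering is preserved inside each group.
    ordered.sort(key=lambda r: r[group_field])
    # "Unique" step: keep only the first record seen for each group value.
    first_per_group = {}
    for record in ordered:
        first_per_group.setdefault(record[group_field], record)
    return list(first_per_group.values())


max_records = extreme_record_per_group(sales, "SALESREP", "AMOUNT", find_max=True)
min_records = extreme_record_per_group(sales, "SALESREP", "AMOUNT", find_max=False)
```

Each result is a full record, so fields other than AMOUNT (here the hypothetical ORDER_ID) come along with the extreme value, which a plain min/max summary would not provide.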

Obtaining frequency counts

The sample project name frequency produces a frequency count of the incidence of first names and surnames in a name/address file. The general strategy is:

Select the fields of interest.
Count each field of interest in a Summarize tool.
Sort descending on the Count field.
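
The same select, count, and sort-descending pattern can be sketched in Python with collections.Counter. The names below are invented placeholders, not data from the sample project's name/address file.

```python
from collections import Counter

# Hypothetical (first name, surname) pairs standing in for the name/address file.
names = [
    ("Mary", "Smith"),
    ("John", "Smith"),
    ("Mary", "Jones"),
    ("Mary", "Lee"),
]

# Select each field of interest and count its values.
first_name_counts = Counter(first for first, _ in names)
surname_counts = Counter(last for _, last in names)

# most_common() returns (value, count) pairs sorted by count, descending.
first_name_frequency = first_name_counts.most_common()
surname_frequency = surname_counts.most_common()
```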

Getting a unique list of values

The sample project unique2 produces a unique list of ZIP codes found in a mailing list. This technique is useful for basic data-quality assessment. A unique list can tell you about the "coverage" of your data: Which area codes does my data cover? Which ZIP codes? Which cities?
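
For comparison, a unique list can be sketched in Python by collecting the values into a set. The ZIP codes here are arbitrary placeholders; they are kept as strings on the assumption that leading zeros must be preserved.

```python
# Hypothetical ZIP code column from a mailing list, with repeats.
zip_codes = ["02139", "10001", "02139", "94105", "10001"]

# A set drops duplicates; sorting gives a stable, readable coverage list.
unique_zip_codes = sorted(set(zip_codes))
```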

Assigning custom rankings

Custom ranking levels are useful when you want to map a numeric value (such as a payment amount) onto a finite set of values. For example, suppose you are running a non-profit organization, and wish to rank your donors into four levels:

A = 1–99
B = 100–999
C = 1000–9999
D = 10000+

The project custom rank uses the Calculate tool and an IF/THEN/ELSE expression to assign the appropriate RANK based on the value of the DONATION field.
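
The ranking logic can be sketched as a chain of IF/THEN/ELSE tests. This Python version is only an approximation of what the Calculate expression might look like, since the project's actual expression is not shown here; the function name is hypothetical, and returning None for donations below 1 is an assumption, because the levels above do not say how such values are ranked.

```python
def donation_rank(donation):
    """Map a donation amount onto the four donor levels A-D."""
    # Test the highest threshold first so each amount matches one level only.
    if donation >= 10000:
        return "D"
    elif donation >= 1000:
        return "C"
    elif donation >= 100:
        return "B"
    elif donation >= 1:
        return "A"
    else:
        # Assumption: amounts below 1 are left unranked.
        return None


ranks = [donation_rank(amount) for amount in (50, 100, 999, 1000, 25000)]
```

Testing thresholds from the top down means each branch needs only a lower bound, which keeps the boundaries (99 versus 100, 999 versus 1000) from being stated twice.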