Address standardization

The process of taking an address and verifying that each component meets United States Postal Service guidelines for addresses. For example, "123 Main Avenue" should be abbreviated as "123 MAIN AVE". During standardization, minor misspellings, dropped components, and abbreviations are all corrected. The correct city, state, and ZIP Code are also provided.


Algorithm

A sequence of instructions that describe how to solve a particular problem.


Alignment

The alignment of data within a field. Alignment can be left, right, center, or on a decimal point.


Alphanumeric

Consisting of letters or digits, or both, and sometimes including control characters, space characters, and other special characters.


Append

Add records to the end of an existing data file or table.

Arithmetic operators

Symbols or other characters indicating operations that act on one or more elements. The +, -, *, /, and ( ) characters are operators used to construct arithmetic expressions.

Ascending order

The arrangement of a sequence of items from lowest to highest, such as from 1 to 10 or from A to Z. The rules for determining ascending order in a particular application can sometimes be very complicated (for example, capital letters before lowercase letters, or extended ASCII characters in ASCII order).
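
As a minimal Python sketch of how the ordering rule changes the result, the list below (hypothetical sample values) is sorted once in plain ASCII order, where capital letters come before lowercase letters, and once ignoring case. Other applications may well apply different rules.

```python
# Hypothetical sample values, chosen only to show the effect of letter case.
items = ["banana", "Apple", "apple", "Banana"]

# Default string comparison follows code-point (ASCII) order,
# so every capital letter sorts before every lowercase letter.
ascii_order = sorted(items)

# Comparing lowercase copies of the values gives a case-insensitive order.
case_insensitive_order = sorted(items, key=str.lower)

print(ascii_order)
print(case_insensitive_order)
```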

ASCII (American Standard Code for Information Interchange)

(Pronounced "ask-ee.")

Standard code, and sorting sequence, for representing characters as binary numbers used in microcomputers.

ASCII data

A document file in ASCII format, containing characters, spaces, punctuation, carriage returns, and sometimes tabs and an end-of-file marker, but no formatting information. ASCII data may be either delimited or fixed.


Aurora

Aurora is a hosted MySQL and PostgreSQL compatible relational database service offered by Amazon.


Avro

Avro is a data serialization and data exchange framework. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.


AWS

Amazon Web Services (AWS) is an Amazon subsidiary that provides on-demand cloud computing platforms. AWS hosts numerous products and services, including the Aurora relational database, the Redshift data warehouse product, and the Simple Storage Service (S3).


Binary

Number system that uses only the digits "0" and "1".

Binary data

Fixed record length data containing arbitrary bytes or words, as opposed to a text file containing only printable characters (for example, ASCII characters with codes 10, 13, and 32-126).


Bit

Smallest unit of binary data; can be "on" or "off" ("1" or "0").


Boolean

A data type having only two possible values: True or False (Yes/No, 0/1).


BSON

BSON is a binary form developed by MongoDB for representing JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec.

Bulk mail

Second-class, third-class and fourth-class mail, serviced on a non-preferential basis by the United States Postal Service.

Byte order

Byte order or endianness refers to the convention used to interpret the bytes making up a data word when those bytes are stored in computer memory.

  • Big-endian systems store the most significant byte of a word at the smallest address and the least significant byte at the largest address.

  • Little-endian systems store the least significant byte in the smallest address.
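
The two conventions can be illustrated with Python's standard `struct` module. This is only a sketch: the 32-bit value is an arbitrary example, and the byte at index 0 stands in for the "smallest address".

```python
import struct

# Arbitrary 32-bit example value; 0x12 is its most significant byte.
value = 0x12345678

# ">" requests big-endian layout: most significant byte first.
big_endian = struct.pack(">I", value)

# "<" requests little-endian layout: least significant byte first.
little_endian = struct.pack("<I", value)

print(big_endian.hex())     # bytes in big-endian order
print(little_endian.hex())  # the same bytes, reversed

# Reading the bytes back with the wrong convention yields a different number.
misread = int.from_bytes(little_endian, "big")
```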

Byte-order mark

The byte-order mark (BOM) is a Unicode character prepended to a text stream to signal byte order and the presence of Unicode characters.
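
A short Python sketch of how a BOM signals byte order: the same sample text is encoded as UTF-16 in both byte orders, each with its BOM prepended, and the generic "utf-16" codec uses the BOM to decode both correctly. The sample text is hypothetical.

```python
import codecs

text = "ZIP"  # hypothetical sample text

# Prepend the matching BOM to each explicitly ordered encoding.
utf16_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
utf16_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" codec inspects the BOM, picks the byte order,
# and removes the BOM from the decoded result.
decoded_le = utf16_le.decode("utf-16")
decoded_be = utf16_be.decode("utf-16")

print(utf16_le[:2].hex(), utf16_be[:2].hex())
```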

Carrier Route

A 4-byte code assigned to a United States Postal Service mail delivery or collection route within a 5-digit ZIP Code. The first character of this identification is alphabetical, and the last three are numeric. The alphabetical character has the following meanings:

  • B = PO box

  • H = Contract

  • R = Rural route

  • C = City delivery

  • G = General delivery
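
As an illustration of this layout, the following Python sketch splits a carrier route code into its route type and route number. The function name and the sample code "C012" are hypothetical, and the validation is limited to what the definition above states.

```python
# Meanings of the leading letter, taken from the list above.
ROUTE_TYPES = {
    "B": "PO box",
    "H": "Contract",
    "R": "Rural route",
    "C": "City delivery",
    "G": "General delivery",
}


def describe_carrier_route(code: str) -> tuple[str, int]:
    """Return (route type, route number) for a 4-character carrier route code."""
    is_valid = (
        len(code) == 4
        and code.isascii()
        and code[0] in ROUTE_TYPES
        and code[1:].isdigit()
    )
    if not is_valid:
        raise ValueError(f"not a valid carrier route code: {code!r}")
    return ROUTE_TYPES[code[0]], int(code[1:])


print(describe_carrier_route("C012"))
```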


CASS

United States Postal Service (USPS) Coding Accuracy Support System. A service offered to mailers, service bureaus, and software vendors that improves the accuracy of DPV codes, ZIP+4 codes, 5-digit ZIP Codes, and carrier route information on mail. CASS Certified mailings qualify for substantial postage discounts.

Code pages

Code pages are tables of values that describe the character set for a particular language. Check out the table that lists the code pages supported by International Components for Unicode (ICU).

Column headings

The first row of a spreadsheet often contains column headings. In many applications, the user can identify this row and specify that the headings be used as field names.


Concatenate

To join sequentially (for example, to combine the two strings "good" and "morning" into the single string "good morning").


Constant

A specific, unchanging value.

Data dictionary

A document, usually extracted from a computer file, which describes each field and its location within a database.

Data structure

An organizational scheme, such as a record or array, that can be applied to data to facilitate interpreting the data or performing operations on it. See also schema.

Data type

The kind of data a field can contain, for example: Text, Integer, Boolean, and Date.


Database

An organized collection of information, stored as fields and records in tables and/or files.


Dedupe

To eliminate duplicate records within one or more files.

Delimited ASCII data

Variable length ASCII data in which fields are separated by a special character (usually a comma or tab). Field entries are often surrounded by double quotation marks (" "), and records separated by a carriage return-line feed.


Delimiter

A special character that separates individual items in a set of data. In the following example, commas separate the fields in a database record (each non-numeric field is enclosed by double quotation marks). "Armstrong", "123 Pine Street", "Toledo", "OH", 12345.
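
The example record above can be parsed with Python's standard `csv` module, as sketched below. The `skipinitialspace` option is used because the sample has a space after each comma. Note that the reader returns every field as text, including the unquoted number.

```python
import csv
import io

# The example record from the definition above, as one line of delimited text.
line = '"Armstrong", "123 Pine Street", "Toledo", "OH", 12345\n'

# The comma is the delimiter; double quotation marks enclose text fields.
reader = csv.reader(
    io.StringIO(line),
    delimiter=",",
    quotechar='"',
    skipinitialspace=True,
)
fields = next(reader)

print(fields)
```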


Demographics

Statistics describing aspects of a population, such as age, sex, race, religion, income, and geographic location.

Descending order

The arrangement of a sequence of items from highest to lowest, for example with Z preceding A and higher numbers preceding lower ones.

DPV (Delivery Point Validation)

DPV is a United States Postal Service data product that checks whether a ZIP+4 coded address is a known and deliverable address record.

DPV false positive addresses

The United States Postal Service includes false positive addresses in their DPV directories as a security measure to prevent DPV abuse. If a Data Management address standardization process encounters one of these addresses, the process will fail with the warning:

False-positive DPV match at record 1.
This USPS requirement obstructs attempts to synthesize mailing lists.
Contact support to enable processing of this record.
False-positive Key = 123ABC456DEF789GHI012JKL345

To bypass this error, contact Data Management support and provide the False-positive Key value referenced in the error message. Data Management support will provide an override key, which you must install in the Data Management repository in a /SupportData folder.

eLOT (Extended Line of Travel)

eLOT is a United States Postal Service data product used to sort mailings in approximate carrier-casing sequence. eLOT contains a sequence number field and an ascending/descending code. The sequence number indicates the first occurrence of delivery made to the add-on range within the carrier route, and the ascending/descending code indicates the approximate delivery order within the sequence number. eLOT processing may be used by mailers to qualify for enhanced carrier route presort discounts.

ETL (Extract, Transform, Load)

A process that extracts data from outside sources, transforms it to fit operational requirements, and loads it into the end target (usually a database).


Enhancement

The use of computer data to upgrade information contained in a customer or prospect list. Often referred to as "appending" data.

Field size

The maximum length of a data field.

Field type

The kind of data a field can contain, for example: text, fixed-point number, floating-point number, Boolean, and date.

Field value

The data contained in one field of a record. If no data is present, the field is considered blank.

Finance number

A code assigned to United States Postal Service (USPS) facilities to collect cost and statistical data and compile revenue and expense data. The state number comprises the first two positions of the finance number. The finance number can be used to match to records in other USPS files. By sorting these files by finance number, sequence matches can be made to use other street-level address information.

FIPS (Federal Information Processing Standards)

FIPS are publicly announced standards developed by the United States federal government for use by all non-military government agencies and by government contractors.

FIPS state codes are numeric and two-letter alphabetic codes identifying U.S. states and certain other associated areas. The FIPS county code is a five-digit code uniquely identifying counties and county equivalents in the United States and possessions. The first two digits are the FIPS state code and the last three are the county code within the state or possession. County FIPS codes are usually in the same sequence as alphabetized county names within the state. They are usually odd numbers, so that new or changed county names can be easily accommodated.
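
The structure of the five-digit county code can be sketched in Python as follows. The function name is hypothetical and the sample value "39095" is purely illustrative. The parts are kept as strings so that leading zeros survive.

```python
def split_county_fips(code: str) -> tuple[str, str]:
    """Split a five-digit FIPS county code into (state code, county code)."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected five digits, got {code!r}")
    # First two digits: FIPS state code. Last three: county within the state.
    return code[:2], code[2:]


state_code, county_code = split_county_fips("39095")
print(state_code, county_code)
```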

Fixed ASCII file

An ASCII data file that has fixed field and record sizes, but no delimiters except possibly a record separator.

Fixed-length field

Data file format in which each field is allocated to a fixed number of bytes, regardless of its actual length.

Fixed-length record

Data file format in which each record is allocated to a fixed number of bytes, regardless of its actual length. All records are the same length, and there is neither a size prefix before records, nor a newline terminator after records.
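
A minimal Python sketch of reading fixed-length records: because there is no size prefix and no terminator, the reader relies entirely on a known record size and field layout. The 20-byte layout, field names, and sample values here are hypothetical.

```python
# Hypothetical layout: (field name, start offset, end offset) within a record.
RECORD_SIZE = 20
LAYOUT = [("name", 0, 10), ("city", 10, 18), ("state", 18, 20)]


def build_record(name: str, city: str, state: str) -> bytes:
    """Pad each value with trailing blanks to fill its fixed-size field."""
    return (name.ljust(10) + city.ljust(8) + state.ljust(2)).encode("ascii")


def parse_fixed(data: bytes) -> list[dict[str, str]]:
    """Cut the data into RECORD_SIZE chunks and slice each field by position."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("data length is not a multiple of the record size")
    records = []
    for offset in range(0, len(data), RECORD_SIZE):
        chunk = data[offset:offset + RECORD_SIZE]
        records.append(
            {
                name: chunk[start:end].decode("ascii").rstrip(" ")
                for name, start, end in LAYOUT
            }
        )
    return records


# Two records stored back to back, with nothing between them.
data = build_record("Armstrong", "Toledo", "OH") + build_record("Baker", "Dayton", "OH")
records = parse_fixed(data)
print(records)
```

Stripping the trailing blanks on read recovers the original values, at the cost of not being able to distinguish a value that genuinely ended in spaces.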

Flat file

A single table of data stored in a plain text format.


Frequency

Number of times a person orders within a specific time period. (See RFM.)


Hash

A number generated from a string of text. Sometimes called a message digest, hashes are frequently used to ensure the security of transmitted data or messages.


Header

An information structure that appears at the beginning of a data file and identifies the information that follows.

HMAC (keyed-Hash Message Authentication Code)

This is a specific type of message authentication code that uses a cryptographic hash function in combination with a secret cryptographic key.
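
The difference between a plain hash and an HMAC can be sketched with Python's standard `hashlib` and `hmac` modules. The key and message are hypothetical, and SHA-256 is chosen here only as an example hash function.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC tag: a hash of the message that also depends on the key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"hypothetical-shared-secret"
message = b"record 1: Armstrong, Toledo, OH"

# A plain hash needs no key, so anyone can recompute it.
plain_digest = hashlib.sha256(message).hexdigest()

# An HMAC tag can only be produced or checked by someone who holds the key.
tag = sign(key, message)

print(plain_digest)
print(tag)
```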


Keytab

A Kerberos keytab (key table file) stores passwords and lets a user or service authenticate itself without user interaction.

LACS (Locatable Address Conversion System) status indicator

Records that have been converted to the LACS system, a United States Postal Service product that allows mailers to identify and convert a rural route address to a city-style address.

  • L = LACS address: The old (usually rural-route) address that has been converted for the LACS system.

  • Blank = Not applicable.

Leading blanks or zeros

A blank or zero that precedes the left-most digit of a number. One or more leading zeros may be used as fill characters in a numeric field.

Mainframe formatted sequential file

A mainframe formatted sequential file is a binary image of a mainframe file with variable length records. Learn more about mainframe formatted sequential files.


Mean

The arithmetic average or mean for a group of items is the sum of the values of the items divided by the number of items. It is frequently used as a measure of location for a frequency or probability distribution.


Median

The median of a group of items is the value of the middle item when all the items are arranged in either ascending or descending order of magnitude. It is frequently used as a measure of location for a frequency or probability distribution.
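
A small worked example, using hypothetical values, shows how the mean and the median can differ when one item is much larger than the rest. Python's standard `statistics` module is used for the checks.

```python
import statistics

# Hypothetical order amounts; one value is far larger than the others.
values = [2, 4, 4, 10, 100]

# Mean: sum of the values divided by the number of values (120 / 5).
mean_by_hand = sum(values) / len(values)
mean_value = statistics.mean(values)

# Median: the middle item once the values are sorted.
middle_item = sorted(values)[len(values) // 2]
median_value = statistics.median(values)

print(mean_value, median_value)
```

The single large value pulls the mean well above the median, which is why both are reported as measures of location.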


Megabyte

Abbreviated MB. While a megabyte is technically 1,000,000 bytes, Data Management uses the mebibyte convention, with megabyte containing 1,048,576 bytes (2^20 or 1,024 x 1,024 bytes).


Merge

To combine two information files into one in a logical fashion (that is, according to certain sequencing requirements).


Merge/purge

To combine two files into one in such a way that duplicates are recognized and eliminated.
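
One simple way to sketch this in Python is to walk both files in turn and keep a record only the first time its normalized key is seen. The matching rule here (trim and uppercase every field) and the sample records are assumptions for illustration; real duplicate recognition is usually more tolerant of spelling differences.

```python
def normalize(record: tuple[str, ...]) -> tuple[str, ...]:
    """Hypothetical matching key: trim blanks and ignore letter case."""
    return tuple(field.strip().upper() for field in record)


def merge_purge(file_a, file_b):
    """Combine two record lists, keeping the first copy of each duplicate."""
    seen = set()
    merged = []
    for record in list(file_a) + list(file_b):
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


file_a = [("Armstrong", "123 Pine Street"), ("Baker", "9 Oak Road")]
file_b = [("ARMSTRONG ", "123 pine street"), ("Clark", "5 Elm Lane")]

merged = merge_purge(file_a, file_b)
print(merged)
```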


Newline

A newline (line ending, end of line, or line break) is a special character or sequence of characters signifying the end of a line of text. The character codes representing a newline vary across operating systems.
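
As a brief Python illustration, the sample text below mixes three common newline conventions: carriage return plus line feed, line feed alone, and carriage return alone. `str.splitlines` recognizes all of them.

```python
# Hypothetical text mixing three newline conventions:
# "\r\n" (CR+LF), "\n" (LF), and "\r" (CR).
mixed = "first\r\nsecond\nthird\rfourth"

# splitlines treats each convention as a line boundary and drops the terminator.
lines = mixed.splitlines()

# The character codes involved: carriage return and line feed.
codes = (ord("\r"), ord("\n"))

print(lines, codes)
```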

Nth name selection

A fractional select unit that is repeated in sampling a mailing list. For example, "every 10th" would be a selection of records #1, #11, #21, and so on.
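
The "every 10th" example can be sketched with a Python slice. The list of 35 numbered records is hypothetical. Record #1 sits at index 0, so the selection starts there and steps by 10.

```python
# Hypothetical mailing list of 35 records, numbered from 1.
records = [f"record #{number}" for number in range(1, 36)]

# Start at the first record and take every 10th one after it.
n = 10
selection = records[0::n]

print(selection)
```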


Operator

A symbol or other character indicating an operation to be performed on a value or values. For example, the + operator represents addition, and the * operator represents multiplication.


Parquet

Apache Parquet is a columnar storage format used by many query engines for analytics workloads. Parquet features per-column compression and encoding schemes that offer significant performance benefits compared to a traditional row-oriented format.

PMB (Private Mail Box)

A non-USPS mailbox. Distinct from a PO Box, which the United States Postal Service reserves for boxes located at USPS post offices.


Record

A collection of database fields, each with its own name and type.

Record layout

The organization of data fields within a record.

Record number

A unique number identifying each record in a database or table.


REDEFINES

In COBOL, the REDEFINES clause defines alternate data entities for the same data location.


Redshift

Amazon's hosted data warehouse product. While Data Management supports access to Redshift via JDBC, you should be aware of some data type limitations:

Redshift does not explicitly support binary data types. If you need to store binary data in a Redshift table, consider using the Calculate tool and the functions BinaryToHex (on write) and HexToBinary (on read) to encode binary data.

Redshift does not support a TIME data type. If you map a Data Management TIME field to a Redshift column and allow Data Management to generate the table definition, Data Management will create a column of type TIMESTAMP, and insert a spurious date when populating the database.
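
The idea behind the hex-encoding workaround for binary data can be sketched in Python. This uses Python's own `bytes.hex` and `bytes.fromhex` as stand-ins; it is not the Data Management BinaryToHex or HexToBinary functions, whose exact output format is not described here.

```python
# Arbitrary binary payload, including bytes that are not printable text.
raw = bytes([0x00, 0xFF, 0x10, 0x41])

# On write: encode the bytes as hexadecimal text, which fits a character column.
encoded = raw.hex()

# On read: decode the hexadecimal text back into the original bytes.
decoded = bytes.fromhex(encoded)

print(encoded)
```

The encoded text is twice as long as the original data, since each byte becomes two hexadecimal characters.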

Regular expression

A string of characters that defines a set of rules for matching character strings found in fields.
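
As a hedged illustration using Python's standard `re` module, the pattern below accepts a 5-digit ZIP Code with an optional 4-digit add-on. The pattern and function name are examples only; regular expression syntax varies between tools, so the same rule may need to be written differently elsewhere.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"\d{5}(?:-\d{4})?", re.ASCII)


def looks_like_zip(value: str) -> bool:
    """Return True when the whole field value matches the pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None


for candidate in ["43604", "43604-1234", "4360", "43604-12"]:
    print(candidate, looks_like_zip(candidate))
```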

Relational database

A database or database management system (RDBMS) that stores information in tables—rows and columns of data—and conducts searches by using data in specified columns of one table to find additional data in another table. In a relational database, the rows of a table represent records (collections of information about separate items) and the columns represent fields (particular attributes of a record). This method requires less storage space than a sequential or "flat" database, but can be slow to access.

RFM (Recency, Frequency, Monetary)

A marketing assessment of the quality of a customer, used to evaluate sales potential.


S3

Amazon Simple Storage Service (S3) is a hosted cloud data storage service accessed via web services interfaces. The Amazon S3 basic storage unit is an object. Objects are organized into buckets.


Schema

Formal description of the structure and organization of data within a database system.

Self-organizing feature map

A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. This produces a regular grid that can be used to visualize and explore properties of the data.

SHA (Secure Hash Algorithm)

The Secure Hash Algorithm is a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST). Data Management functions use two Secure Hash Algorithms:

  • SHA-1: The most widely used of the existing SHA hash functions, SHA-1 is a 160-bit hash function which resembles the earlier MD5 algorithm. It was designed by the National Security Agency (NSA) to be part of the Digital Signature Algorithm. Cryptographic weaknesses were discovered in SHA-1, and the standard was no longer approved for most cryptographic uses after 2010.

  • SHA-2: A family of two similar hash functions, with different block sizes, known as SHA-256 and SHA-512. They differ in the word size; SHA-256 uses 32-bit words where SHA-512 uses 64-bit words. There are also truncated versions of each standardized, known as SHA-224 and SHA-384.
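
The output sizes mentioned above can be checked with Python's standard `hashlib` module. This sketch only inspects Python's implementations and says nothing about how Data Management calls them; the sample message is hypothetical.

```python
import hashlib

ALGORITHMS = ("sha1", "sha224", "sha256", "sha384", "sha512")
message = b"123 MAIN AVE"  # hypothetical input

# Digest (output) size in bits for each algorithm.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in ALGORITHMS}

# Internal block size in bits; SHA-256 and SHA-512 differ here.
block_bits = {name: hashlib.new(name).block_size * 8 for name in ALGORITHMS}

for name in ALGORITHMS:
    print(name, digest_bits[name], hashlib.new(name, message).hexdigest())
```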


Sort

Arrange records in a file by alphabetical or numerical sequence.

SQL (Structured Query Language)

Structured Query Language (abbreviated SQL and commonly pronounced "sequel") is the standard database language used in querying, updating, and managing data in relational databases.


String

A data structure composed of a sequence of characters usually representing human-readable text.

Suppression file

A list of persons who have requested, usually through the Direct Marketing Association, that their name be removed from mailing lists.

Syntax error

An error resulting from an incorrectly expressed statement.


Table

A data structure characterized by rows and columns, with data occupying each cell formed by a row-column intersection.

Trailing blanks or zeros

Fields in a data file may be larger in size than the value in that field so the value does not fill the entire field. There may be blanks or zeros occupying the bytes not occupied by the actual data.

Unix time

Unix time or Epoch time is a system for describing instants in time as the number of seconds before (negative values) or after (positive values) the baseline date/time of 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.
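
A minimal Python sketch of the conversion in both directions, using plain date arithmetic against the baseline. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The baseline: 00:00:00 UTC on Thursday, 1 January 1970.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_time(seconds: float) -> datetime:
    """Positive values fall after the baseline, negative values before it."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


def to_unix_time(moment: datetime) -> float:
    """Number of seconds between the baseline and the given UTC instant."""
    return (moment - UNIX_EPOCH).total_seconds()


print(from_unix_time(0))
print(from_unix_time(-86400))  # one day before the baseline
print(to_unix_time(datetime(2000, 1, 1, tzinfo=timezone.utc)))
```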


Urbanization

An area, sector, or development within a city (Puerto Rico only).

UTM (Universal Transverse Mercator)

Universal Transverse Mercator. A coordinate system that differs from global latitude/longitude in that it divides earth into 60 zones, with each zone separated by 6 degrees in longitude. Locations are expressed in terms of easting and northing (such as Easting 380749.6, Northing 4928286.8).
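
The 60-zone division can be sketched as a small Python function. It assumes the standard numbering in which zone 1 begins at 180 degrees west and zone numbers increase eastward, and it ignores the handful of irregular zones used at high northern latitudes. The function name is hypothetical.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the UTM zone number (1 to 60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Each zone spans 6 degrees of longitude, starting at 180 degrees west.
    zone = math.floor((longitude_deg + 180.0) / 6.0) + 1
    # Longitude +180 lies on the edge of the last zone.
    return min(zone, 60)


for longitude in (-180.0, -75.0, 0.0, 179.9):
    print(longitude, utm_zone(longitude))
```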

Variable-length field

A field that can vary in length according to how much data it contains. Fields are identified by sequence, rather than specific position.

Variable-length record

A record that can vary in length because it contains variable-length fields, a variable number of fields, or both. There is neither a size prefix before records, nor a newline terminator after records.

Virtual memory

Virtual memory is a memory management capability of an operating system (OS) which uses hardware and software to allow a computer to compensate for physical memory shortages, by temporarily transferring data from random access memory (RAM) to disk storage.

Windows time

Windows time is a system for describing instants in time as the number of 100-nanosecond ticks since the baseline date/time of 00:00:00 Coordinated Universal Time (UTC), 1 January 1601.
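
A short Python sketch of converting a tick count to a calendar date. The helper name is hypothetical. Python's `timedelta` resolves microseconds, so the last tenth of a microsecond is dropped.

```python
from datetime import datetime, timedelta, timezone

# The baseline: 00:00:00 UTC on 1 January 1601.
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_MICROSECOND = 10  # one tick is 100 nanoseconds


def from_windows_ticks(ticks: int) -> datetime:
    """Convert a count of 100-nanosecond ticks to a UTC datetime."""
    return WINDOWS_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


# Ticks elapsed between the Windows baseline and 1 January 1970 (UTC).
ticks_at_1970 = int(
    (datetime(1970, 1, 1, tzinfo=timezone.utc) - WINDOWS_EPOCH).total_seconds()
) * 10_000_000

print(ticks_at_1970, from_windows_ticks(ticks_at_1970))
```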

WSDL (Web Services Description Language)

An XML-based interface definition language for describing a web service in terms of the messages it sends and receives. A WSDL 2.0 service description indicates how potential clients are intended to interact with the described service.


X12

X12 is an Electronic Data Interchange (EDI) standard developed by the Accredited Standards Committee (ASC) of the American National Standards Institute (ANSI). It is commonly used in the USA, while most of the rest of the world uses the EDIFACT (United Nations Electronic Data Interchange for Administration, Commerce and Transport) transaction sets.


ZIP+4

United States Postal Service nine-digit code for a particular block, building, apartment, or business location. An average ZIP+4 area contains 10-15 households.
