Configure data types

Overview

Certain Data Management tools (such as Flat File Input, CSV Input, and Calculate) require that you define data types. You select the type from a list and specify details specific to the chosen field type as well as an optional Format string, which controls the formatting and scanning of data to and from text.

Format strings

Format strings are used to control the formatting and scanning of data to and from text. For formatting, they precisely specify the output format of a datum including any non-data formatting characters. In scanning, the format is interpreted more loosely, generally controlling the order in which components of data are scanned, and skipping over non-data characters.

The format strings are bound to the data type. This means that the record schemas themselves contain all of the information necessary to format data into textual output files (CSV, Flat File, and XML). Some functions for formatting data values also specify format strings directly for the conversion. For example, the ScanDateTime and StrictScanDateTime functions use different sets of symbols and patterns to define format strings for DateTime values.

You can use Data Management's Data type dialog to define different format strings for converting to and from text. Select Formats to view format strings available for the selected Data type.

Format is used when converting to text. Scan is used when converting from text. If Scan is undefined, Format is used instead.

Flat format is used for formatting data being that is read from or written to flat files.

Configure Flat format options

The Flat format options are used when values are being read from or written to flat files. The format options are bound to specific data types. This means that the record schemas themselves contain all of the information necessary to format data into flat output files.

Each data type interprets the flat format strings differently.

Data type

Flat format symbols

Decimal Int

Binary integer formatting:

B: Indicates binary format (B must be first letter of options)
S/U: Signed/unsigned (defaults to signed)
L/B: Little/big-endian (defaults to little)

Packed binary-coded decimal formatting:

P: Packed (P must be first letter of options)
S/U: Signed/unsigned (defaults to signed)

IBM's "zoned decimal" formatting:

Z: Zoned format (Z must be first letter of options)
S/U: Signed/unsigned (defaults to signed)

EBCDIC formatting:

E: Indicates EBCDIC text format.

Examples:

BUL: Binary, unsigned, little-endian
BB: Binary, signed, big-endian
E: EBCDIC textual format
P: Packed with sign nibble
ZU: Zoned without sign byte

Float

B: IEEE single precision (when flat size = 4).
B: IEEE single precision (when flat size = 8).

Binary

B: Written as raw binary data instead of Hex.

Configure Binary data types

This is a general data type for holding an array of bytes of variable length. In databases, these are often referred to as BLOB or VARBINARY column types.

Internally, Binary data is used by Data Management whenever an arbitrary collection of bytes is being moved around. For example, the Blob Input tool will read a series of files and convert them to a series of record with Binary fields. The Blob Output tool performs the opposite action. And the Web Service Call tool accepts Binary data for POST/PUT operations, and returns Binary data for POST/GET operations. Note that most Binary fields have a limit of 100MB. The Web Service Call tool has a limit of 50MB.

To configure a Binary data type, specify the Max size in bytes of the the field.

Configure Boolean data types

To configure a Boolean data type specify the Format of the text strings used as True and False values in your data, or accept the default format of F/T.

Format symbols:

False/True
false/true
F/T
f/t
0/1

Configure Date and Time data types

To configure a Date, Time, or DateTime data type change the Format string until the Example matches the format of your data, or accept the default format:

Time data types default to HH:mm:ss.
Date data types default to DD Mmm YYYY.
DateTime data types default to HH:mm:ss DD Mmm YYYY.

Note that Date and DateTime formats contain spaces by default. You may need to delete these spaces to match your data format.

Format string	Meaning
`M`	Numeric month (`7`)
`MM`	Numeric month with leading zero for single numbers (`07`)
`MMM`, `Mmm`, `mmm`	Abbreviated month name (`JUL`, `Jul`, `jul`)
`MMMM`, `Mmmm`, `mmmm`	Full month name (`JULY`, `July`, `july`)
`D`	Day of month (`7`, `24`)
`DD`	Day of month with leading zero for single numbers (`07`, `24`)
`WW`, `Ww`, `ww`	Abbreviated day of week (`FRI`, `Fri`, `fri`)
`WWW`, `Www`, `www`	Full day of week (`FRIDAY`, `Friday`, `friday`)
`YY`	Last two digits of the year (`98`)
`YYYY`	Four-digit year (`1998`)
`A`	`AM`/`PM`
`a`	`am`/`pm`
`h`	Hour of day (`1`-`12`)
`hh`	Hour of day with leading zero for single numbers (`01`-`12`)
`H`	Hour of day (`0`-`23`)
`HH`	Hour of day with leading zero for single numbers (`00`-`23`)
`m`	Minutes (`0`-`59`)
`mm`	Minutes with leading zero for single numbers (`00`-`59`)
`s`	Seconds (`0`-`59`)
`ss`	Seconds with leading zero (`00`-`59`)
`f`, `ff`, `fff`	Fractional seconds to `1`, `2`, or `3` decimal places
`z`	Time zone indicator (`-7`, `+8`)
`zz`	Time zone indicator (`-0700`)
`zzz`	Time zone indicator (`-7:00`)

The ScanDate and ScanDateTime functions accommodate minor format deviations when parsing input data, ignoring extra spaces and differences in delimiters and formats. This can sometimes result in bad data being parsed into invalid values. To ensure date/time format conformance, you can validate incoming Date, Time, and DateTime values using regular expressions.

The function StrictScanDateTime uses a different set of symbols to represent DateTime formats. If you are defining a format string for this function, use those Strict DateTime format patterns instead.

Configure Decimal data types

The Decimal data type is typically used to express numeric values with fixed points, such as currency.

To configure a Decimal data type, specify the Decimal precision of the data:

Size is the total number of digits in the field.
Scale is the number of digits to the right of the decimal place.

Specify Format using the symbols shown in the table below, or accept the default format of #.%.

Format symbol	Meaning
`#`	All digits left of decimal
`%`	All digits right of decimal
`#%`	Format without decimal
`#.%`	Format with decimal
`N#`,`N%`	Show N digits here (pad with space)
`0N#`,`0N%`	Show N digits here (pad with 0)
`+`	Show leading `+` sign if positive
`#,#`	Format with thousands separated by commas

Examples:

Input	Format	Result
`1234567890`	`(3#)3#-4#`	`(123)456-7890`
`1234.56`	`05#.3%`	`01234.560`
`1234.56`	`07#.3%`	`1234.560` (3 leading spaces)
`123.4`	`+05#2%`	`+0012340`
`1234567.89`	`#,#.2%`	`1,234,567.89`

Configure Document data types

Data Management supports a Document data type, which is modeled after BSON. You can select Document as the type in the same way you configure data types in tools like such as Calculate, CSV Input, and Change Field Types. See Document data type for more information.

Configure Floating point number data types

To configure a Floating point data type, specify the Floating point precision of the data:

Regular is a floating point number stored in four bytes. The range of values that can be stored in a single-precision Float is -3.402823466e+38 to 3.402823466e+38.The smallest allowable single-precision value is 1.175494351e--38F.
Double is a floating point number stored in eight bytes, twice the size required for storing a less precise (single-precision) number. The range of values that can be stored in a double-precision Float is-1.7976931348623158e+308 to 1.7976931348623158e+308. The smallest allowable double-precision value is 2.2250738585072014e--308.

Optionally, specify Format.

Enter E to enable exponential notation. Exponential notation is a floating-point numeric format expressed in two parts: a mantissa and an exponent. The mantissa specifies the digits in the number, and the exponent specifies the magnitude of the number (that is, the position of the decimal point). For example, the numbers 345600000 and 0.0000987 will be formatted as 3456E5 and 987E-7.
Enter F to enable floating-decimal notation, which will omit trailing zeros and the trailing decimal point if not required. For example, the numbers 123.456000 and 123.0000 will be formatted as 123.456 and 123.

Scanned float literals are double-precision (at least until you store them in a single-precision field).

Configure Integer data types

To configure an Integer data type, specify the Integer scale of the data:

Tiny (1 byte)
Short (2 byte)
Regular (4 byte)
Long (8 byte)

Select or clear Signed to match the format of your data.

Size	Signed	Minimum	Maximum
Tiny	Y	`-128`	`128`
Tiny	N	`0`	`255`
Short	Y	`-32768`	`32767`
Short	N	`0`	`65535`
Regular	Y	`-2147483648`	`2147483647`
Regular	N	`0`	`4294967295`
Long	Y	`-9223372036854775808`	`09223372036854775808`
Long	N	`0`	`18446744073709551615`

Optionally, specify Format using the symbols shown in the table below. If no format is specified, defaults to #.

Format symbol	Meaning
`#`	All digits left of decimal
`N#`	Show N digits here (pad with space)
`0N#`	Show N digits here (pad with 0)
`+`	Show leading + sign if positive
`#,#`	Format with thousands separated by commas

Examples:

Input	Format	Result
`1234567890`	`(3#)3#-4#`	`(123)456-7890`
`1234`	`6#`	`1234` (2 leading spaces)
`123`	`+05#`	`+00123`
`-123`	`+05#`	`-00123`
`1234567`	`#,#`	`1,234,567`

Configure Text data types

Text data types can contain ASCII or EBCDIC. The default is ASCII. For EBSDIC, specify EBCDIC as the Format.

Most text fields have a limit of 100MB.

To configure a Text data type, specify the Maximum # of characters of the field, and whether it is fixed-length or variable-length:

Text variable-length varies in length according to how much data the field contains, up to Max # characters.
Text fixed-length is always the length specified in Max # characters.

Configure Spatial data type

The Spatial data type is used by a variant field that holds one spatial object. A spatial object may be one of:

Point
MultiPoint
Polygon
MultiPolygon
Line
PolyLine

Only specialized tools can make use of spatial data.

Configure Unicode text data types

Unicode data types are variable-length UTF-16 encoded strings. This data type is capable of holding textual data from almost any language. In databases, his data type is often referred to as nvarchar, although in some databases all text is Unicode. Note that most text fields have a limit of 100MB.

To configure a Unicode data type, specify the Maximum # of characters.