Configure data types
Overview
Certain Data Management tools (such as Flat File Input, CSV Input, and Calculate) require that you define data types. You select the type from a list and specify details specific to the chosen field type as well as an optional Format string, which controls the formatting and scanning of data to and from text.
Format strings
Format strings are used to control the formatting and scanning of data to and from text. For formatting, they precisely specify the output format of a datum including any non-data formatting characters. In scanning, the format is interpreted more loosely, generally controlling the order in which components of data are scanned, and skipping over non-data characters.
The format strings are bound to the data type. This means that the record schemas themselves contain all of the information necessary to format data into textual output files (CSV, Flat File, and XML). Some functions for formatting data values also specify format strings directly for the conversion. For example, the ScanDateTime and StrictScanDateTime functions use different sets of symbols and patterns to define format strings for DateTime values.
You can use Data Management's Data type dialog to define different format strings for converting to and from text. Select Formats to view format strings available for the selected Data type.
Format is used when converting to text. Scan is used when converting from text. If Scan is undefined, Format is used instead.
Flat format is used for formatting data being that is read from or written to flat files.
Configure Flat format options
The Flat format options are used when values are being read from or written to flat files. The format options are bound to specific data types. This means that the record schemas themselves contain all of the information necessary to format data into flat output files.
Each data type interprets the flat format strings differently.
Data type | Flat format symbols |
---|---|
Decimal Int | Binary integer formatting:
Packed binary-coded decimal formatting:
IBM's "zoned decimal" formatting:
EBCDIC formatting:
Examples:
|
Float |
|
Binary |
|
Configure Binary data types
This is a general data type for holding an array of bytes of variable length. In databases, these are often referred to as BLOB
or VARBINARY
column types.
Internally, Binary data is used by Data Management whenever an arbitrary collection of bytes is being moved around. For example, the Blob Input tool will read a series of files and convert them to a series of record with Binary fields. The Blob Output tool performs the opposite action. And the Web Service Call tool accepts Binary data for POST
/PUT
operations, and returns Binary data for POST
/GET
operations. Note that most Binary fields have a limit of 100MB. The Web Service Call tool has a limit of 50MB.
To configure a Binary data type, specify the Max size in bytes of the the field.
Configure Boolean data types
To configure a Boolean data type specify the Format of the text strings used as True
and False
values in your data, or accept the default format of F
/T
.
Format symbols:
False/True
false/true
F/T
f/t
0/1
Configure Date and Time data types
To configure a Date, Time, or DateTime data type change the Format string until the Example matches the format of your data, or accept the default format:
Time data types default to
HH:mm:ss
.Date data types default to
DD Mmm YYYY
.DateTime data types default to
HH:mm:ss DD Mmm YYYY
.
Note that Date and DateTime formats contain spaces by default. You may need to delete these spaces to match your data format.
Format string | Meaning |
---|---|
| Numeric month ( |
| Numeric month with leading zero for single numbers ( |
| Abbreviated month name ( |
| Full month name ( |
| Day of month ( |
| Day of month with leading zero for single numbers ( |
| Abbreviated day of week ( |
| Full day of week ( |
| Last two digits of the year ( |
| Four-digit year ( |
|
|
|
|
| Hour of day ( |
| Hour of day with leading zero for single numbers ( |
| Hour of day ( |
| Hour of day with leading zero for single numbers ( |
| Minutes ( |
| Minutes with leading zero for single numbers ( |
| Seconds ( |
| Seconds with leading zero ( |
| Fractional seconds to |
| Time zone indicator ( |
| Time zone indicator ( |
| Time zone indicator ( |
The ScanDate and ScanDateTime functions accommodate minor format deviations when parsing input data, ignoring extra spaces and differences in delimiters and formats. This can sometimes result in bad data being parsed into invalid values. To ensure date/time format conformance, you can validate incoming Date, Time, and DateTime values using regular expressions.
The function StrictScanDateTime uses a different set of symbols to represent DateTime formats. If you are defining a format string for this function, use those Strict DateTime format patterns instead.
Configure Decimal data types
The Decimal data type is typically used to express numeric values with fixed points, such as currency.
To configure a Decimal data type, specify the Decimal precision of the data:
Size is the total number of digits in the field.
Scale is the number of digits to the right of the decimal place.
Specify Format using the symbols shown in the table below, or accept the default format of #.%
.
Format symbol | Meaning |
---|---|
| All digits left of decimal |
| All digits right of decimal |
| Format without decimal |
| Format with decimal |
| Show N digits here (pad with space) |
| Show N digits here (pad with 0) |
| Show leading |
| Format with thousands separated by commas |
Examples:
Input | Format | Result |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Configure Document data types
Data Management supports a Document data type, which is modeled after BSON. You can select Document as the type in the same way you configure data types in tools like such as Calculate, CSV Input, and Change Field Types. See Document data type for more information.
Configure Floating point number data types
To configure a Floating point data type, specify the Floating point precision of the data:
Regular is a floating point number stored in four bytes. The range of values that can be stored in a single-precision Float is
-3.402823466e+38
to3.402823466e+38
.The smallest allowable single-precision value is1.175494351e--38F
.Double is a floating point number stored in eight bytes, twice the size required for storing a less precise (single-precision) number. The range of values that can be stored in a double-precision Float is-
1.7976931348623158e+308
to1.7976931348623158e+308
. The smallest allowable double-precision value is2.2250738585072014e--308
.
Optionally, specify Format.
Enter
E
to enable exponential notation. Exponential notation is a floating-point numeric format expressed in two parts: a mantissa and an exponent. The mantissa specifies the digits in the number, and the exponent specifies the magnitude of the number (that is, the position of the decimal point). For example, the numbers345600000
and0.0000987
will be formatted as3456E5
and987E-7
.Enter
F
to enable floating-decimal notation, which will omit trailing zeros and the trailing decimal point if not required. For example, the numbers123.456000
and123.0000
will be formatted as123.456
and123
.
Scanned float literals are double-precision (at least until you store them in a single-precision field).
Configure Integer data types
To configure an Integer data type, specify the Integer scale of the data:
Tiny (1 byte)
Short (2 byte)
Regular (4 byte)
Long (8 byte)
Select or clear Signed to match the format of your data.
Size | Signed | Minimum | Maximum |
---|---|---|---|
Tiny | Y |
|
|
Tiny | N |
|
|
Short | Y |
|
|
Short | N |
|
|
Regular | Y |
|
|
Regular | N |
|
|
Long | Y |
|
|
Long | N |
|
|
Optionally, specify Format using the symbols shown in the table below. If no format is specified, defaults to #.
Format symbol | Meaning |
---|---|
| All digits left of decimal |
| Show N digits here (pad with space) |
| Show N digits here (pad with 0) |
| Show leading + sign if positive |
| Format with thousands separated by commas |
Examples:
Input | Format | Result |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Configure Text data types
Text data types can contain ASCII or EBCDIC. The default is ASCII. For EBSDIC, specify EBCDIC as the Format.
Most text fields have a limit of 100MB.
To configure a Text data type, specify the Maximum # of characters of the field, and whether it is fixed-length or variable-length:
Text variable-length varies in length according to how much data the field contains, up to Max # characters.
Text fixed-length is always the length specified in Max # characters.
Configure Spatial data type
The Spatial data type is used by a variant field that holds one spatial object. A spatial object may be one of:
Point
MultiPoint
Polygon
MultiPolygon
Line
PolyLine
Only specialized tools can make use of spatial data.
Configure Unicode text data types
Unicode data types are variable-length UTF-16 encoded strings. This data type is capable of holding textual data from almost any language. In databases, his data type is often referred to as nvarchar, although in some databases all text is Unicode. Note that most text fields have a limit of 100MB.
To configure a Unicode data type, specify the Maximum # of characters.