Expressions
Overview
Expressions are equations that analyze data from one or more fields. Expressions can perform calculations, manipulate text, or test data. The resulting value can then be assigned to a field, used in another expression, or used to direct the flow of data through the Data Management project.
Data Management has a rich expression language and a set of useful functions that makes it easy for you to manipulate your data. Expressions are used in the Calculate tool to assign new values to fields, and in the Filter tool to evaluate which records to keep.
The expression editor's autocomplete feature displays a pop-up list of fields, variables, and functions as you type.
If you calculate or assign a field value that does not match the field's defined type, the result will be an error value. An expression producing an invalid result (such as a number too large to fit in the field) will also appear as an error value.
An error value viewed in the Data Viewer tool's output looks like <<Error>>. When an error value is converted to an output file or database it is changed to a Null or blank value. (Most data formats do not have an explicit Error representation.)
Use the function IsValid to check that values you want to calculate with are valid values. You can also replace invalid values with a default value using a conditional expression.
For example: IF IsValid(FIELD) THEN FIELD ELSE 0 ENDIF
Functions like Trim strip off blanks, sometimes making the resulting value shorter than the field to which it is assigned. The length of text expressions varies depends on the expression and the values. However, once you assign an expression result to a fixed-length field, the value is truncated to the field size and padded with blanks.
Fields used in functions must be of the correct type. For example, a text field cannot be directly passed to a function like Abs which expects a numeric value. However, you can use a conversion function to force the correct type: Abs(AsInteger(LONGITUDE))
You can also use the Calculate tool to create a new field of the desired type, assigning its value from the old field, thereby changing the type of the value.
In the absence of explicit type information, Data Management’s expression language attempts to infer a token’s type. When it encounters a numeric value containing a decimal point, it concludes that value is a double suitable for floating-point arithmetic. That might yield rounding errors in the number’s least significant digit. If fixed-point precision is required (such as for currency calculations), first express the value as a string (e.g. 9999999999999.0). You can assign that string directly to a Decimal field or use it in an expression after passing it through the function AsDecimal.
Syntax overview
Expressions calculate values in a specific order known as the syntax. The syntax of the expression describes the process of the calculation. It includes the elements to be calculated—the operands—and the calculation operators. Each operand can be a value that does not change (a constant value) or an identifier (a field reference, label, or name). Operators can perform arithmetic, comparison, textual, and logical operations. You can also use predefined expressions (called functions) to perform a variety of useful calculations and transformation. Additionally, the Data Management expression language supports IF...THEN...ELSE statements, which you can use to create conditional expressions.
You create an expression by combining operators and operands to produce the result you want. For example, the following expression concatenates three fields into a single field, separating the name components with spaces:
First_Name + " " + Middle_Name + " " + Last_Name
Expressions can be used to perform more complex data transformations. The expression below translates ranges of a numeric value into names representing those ranges:
IF X >= 100 AND X <= 1000 THEN
"SMALL"
ELSEIF X >= 1001 AND X <= 5000 THEN
"MEDIUM"
ELSEIF X >= 5001 AND X <= 10000 THEN
"LARGE"
ELSE
"UNKNOWN"
ENDIF
Expressions can also be used to select or filter records. The following expression chooses all records with dates falling on the second Tuesday of any month:
Day(DATE) >= 8 AND Day(DATE) <= 14 AND DayOfWeek(DATE) = 3
Syntax basics
Data Management's expression language consists of many different elements, each with a simple set of rules that you must follow to write correct expressions. This set of rules is known as the syntax of the expression language.
Basic elements
The first level of syntax consists of elements that you are probably already familiar with—strings, numbers, and fields:
Element | Description | Examples |
---|---|---|
String | A textual string surrounded by double-quotes. Certain special character sequences are interpreted, even when embedded in double-quotes: \r carriage return | "Data Management" |
Number | A series of digits with an optional sign, decimal point, and exponent | 3.14159 –100.00 1.2345e+10 |
Field | The name of the field. The field name must be surrounded by [ ] if it contains spaces or punctuation, or does not start with a letter. The value in an expression comes from the field in the current record. | ADDRESS [FIRST NAME] |
Functions
Data Management has a set of built-in functions that are used to perform common tasks. They include string manipulation, pattern-matching, mathematics, and date/time processing. To invoke a function, you type the name of the function, followed by a list of arguments enclosed in parentheses. In general you call a function using a syntax likeFunctionName(argument1, argument2, ...)
where the number and type of arguments is determined by the function being called. Every function performs its specific computation on its arguments and returns a result value. For example, in: Substring(NAME, 1, 4)
the function Substring requires that the first argument is text, and the second and third arguments are numbers. It extracts a substring from the first argument and returns it.
You can think of function calls as a placeholder for the results—after the function completes its work, the function call is "replaced" in the expression by its result, and the expression evaluation continues using that result in the surrounding expression. For example, if the value of the NAME field is "John Doe" and you use the expression Substring(NAME, 1, 4) + "X"
the expression evaluator will first process the Substring function and return "John." It will then evaluate "John" + "X"
and yield "JohnX."
You can use functions as the arguments to other functions. For example: IsLike( Substring(NAME, 1, 2), "%X")
first extracts the first two characters from NAME, and then checks to see if that two-character string ends with an "X."
Operators
Data Management supports the arithmetic, string, and comparison operators that you are familiar with:
Operators | Meaning | Examples |
---|---|---|
+ – * / % | Arithmetic operations on numeric values | 3 + 4 1000 * SALARY |
+ | Concatenation of text | "John" + NAME |
AND, OR | Logical disjunction and conjunction | SALARY > 50000 AND SALARY < 100000 |
= != | Equal or not-equal | NAME != "John" |
< <= > | Relational comparison | NAME < "M" |
See the operators reference for details of usage and the types that can be used.
IF/THEN/ELSE/ENDIF
Data Management supports conditional expressions using IF/THEN/ELSE/ENDIF. In Data Management, you use IF/THEN/ELSE/ENDIF as a "value selector." This contrasts with programming languages such as Visual Basic, which use conditionals to direct the flow of execution of a program.
Think of IF/THEN/ELSE/ENDIF as a means to select one of many sub-expressions as values. For example:
IF INCOME < 10000 THEN "A"
ELSE "B"
ENDIF
This results in one of two values:
If INCOME is <100000, use “A”
If INCOME is >=100000, use “B”
Some examples of how the INCOME field relates to the result value:
Value of INCOME | Result value |
---|---|
7500 | "A" |
10000 | "B" |
2000 | "A" |
122500 | "B" |
You can extend IF/THEN/ELSE/ENDIF to select more than two result values, by using ELSEIF. Add an extra ELSEIF clause between the THEN and ELSE for each additional value you want. For example:
IF INCOME < 10000 THEN "A"
ELSEIF INCOME < 25000 THEN "B
ELSEIF INCOME < 50000 THEN "C"
ELSE "D"
ENDIF
Some examples of how the INCOME field relates to the result value:
Value of INCOME | Result value |
---|---|
40000 | "C" |
100000 | "D" |
0 | "A" |
9999 | "A" |
Note that the clauses of IF/THEN/ELSEIF/ELSE/ENDIF are evaluated in order, so the result for the first true condition encountered is used, regardless of the rest of the conditions. The ELSE result value is only used if all the conditions are false. For example:
IF INCOME < 100000 THEN "A"
ELSEIF INCOME < 10000 THEN "B"
ELSE "C"
ENDIF
This will never result in "B", because the first condition is always tested first.
Operators reference
Operators specify the type of calculation that you want to perform on the elements of an expression. Data Management's expression language includes four different types of operators: arithmetic, comparison, text, and logical operators.
Arithmetic operators
Arithmetic operators perform basic mathematical operations such as addition, subtraction, or multiplication; combine numbers; and produce numeric results.
Arithmetic operator | Meaning | Example |
---|---|---|
+ | Addition | 2 + 2 |
- | Subtraction | 2 - 1 - 1 |
- | Negation | -VALUE |
* | Multiplication | 2 * 2 |
/ | Division | 4 / 2 |
% | Modulus ( integer remainder) | 103 % 10 |
The modulus operator (%) can only be used with two integers (fixed-point numbers with a scale of 0).
Comparison operators
Comparison operators compare two values or expressions and then produce the logical value TRUE or FALSE.
Comparison operator | Meaning | Example |
---|---|---|
= | Equal to | GENDER = "M" |
> | Greater than | AGE > 21 |
< | Less than | INCOME < 60000 |
>= | Greater than or equal to | SALES >= 1000 |
<= | Less than or equal to | EXPENSES <= INCOME |
!= | Not equal to | ADDRESS1 != ADDRESS2 |
Text operators
The operator "+" concatenates one or more text values to produce a single piece of text.
Text operator | Meaning | Example |
+ | Concatenates two values to produce one continuous text value | John + Henry (produces JohnHenry) |
Logical operators
Logical operators (sometimes known as Boolean operators) operate on Boolean-valued expressions to form a new Boolean-valued expression. The AND and OR operators combine two expressions, whereas the NOT operator works on a single expression.
Logical operator | Meaning | Example |
---|---|---|
AND | Returns TRUE if left-hand and right-hand expression are both TRUE, otherwise returns FALSE | GENDER = "M" AND AGE >= 40 |
OR | Returns TRUE if left-hand expression is TRUE or right-hand expression is TRUE. Returns FALSE if both are FALSE. | NAME = ACME OR NAME = IBM |
NOT | Returns the opposite of the following expression. Note use of parentheses; NOT is higher precedence than AND. | NOT (GENDER = "F" AND AGE > 1) |
Precedence
When more than one operation occurs in an expression, each operation is performed in a predetermined order called operator precedence. Data Management evaluates expression operations in the following order:
Constant values, identifiers, functions, expressions in parentheses, and IF...THEN...ELSE statements
Unary minus (–)
Logical inversion operations (!, NOT)
Multiplication, division, and modulus operations (*, /, %)
Addition and subtraction operations (+,–)
Comparison operations (=, <, >, <=, >=, !=)
Logical operations (AND, OR)
Examples
Expression | Equivalent to | Remarks |
---|---|---|
3 + 4 * 5 | 3 + (4 * 5) | * is higher precedence than + |
VALUE + 2 = SIZE | (VALUE + 2) = SIZE | + is higher precedence than = |
SEX = "F" AND AGE > 40 | (SEX = "F") AND (AGE > 40) | Females over 40. =,> are higher precedence than AND |
NOT SEX = "F" AND AGE > 40 | (NOT (SEX = "F")) AND (AGE > 40) | Males over 40. NOT is higher precedence than AND |
NOT (SEX = "F" AND AGE > 40) | NOT ((SEX = "F") AND (AGE > 40)) | Anyone that is not a female over 40. Explicit parentheses are required to get this behavior. |
IF VALUE != 0 THEN VALUE / 10 ELSE Random(10) ENDIF | IF (VALUE != 0) THEN (VALUE / 10) ELSE (Random(10)) ENDIF | IF/THEN/ELSE/ENDIF are low-precedence, so embedded expressions need no parentheses. |
If you have any doubt about the precedence-grouping of an expression, or think that someone else might not understand it, use explicit parentheses.
Expression types and validity
The type of an expression's result depends on the kind of expression and the types of the expression's inputs:
Conditional expressions
Conditional expressions produce results of the same type as the dependent arguments of their Then and Else clauses. The Then and Else dependent arguments can be either the same type, or a mixture of Integer and Float types. In the latter case, the result type is Float.
Logical expressions
Logical expressions—those using AND, OR, and NOT operators—produce results of type Boolean.
Relational expressions
Relational expressions—those comparing two values or expressions—result in Boolean values. Relational expressions can compare most types, with the restrictions noted below:
Comparison operator | Argument types |
---|---|
= | Arguments may be any type except Binary. Both arguments must be of the same type, except that Integers and Floats are comparable. Elapsed time cannot be compare to other time formats. Text values of unequal lengths can be compared; all else being equal, shorter strings come before longer ones. "JOH" comes before "JOHN" but "JON" is after "JOHN". |
< | Same as above, except that arguments cannot be of type Boolean. |
Additive expressions
The result of an additive expression depends on the types of its arguments. The following table shows the types that can be combined in an additive expression, which operations are valid, and the resulting type:
Left Type | Right Type | Operation | Resulting Type |
---|---|---|---|
Text | Text | + (add) | Text—the concatenation of the arguments. If the arguments are fixed-length text fields, then any leading or trailing blanks contained in the arguments will also be in the result. Use Trim() to avoid that, or first convert the arguments to variable-length text fields. Note that most text fields have a limit of 100MB. |
Integer | Integer | +, - (add, subtract) | Integer capable of holding all values that the Left and Right types could hold. |
Integer | Float | +, - (add, subtract) | Float |
Float | Integer | +, - (add, subtract) | Float |
Date | Date | - (subtract) | Integer indicating days difference. Size of Integer is eight and scale is zero. |
Date | Integer, Float | +, - (add, subtract) | Date—the Left date value advanced (or decremented) the given number of days. Fractional days are ignored. |
Integer, Float | Date | + (add) | Date—the Right date value advanced the given number of days. Fractional days are ignored. |
Time | Time | - (subtract) | Integer—size of nine and scale of three, indicating thousandths of a second difference between the two times. If one date is Elapsed time format, both must be Elapsed time format. |
Time | Integer, Float | +, - (add, subtract) | Time—the Left Time value advanced (or decremented) by the Right value in seconds. Fractional seconds are processed down to thousandths of a second precision. Non-elapsed times will "wrap around" at midnight. |
Integer, Float | Time | + (add) | Time—the Right time value advanced by the Left value in seconds. Fractional seconds are processed down to thousandths of a second precision. Non-elapsed times will "wrap around" at midnight. |
DateTime | DateTime | - (subtract) | Integer—size of fifteen and scale of three, indicating thousandths of a second difference between the two date-times. |
DateTime | Integer, Float | +, - (add, subtract) | DateTime—the Left DateTime value advanced (or decremented) by the Right value in seconds. Fractional seconds are processed down to thousandths of a second precision. |
Integer, Float | DateTime | + (add) | DateTime—the Right time value advanced by the Left value in seconds. Fractional seconds are processed down to thousandths of a second precision. |
Multiplicative expression
The result of a multiplicative expression depends on the types of its arguments. The following table shows the types that can be combined in an multiplicative expression, which operations are valid, and the resulting type:
Left Type | Right Type | Operation | Resulting Type |
---|---|---|---|
Integer | Integer | * (multiply) | Integer—size and scale are the sum of the sizes and scales of the arguments, so that any possible result can be stored. |
Integer | Integer | / (divide) | Integer—size equal to the left argument's size plus the right argument's scale. Scale is equal to the left argument's scale. |
Float | Integer, Float | * (multiply) | Float |
Float | Integer, Float | / (divide) | Float |
Comments
Expressions may contain C++ style comments. The expression text:
"some text" /*embedded comment*/ + " and "
// single-line comment
+ "more text " // comment to end of line
+ " and " /* multi-line comment
Follows because I have a lot to say about this */
+ "even more text"
is equivalent to the expression:
"some text” + " and "
+ "more text "
+ " and "
+ "even more text"