Expressions
Overview
Expressions are equations that analyze data from one or more fields. Expressions can perform calculations, manipulate text, or test data. The resulting value can then be assigned to a field, used in another expression, or used to direct the flow of data through the Data Management project.
Data Management has a rich expression language and a set of useful functions that makes it easy for you to manipulate your data. Expressions are used in the Calculate tool to assign new values to fields, and in the Filter tool to evaluate which records to keep.
The expression editor's autocomplete feature displays a pop-up list of fields, variables, and functions as you type.
If you calculate or assign a field value that does not match the field's defined type, the result will be an error value. An expression producing an invalid result (such as a number too large to fit in the field) will also appear as an error value.
An error value viewed in the Data Viewer tool's output looks like <<Error>>. When an error value is converted to an output file or database it is changed to a Null or blank value (most data formats do not have an explicit Error representation).
Use the function IsValid to check that values you want to calculate with are valid values. You can also replace invalid values with a default value using a conditional expression.
For example, IF IsValid(FIELD) THEN FIELD ELSE 0 ENDIF
functions like Trim strip off blanks, sometimes making the resulting value shorter than the field to which it is assigned. The length of text expressions varies depends on the expression and the values. However, once you assign an expression result to a fixed-length field, the value is truncated to the field size and padded with blanks.
Fields used in functions must be of the correct type. For example, a text field cannot be directly passed to a function like Abs which expects a numeric value. However, you can use a conversion function to force the correct type: Abs(AsInteger(LONGITUDE))
.
You can also use the Calculate tool to create a new field of the desired type, assigning its value from the old field, thereby changing the type of the value.
In the absence of explicit type information, Data Management’s expression language attempts to infer a token’s type. When it encounters a numeric value containing a decimal point, it concludes that value is a double suitable for floating-point arithmetic. That might yield rounding errors in the number’s least significant digit. If fixed-point precision is required (such as for currency calculations), first express the value as a string (e.g., 9999999999999.0
). You can assign that string directly to a Decimal field or use it in an expression after passing it through the function AsDecimal.
Syntax overview
Expressions calculate values in a specific order known as the syntax. The syntax of the expression describes the process of the calculation. It includes the elements to be calculated—the operands—and the calculation operators. Each operand can be a value that does not change (a constant value) or an identifier (a field reference, label, or name). Operators can perform arithmetic, comparison, textual, and logical operations. You can also use predefined expressions (called functions) to perform a variety of useful calculations and transformation. Additionally, the Data Management expression language supports IF...THEN...ELSE
statements, which you can use to create conditional expressions.
You create an expression by combining operators and operands to produce the result you want. For example, the following expression concatenates three fields into a single field, separating the name components with spaces: First_Name + " " + Middle_Name + " " + Last_Name
.
Expressions can be used to perform more complex data transformations. The expression below translates ranges of a numeric value into names representing those ranges.
IF X >= 100 AND X <= 1000 THEN
"SMALL"
ELSEIF X >= 1001 AND X <= 5000 THEN
"MEDIUM"
ELSEIF X >= 5001 AND X <= 10000 THEN
"LARGE"
ELSE
"UNKNOWN"
ENDIF
Expressions can also be used to select or filter records. The following expression chooses all records with dates falling on the second Tuesday of any month:
Day(DATE) >= 8 AND Day(DATE) <= 14 AND DayOfWeek(DATE) = 3
Syntax basics
Data Management's expression language consists of many different elements, each with a simple set of rules that you must follow to write correct expressions. This set of rules is known as the syntax of the expression language.
Basic elements
The first level of syntax consists of elements that you are probably already familiar with—strings, numbers, and fields.
Element | Description | Examples |
---|---|---|
String | A textual string surrounded by double-quotes. Certain special character sequences are interpreted, even when embedded in double-quotes:
|
|
Number | A series of digits with an optional sign, decimal point, and exponent. |
|
Field | The name of the field. The field name must be surrounded by |
|
Functions
Data Management has a set of built-in functions that are used to perform common tasks. They include string manipulation, pattern-matching, mathematics, and date/time processing. To invoke a function, you type the name of the function, followed by a list of arguments enclosed in parentheses. In general you call a function using a syntax likeFunctionName(argument1, argument2, ...)
where the number and type of arguments is determined by the function being called. Every function performs its specific computation on its arguments and returns a result value. For example, in: Substring(NAME, 1, 4)
the function Substring requires that the first argument is text, and the second and third arguments are numbers. It extracts a substring from the first argument and returns it.
You can think of function calls as a placeholder for the results—after the function completes its work, the function call is "replaced" in the expression by its result, and the expression evaluation continues using that result in the surrounding expression. For example, if the value of the NAME
field is "John Doe"
and you use the expression Substring(NAME, 1, 4) + "X"
, the expression evaluator will first process the Substring function and return "John"
. It will then evaluate "John" + "X"
and yield JohnX
.
You can use functions as the arguments to other functions. For example: IsLike( Substring(NAME, 1, 2), "%X")
first extracts the first two characters from NAME
, and then checks to see if that two-character string ends with an X
.
Operators
Data Management supports the arithmetic, string, and comparison operators that you are familiar with.
Operators | Meaning | Examples |
---|---|---|
| Arithmetic operations on numeric values |
|
| Concatenation of text |
|
| Logical disjunction and conjunction |
|
| Equal or not-equal |
|
| Relational comparison |
|
See the operators reference for details of usage and the types that can be used.
IF/THEN/ELSE/ENDIF
Data Management supports conditional expressions using IF/THEN/ELSE/ENDIF
. In Data Management, you use IF/THEN/ELSE/ENDIF
as a "value selector." This contrasts with programming languages such as Visual Basic, which use conditionals to direct the flow of execution of a program.
Think of IF/THEN/ELSE/ENDIF
as a means to select one of many sub-expressions as values.
For example:
IF INCOME < 10000 THEN "A"
ELSE "B"
ENDIF
This results in one of two values:
If
INCOME
is<100000
, use"A"
If
INCOME
is>=100000
, use"B"
Some examples of how the INCOME
field relates to the result value.
Value of INCOME | Result value |
---|---|
|
|
|
|
|
|
|
|
You can extend IF/THEN/ELSE/ENDIF
to select more than two result values, by using ELSEIF
. Add an extra ELSEIF
clause between the THEN
and ELSE
for each additional value you want.
For example:
IF INCOME < 10000 THEN "A"
ELSEIF INCOME < 25000 THEN "B
ELSEIF INCOME < 50000 THEN "C"
ELSE "D"
ENDIF
Find some examples of how the INCOME
field relates to the result value below.
Value of INCOME | Result value |
---|---|
|
|
|
|
|
|
|
|
The clauses of IF/THEN/ELSEIF/ELSE/ENDIF
are evaluated in order, so the result for the first true condition encountered is used, regardless of the rest of the conditions. The ELSE
result value is only used if all the conditions are false.
For example:
IF INCOME < 100000 THEN "A"
ELSEIF INCOME < 10000 THEN "B"
ELSE "C"
ENDIF
This will never result in "B"
, because the first condition is always tested first.
Operators reference
Operators specify the type of calculation that you want to perform on the elements of an expression. Data Management's expression language includes four different types of operators: arithmetic, comparison, text, and logical operators.
Arithmetic operators
Arithmetic operators perform basic mathematical operations such as addition, subtraction, or multiplication; combine numbers; and produce numeric results.
Arithmetic operator | Meaning | Example |
---|---|---|
| Addition |
|
| Subtraction |
|
| Negation |
|
| Multiplication |
|
| Division |
|
| Modulus (integer remainder) |
|
The modulus operator (%
) can only be used with two integers (fixed-point numbers with a scale of 0).
Comparison operators
Comparison operators compare two values or expressions and then produce the logical value TRUE
or FALSE
.
Comparison operator | Meaning | Example |
---|---|---|
| Equal to |
|
| Greater than |
|
| Less than |
|
| Greater than or equal to |
|
| Less than or equal to |
|
| Not equal to |
|
Text operators
The operator +
concatenates one or more text values to produce a single piece of text.
Text operator | Meaning | Example |
| Concatenates two values to produce one continuous text value. |
|
Logical operators
Logical operators (sometimes known as Boolean operators) operate on Boolean-valued expressions to form a new Boolean-valued expression. The AND
and OR
operators combine two expressions, whereas the NOT
operator works on a single expression.
Logical operator | Meaning | Example |
---|---|---|
| Returns |
|
| Returns |
|
| Returns the opposite of the following expression. Note use of parentheses; |
|
Precedence
When more than one operation occurs in an expression, each operation is performed in a predetermined order called operator precedence. Data Management evaluates expression operations in the following order:
Constant values, identifiers, functions, expressions in parentheses, and
IF...THEN...ELSE
statementsUnary minus (
--
)Logical inversion operations (
!
,NOT
)Multiplication, division, and modulus operations (
*
,/
,%
)Addition and subtraction operations (
+
,--
)Comparison operations (
=
,<
,>
,<=
,>=
,!=
)Logical operations (
AND
,OR
)
Examples
Expression | Equivalent to | Remarks |
---|---|---|
|
|
|
|
|
|
|
| Females over 40. |
|
| Males over 40. |
|
| Anyone that is not a female over 40. Explicit parentheses are required to get this behavior |
|
|
|
If you have any doubt about the precedence-grouping of an expression, or think that someone else might not understand it, use explicit parentheses.
Expression types and validity
The type of an expression's result depends on the kind of expression and the types of the expression's inputs.
Conditional expressions
Conditional expressions produce results of the same type as the dependent arguments of their THEN
and ELSE
clauses. The THEN
and ELSE
dependent arguments can be either the same type, or a mixture of Integer
and Float
types. In the latter case, the result type is Float
.
While this covers the majority of use cases, it is possible to write conditional expressions whose THEN
and ELSE
actions have incompatible return types. In such cases, Data Management converts them to the simplest common type. That is, the closest type to which both can be converted. Ultimately, this will be a Text
type if no more suitable type can be found.
Logical expressions
Logical expressions—those using AND
, OR
, and NOT
operators—produce results of type Boolean.
Relational expressions
Relational expressions—those comparing two values or expressions—result in Boolean values. Relational expressions can compare most types, with the restrictions noted below.
Comparison operator | Argument types |
---|---|
| Arguments may be any type except Binary. Both arguments must be of the same type, except that Integers and Floats are comparable. Elapsed time cannot be compare to other time formats. Text values of unequal lengths can be compared; all else being equal, shorter strings come before longer ones. |
| Same as above, except that arguments cannot be of type Boolean. |
Additive expressions
The result of an additive expression depends on the types of its arguments. The following table shows the types that can be combined in an additive expression, which operations are valid, and the resulting type.
Left Type | Right Type | Operation | Resulting Type |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Multiplicative expression
The result of a multiplicative expression depends on the types of its arguments. The following table shows the types that can be combined in an multiplicative expression, which operations are valid, and the resulting type.
Left Type | Right Type | Operation | Resulting Type |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Comments
Expressions may contain C++ style comments. The expression text:
"some text" /*embedded comment*/ + " and "
// single-line comment
+ "more text " // comment to end of line
+ " and " /* multi-line comment
Follows because I have a lot to say about this */
+ "even more text"
is equivalent to the expression:
"some text” + " and "
+ "more text "
+ " and "
+ "even more text"