Variables
Overview
Data Management is designed on a data flow model. Records flow through the system independent of each other or their environment except where records are explicitly brought together for processing (using tools such as Join or Merge). This data flow model is very powerful, but you may sometimes need to create projects in which record processing is performed with more awareness of other records or other processing paths.
Types of variables
In Data Management projects, there are five kinds of variables that may be used in the tools for different purposes:
System variables: these have their values set automatically by Data Management. You can use them anywhere you would use a normal record field by prefacing the variable name with the system. prefix. For example, to access the project start time, use system.StartTime. See System variables for a list of available system variables.
Site settings variables: these are system-wide read-only text variables you define on the Site settings Variables tab. They are available to all projects running on the Data Management Site server. Once defined, you can access these variables in projects and automations by prefacing the variable name with the settings. prefix.
Record variables: in tools where you can enter expressions (primarily Calculate and Filter, but also Table Lookup and a few others), the variables used in the expressions are actually fields of the records flowing through the tool. So when you reference the variable X in an expression like X + 10, you are implicitly referencing the field named X in the record that is currently being processed by the tool.
Local variables: these are user-defined variables confined to the single Calculate tool in which they are created. To use a local variable named X in an expression, prefix the variable name with local. as in local.X + 10. See Defining local variables.
Global variables: these are defined at the project level, and may be accessed by any tool in the entire project. To use a global variable named X in an expression, prefix the global variable name with global. as in global.X + 10. Like local variables, global variables may have their value assigned by the Calculate tool. However, their usage is much more complex. See Defining global variables and Using global variables.
When to use variables
Use variables any time you need to maintain state or have some memory of earlier records. For example, suppose that you are reading a transaction log file, which contains various kinds of records stored in a variant-field format. Log files like this often have a "start transaction" record with a transaction ID, followed by one or more "detail" records, followed by an "end transaction" record. Because the transaction ID applies to all detail records in the block, you want to "remember" the transaction ID while processing all of the detail records. A local variable can store the transaction ID while it is needed.
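The transaction-log case above can be sketched in Python. This is only an illustration of the "remember a value from an earlier record" pattern, not Data Management expression syntax; the record layout (the kind, txn_id, and amount keys) is hypothetical.

```python
from typing import Iterable, Iterator


def tag_details(records: Iterable[dict]) -> Iterator[dict]:
    """Attach the enclosing transaction ID to each detail record."""
    current_txn_id = None  # plays the role of a local variable
    for rec in records:
        kind = rec["kind"]
        if kind == "start":
            # Remember the ID carried by the "start transaction" record.
            current_txn_id = rec["txn_id"]
        elif kind == "detail":
            # Detail records have no ID of their own; copy the remembered one.
            yield {**rec, "txn_id": current_txn_id}
        elif kind == "end":
            # The block is finished, so forget the ID.
            current_txn_id = None


# Hypothetical log: two transaction blocks in a variant-field layout.
log = [
    {"kind": "start", "txn_id": "T1"},
    {"kind": "detail", "amount": 5},
    {"kind": "detail", "amount": 7},
    {"kind": "end"},
    {"kind": "start", "txn_id": "T2"},
    {"kind": "detail", "amount": 1},
    {"kind": "end"},
]
tagged = list(tag_details(log))
```

The state lives in one place and is read and written by a single processing step, which is the situation a local variable is suited to.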
Another use for variables is as a custom "accumulator" or summary operation. A local or global variable can store an accumulated result until it is needed.
When not to use variables
You may be tempted to store a value globally so that all tools have access to it. But this can pose timing problems. Many Data Management tools perform implicit sorts or record buffering as part of their operations. Thus many records may pass an upstream stage before a downstream stage receives its first record. If you must pass information from upstream processing to a downstream process, it is better to attach it to the record data than to store it in a global variable.
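The timing problem can be imitated with Python generators. This is a rough model under stated assumptions, not a description of how Data Management schedules tools: the shared dictionary stands in for a global variable, and the sort stands in for any tool that buffers records.

```python
shared = {"last_seen": None}  # stands in for a global variable


def upstream(records):
    for rec in records:
        shared["last_seen"] = rec["id"]  # upstream stage writes the global
        yield rec


def buffering_stage(records):
    # A sort must consume all of its input before it can emit anything.
    yield from sorted(records, key=lambda r: r["id"])


def downstream(records):
    for rec in records:
        # Downstream stage reads the global while handling each record.
        yield (rec["id"], shared["last_seen"])


source = [{"id": 3}, {"id": 1}, {"id": 2}]
observed = list(downstream(buffering_stage(upstream(source))))
```

By the time the downstream stage sees its first record, the upstream stage has already processed every record, so each downstream read returns the value written for the last upstream record rather than the value that was current when that record passed through.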
An example
You might want to compute a single value (such as "grand total") in one processing stage, and then use that value to compute a per-record value (such as "percentage of grand total"). You could create one processing stream that stores the grand total in a global variable, and a second processing stream to compute the percent-of-total on each record. A better alternative is to create a single record containing the grand total, and attach it to the record stream using a "null join": perform a join, but don't specify any join keys. This attaches the "grand total" value to every record, making it available for per-record calculations.
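The null-join approach can be sketched in plain Python as follows. The field names (name, amount, grand_total, pct_of_total) and the sample values are hypothetical, and the list comprehension only imitates what a keyless join does.

```python
records = [
    {"name": "a", "amount": 20.0},
    {"name": "b", "amount": 30.0},
    {"name": "c", "amount": 50.0},
]

# Stage 1: summarize the stream into a single record holding the grand total.
total_record = {"grand_total": sum(r["amount"] for r in records)}

# Stage 2: "null join" with no join keys, so every input record is paired
# with the single total record.
joined = [{**r, **total_record} for r in records]

# Stage 3: the per-record calculation reads the total from the record itself.
for r in joined:
    r["pct_of_total"] = 100.0 * r["amount"] / r["grand_total"]
```

Because the total travels with each record, the calculation does not depend on when any other stage runs.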
Define local variables
To use local variables, you must first define them using the Calculate tool. Each variable must have a name and a data type, just like the fields of a record. Local variables are visible only within the Calculate tool where they are defined.
To define a local variable:
1. Select the Calculate tool, and then go to the Variables tab on the Properties pane.
2. Select the Field box in the Local record grid and type the name of the variable.
3. Select the Type box and configure the new field so that it is the correct type and size to hold the calculation results. See Configuring Data Types.
4. Optionally, select the Initial value box and define the initial value of the variable.
5. Repeat steps 1 through 4 for each additional variable.
Global variables
Project parameters function as global variables and can be referenced using the special ${...} syntax. The fundamental difference between a project parameter and a global variable is that a parameter is a variable with a user interface property control. User interface (UI) property controls are used:
When a macro is embedded in a project, and
When a project is embedded in an automation
By contrast, global variables (including variables defined using UI property controls) can be:
Set via a UI property control for macros embedded in projects and projects embedded in automations.
Set using the -D option on the command line program.
Set using variable mapping from an automation to an embedded project.
Referenced in any tool that has an expression (Calculate and Filter are the most common cases).
Changed by a Calculate tool.
Data Management projects run with a high degree of parallelism and pipelining. Because of this, we do not recommend configuring multiple tools to change and use the same global variable, as results are unpredictable due to timing variation.
Define global variables
Global variables are a special kind of project parameter, and are defined using the project parameters editor.
To define a global variable:
Open a project or macro, and then select Parameters on the Project menu. You can also right-click the project canvas, and then select Project parameters.
Select the Parameters tab on the Properties pane.
Select the create new icon and choose Tab.
Select the tab control you just inserted.
Select the create new icon and choose Variable.
Select the variable you just created, and configure Name, Default value, and Data type.
Global variables and macros
As in other projects, macro parameters are defined using the project parameters editor. These project parameters are displayed when the macro is embedded in a project. The user interface parameters appear on the macro's property pane when the macro is selected, and the user can directly configure them.
Macro user interface parameters are also global variables
In a macro, the parameters are also global variables. So if you define a parameter named X, it also defines a global variable named X, regardless of its control type.
Macros have independent global variables
Macro parameters and the global variables they represent are only used within the context of the macro. This isolates the behavior of macros from the project in which they are contained. This means that if you define a variable/parameter named X in a macro, and also define a variable/parameter named X in the project containing the macro, changing one X does not affect the other X.
Global variables and expression replacement
You can use global variables in the configuration of any tool setting that accepts text, using the syntax ${expression}. At configuration time, ${expression} is replaced with the result of evaluating that expression. The expressions can be simple variables like ${FILENAME}, or they can be more complex, calling functions like ${FileFromPath(FILENAME)}.
When concatenating strings in this way, place the ${...} expressions inline in the text rather than joining them with the + operator, as in ${TARGET_DIRECTORY}/${FileFromPath(FILENAME)}.
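The replacement behavior described above can be approximated in Python. This is a minimal sketch, not Data Management's evaluator: it handles only a bare variable name or a single function call with one variable argument, the Python function file_from_path assumes that FileFromPath returns the final component of a path, and the TARGET_DIRECTORY value is hypothetical.

```python
import posixpath
import re


def file_from_path(path: str) -> str:
    # Assumed behavior of FileFromPath: return the final path component.
    return posixpath.basename(path)


FUNCTIONS = {"FileFromPath": file_from_path}
PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")   # matches ${...}
CALL = re.compile(r"(\w+)\((\w+)\)")         # matches Name(ARG)


def evaluate(expr: str, variables: dict) -> str:
    """Evaluate a bare variable name or a one-argument function call."""
    expr = expr.strip()
    call = CALL.fullmatch(expr)
    if call:
        func_name, arg_name = call.groups()
        return FUNCTIONS[func_name](variables[arg_name])
    return str(variables[expr])


def expand(text: str, variables: dict) -> str:
    """Replace every ${expression} in text with its evaluated result."""
    return PLACEHOLDER.sub(lambda m: evaluate(m.group(1), variables), text)


variables = {"TARGET_DIRECTORY": "f:/out", "FILENAME": "f:/data/file.csv"}
expanded = expand("${TARGET_DIRECTORY}/${FileFromPath(FILENAME)}", variables)
```

Each placeholder is replaced where it stands, so the literal / between the two expressions is what joins them; no + operator is involved.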
Example:
The CSV Input tool has a file path in its properties. Normally you type a file path or browse to a file.
Suppose that you want to embed this project in an automation, and set the input file name using an automation's variable. To do so, first you would define the variable FILENAME.
Make sure that the variable is long enough to hold any file path (1000 is good), and give it a default value to use when developing the project outside of the automation context (f:/data/file.csv).
You reference this variable in the CSV Input tool's properties by typing the special ${...} syntax for the file name.
When the project is configured, ${FILENAME} will be evaluated and replaced with the contents of the FILENAME variable.
This variable can be set when the project is run from within an automation by mapping the variable from an automation variable or expression, defined in the project step. Suppose we've saved the project to the repository, created an automation, defined a variable in the automation named AUTO_FILENAME, and embedded the project in the automation as a step. The project step then lets you map the automation variable to the project variable.
This assumes, of course, that the automation has performed the correct steps to populate its AUTO_FILENAME variable.
This variable can also be set as a command-line argument to rpdm_cmd:
rpdm_cmd -project=repository:///projects/readfile -DFILENAME=f:/data/mydata.csv
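The effect of the -D option, a command-line value taking the place of the variable's default, can be pictured with a short Python sketch. This is not rpdm_cmd's actual argument parser; parse_defines is a hypothetical helper written only to show the override behavior.

```python
def parse_defines(argv, defaults):
    """Return the defaults, overridden by any -DNAME=value arguments."""
    values = dict(defaults)
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            # Split only on the first "=" so values may contain "=".
            name, value = arg[2:].split("=", 1)
            values[name] = value
    return values


argv = [
    "-project=repository:///projects/readfile",
    "-DFILENAME=f:/data/mydata.csv",
]
# The default is the value used while developing the project.
resolved = parse_defines(argv, {"FILENAME": "f:/data/file.csv"})
```

When no -D argument names a variable, its default value is left in place.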
Expression replacement happens at configuration time!
If you use the ${...} syntax in your project, be aware that these are evaluated and replaced before the project runs. This is necessary because configuration properties can alter the logic and schema of the project, and so must be set before the project starts. Thus changing a global variable will have no effect on any ${...} instances used in your project. For example, you cannot use a Calculate tool to change a global variable FILENAME and expect that change to alter the behavior of an output tool that references ${FILENAME}. Instead, use an automation to map variables in.
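The ordering described above can be modeled in a few lines of Python. The OutputTool class and the second file path are hypothetical; the sketch only separates the two phases so that the consequence is visible.

```python
class OutputTool:
    """Hypothetical tool whose path property contains a ${FILENAME} reference."""

    def __init__(self, path_template: str):
        self.path_template = path_template
        self.path = None

    def configure(self, variables: dict) -> None:
        # Configuration time: the placeholder is replaced once and stored.
        self.path = self.path_template.replace(
            "${FILENAME}", variables["FILENAME"]
        )

    def run(self) -> str:
        # Run time: the tool uses the stored value and never re-reads
        # the variable.
        return self.path


variables = {"FILENAME": "f:/data/file.csv"}
tool = OutputTool("${FILENAME}")
tool.configure(variables)

# A Calculate-like step changes the global variable while the project runs.
variables["FILENAME"] = "f:/data/other.csv"

used_path = tool.run()
```

The tool still writes to the path that was substituted during configuration, which is why the value must be supplied before the project starts.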
System variables
To use a system variable in an expression, preface the variable name with the system. prefix.
| Variable | Description | Type |
|---|---|---|
| | Full URI path to the automation. For example, if an automation is located in the folder In Data Management versions prior to 9.0, these variables contained only the "native path" part of a variable, regardless of its URI scheme. For example, if an automation was located in the folder These variables now also contain full URIs for local file system paths (for example, | |
| | Name of the project or automation. | |
| | Contains the name of the cluster master. | |
| | Contains a comma-separated list of the names of all enabled workers in the cluster. | |
| | Number of CPU cores enabled on the Execution Server. | |
| | Type of distributed file system. (Unused in 9.5 and later) | |
| | Hostname of the Execution Server. | |
| | OS name of the Execution Server. This may not be the same as the Execution Server name, depending on what name was specified during installation. | |
| | Full URI path to the Data Management installation location. | |
| | Date-and-time that the project was loaded. | |
| | ID of the log being created by the project. | |
| | Execution Server operating system. | |
| | Operating system process ID of the project or automation. Used to create uniquely-named files and tables, using the | |
| | Full URI path to the project. For example, if a project is located in the folder In Data Management versions prior to 9.0, these variables contained only the "native path" part of a variable, regardless of its URI scheme. For example, if a project was located in the folder These variables now also contain full URIs for local file system paths (for example, | |
| | Name of the project. | |
| | Size of field that holds generated IDs from tools like Number Records and Generate Sequence. | |
| | Time that the project run started. | |
| | Date that the project run started. | |
| | Date-and-time that the project run started. | |
| | Full URI path to the configured temp space. | |
| | User running the project. | |
| | Data Management version string, for example | |
| | Contains a number uniquely identifying the Worker amongst all Workers in a Cluster. | |