Cosmos DB tools
Overview
Azure Cosmos DB is Microsoft's proprietary NoSQL document database, deployed as a globally distributed, multi-model database service. Data Management uses the MongoDB API to access Cosmos DB. Like MongoDB documents, Cosmos DB documents are based on BSON, which can be thought of as binary JSON with added type support (including timestamp, decimal, and binary types). Cosmos DB supports unique indexes and indexing of array elements, but does not directly support joins.
Data Management's support for Cosmos DB includes:
Cosmos DB Input and Cosmos DB Output tools
Cosmos DB Deleter, Cosmos DB Updater, and Cosmos DB Array Updater tools
Cosmos DB Key Query and Cosmos DB Executor tools
Cosmos DB tool connection settings
Data Management's Cosmos DB tools use shared settings, which allows you to define a single set of configuration properties (typically access credentials) to share across multiple tools in your Data Management Site. You can override these settings on a per-tool basis by opening the Connection settings section on the tool's Properties pane, selecting Override, and specifying values for that specific tool.
To define Cosmos DB shared tool settings
Open the Tools folder under Settings in the repository.
Select the Cosmos DB tab, and then configure the tool properties for your environment.
Property | Description |
---|---|
Connection URI | The Cosmos DB URI connection string (as defined in the MongoDB documentation). |
Explicit username/password | If selected, you must specify an explicit Username and Password (or Key Vault reference). While the username and password can be embedded in the Connection URI string, it is often preferable for security reasons to specify these separately, since the password will be encrypted. |
Block size limit | Controls how many Request Units (RUs) may be sent to the Cosmos DB server at one time. The recommended maximum is 50. |
To configure default shared tool settings from a Cosmos DB tool's Properties pane, open the Connection settings section, and select Edit default settings.
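As a rough illustration of the Connection URI property, the following Python sketch assembles a MongoDB-style connection string with the username and password supplied separately. The host name, port 10255, account name, key, and query options are placeholders chosen for the example, not values taken from this documentation; copy the real connection string from your Cosmos DB account.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_uri(host, port, username=None, password=None, options=None):
    """Assemble a MongoDB-style connection URI.

    Credentials are percent-encoded because account keys often contain
    characters such as '=', '+', or '/' that are reserved in URIs.
    """
    credentials = ""
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}{host}:{port}/{query}"


# Hypothetical host, account name, key, and options; substitute your own.
EXAMPLE_HOST = "example-account.mongo.cosmos.azure.com"
EXAMPLE_OPTIONS = {"ssl": "true", "retrywrites": "false"}

# Form suited to the "Explicit username/password" setting: no secrets in the URI.
uri_without_credentials = build_connection_uri(
    EXAMPLE_HOST, 10255, options=EXAMPLE_OPTIONS
)

# Form with credentials embedded directly in the URI.
uri_with_credentials = build_connection_uri(
    EXAMPLE_HOST,
    10255,
    username="example-account",
    password="abc/def+ghi==",
    options=EXAMPLE_OPTIONS,
)

print(uri_without_credentials)
print(uri_with_credentials)
```

The first form keeps the secret out of the URI, which matches the recommendation to enter the password separately so that it can be encrypted.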
To override Cosmos DB tool shared settings
Select the desired Cosmos DB tool.
Go to the Configuration tab on the Properties pane.
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Cosmos DB Input
The Cosmos DB Input tool reads documents from a collection in a Cosmos DB database and sends those documents to its output connector. The tool makes no attempt to interpret the documents; instead, the documents are stored in a field of type Document.
Cosmos DB Input tool configuration parameters
The Cosmos DB Input tool has two sets of configuration parameters in addition to the standard execution options.
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
JSON query | Optionally, enter a query in JSON form to filter the documents returned. The query string must correspond to MongoDB's specification. Some examples:
Select Find field names to generate a list of field names discovered in existing documents. If you do not enter a query string, the tool will return all documents in the collection (unless a limit is specified on the Options tab). |
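A few illustrative queries in MongoDB's filter syntax are sketched below in Python. The field names (state, age, status) are hypothetical; the operators ($gte, $lt, $in, $and) are standard MongoDB query operators, although whether a given operator is supported depends on your Cosmos DB API version.

```python
import json

# Field names are hypothetical; replace them with fields from your documents.
example_queries = {
    "equality": {"state": "CO"},
    "comparison": {"age": {"$gte": 21}},
    "membership": {"status": {"$in": ["active", "pending"]}},
    "combined": {"$and": [{"state": "CO"}, {"age": {"$lt": 65}}]},
}

for label, query in example_queries.items():
    # json.dumps produces the text that would be entered as the JSON query.
    print(f"{label}: {json.dumps(query)}")
```

Each printed line is a complete query document that could be entered as the JSON query.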
Options
Parameter | Description |
---|---|
Sort by | If Sort by is selected, the Cosmos DB server will sort documents on the specified Sort field before returning them. Select Find field names to generate a list of field names discovered in existing documents. Optionally, select Sort descending to sort values in descending order. Cosmos DB will not sort more than 100 MB of data. To query and sort a larger amount of data, use Data Management's Sort tool downstream from the Cosmos DB Input tool. |
Limit records | If selected, enter the maximum number of records to be returned. For test runs on large databases, this can significantly reduce run-time. |
Enable trigger input | |
Only return selected fields | If selected, specify a list of field names to be returned. This will reduce the size and complexity of the returned documents and improve performance. Select Find field names to generate a list of field names discovered in existing documents. |
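To make the combined effect of Sort by, Limit records, and Only return selected fields concrete, here is a small local emulation in Python. It is only a sketch of the semantics on in-memory dictionaries, not the tool's implementation: it assumes every document has the sort field, and unlike a real MongoDB projection it does not add _id automatically.

```python
def apply_input_options(documents, sort_field=None, descending=False,
                        limit=None, selected_fields=None):
    """Emulate Sort by, Limit records, and Only return selected fields."""
    result = list(documents)
    if sort_field is not None:
        # Sorting happens before the limit is applied.
        result.sort(key=lambda doc: doc[sort_field], reverse=descending)
    if limit is not None:
        result = result[:limit]
    if selected_fields is not None:
        # Keep only the requested fields that are present in each document.
        result = [
            {name: doc[name] for name in selected_fields if name in doc}
            for doc in result
        ]
    return result


# Hypothetical sample documents.
people = [
    {"_id": 1, "name": "Ana", "age": 34},
    {"_id": 2, "name": "Bo", "age": 27},
    {"_id": 3, "name": "Cy", "age": 41},
]

top_two = apply_input_options(
    people, sort_field="age", descending=True, limit=2,
    selected_fields=["name", "age"],
)
print(top_two)
```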
Configure the Cosmos DB Input tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Input tool:
Select the Cosmos DB Input tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Optionally, enter a JSON query to filter the documents returned, or leave blank to return all documents in the Collection. You can select Find field names to generate a list of field names discovered in existing documents.
Select the Options tab to configure how document fields are returned, or Enable Trigger input.
Optionally, select the Sample tab.
Select Refresh Sample data to view a sample of the input data.
Optionally, go to the Execution tab, and then set Web service options.
Cosmos DB Output
The Cosmos DB Output tool inserts new documents into a collection, or optionally replaces documents that already exist and inserts the rest (upsert behavior).
Cosmos DB Output tool configuration parameters
The Cosmos DB Output tool has two sets of configuration parameters in addition to the standard execution options.
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Input document field | Select the field from the upstream connection that contains documents to insert. This field must exist and be of type “Document”. |
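Processing mode | Select processing mode: Create inserts new documents and fails if a document with the same Target key field value already exists. Upsert inserts new documents and replaces any existing document with the same Target key field value. |
Target key field | Key field for documents in the target Database and Collection. In Cosmos DB, the default primary key field is _id. If you do not explicitly set an _id field in your documents, Cosmos DB will synthesize a new one for you. |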
Options
Parameter | Description |
---|---|
Enable trigger output | |
Block size | Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Configure the Cosmos DB Output tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Output tool:
Select the Cosmos DB Output tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Select Processing mode and choose either Create or Upsert:
Create: inserts new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the operation fails and the project aborts.
Upsert: attempts to insert new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the existing document is replaced by the new document. If Target key field is not the default _id, it should be indexed; otherwise the update operation will be inefficient.
Optionally, select Target key field and specify the key field for documents in the target Database and Collection. In Cosmos DB, the default primary key field is _id. If you do not explicitly set an _id field in your documents, Cosmos DB will synthesize a new one for you.
Select the Options tab to Enable Trigger output or set Block size.
Optionally, go to the Execution tab, and then set Report options.
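The difference between the Create and Upsert processing modes can be sketched with an in-memory stand-in for a collection. The function and exception names below are hypothetical, and the dictionary only models the Target key field behavior described above; it does not model other unique indexes.

```python
class DuplicateKeyError(Exception):
    """Raised in create mode when the target key already exists."""


def write_documents(collection, documents, mode="create", key_field="_id"):
    """Apply documents to an in-memory 'collection' (a dict keyed by key_field)."""
    if mode not in ("create", "upsert"):
        raise ValueError(f"unknown mode: {mode!r}")
    for doc in documents:
        key = doc[key_field]
        if mode == "create" and key in collection:
            # Create mode: a duplicate key makes the whole operation fail.
            raise DuplicateKeyError(f"duplicate {key_field}: {key!r}")
        # Insert, or in upsert mode replace the existing document.
        collection[key] = doc
    return collection


collection = {1: {"_id": 1, "city": "Boulder"}}

# Upsert replaces document 1 and inserts document 2.
write_documents(
    collection,
    [{"_id": 1, "city": "Denver"}, {"_id": 2, "city": "Golden"}],
    mode="upsert",
)

# Create fails because document 2 now exists.
create_failed = False
try:
    write_documents(collection, [{"_id": 2, "city": "Aspen"}], mode="create")
except DuplicateKeyError as error:
    create_failed = True
    print("aborted:", error)
```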
Cosmos DB Deleter
The Cosmos DB Deleter tool deletes documents in the target collection that match keys read from the input connection.
Cosmos DB Deleter tool configuration parameters
The Cosmos DB Deleter tool has two sets of configuration parameters in addition to the standard execution options.
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Input key field | Select an Input key field and a Document key field. Input key field values will be read from source documents. The tool will delete all documents in the target collection with Document key field values matching any of the input values. If there are many duplicate keys, removing them with an upstream Unique tool may improve performance. |
Options
Parameter | Description |
---|---|
Enable trigger output | |
Block size | Controls how many delete operations are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Field conversion options |
Configure the Cosmos DB Deleter tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Deleter tool:
Select the Cosmos DB Deleter tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Select an Input key field.
Enter a Document key field. The tool will delete all documents in the target collection whose Document key field matches any of the input values.
Select the Options tab to Enable Trigger output or set Block size.
Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the conversion options do not match the target database, the key values will not match.
Optionally, go to the Execution tab, and then set Report options.
The _id field is the natural unique primary key for Cosmos DB collections, and is the usual choice. However, choosing other key fields can support behavior like deleting multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.
If there are a great number of duplicate keys, you may improve performance by reducing duplicate key fields with an upstream Unique tool.
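The key-matching behavior of the Deleter, including the effect of duplicate keys and of mismatched key types, can be emulated locally as follows. This is a sketch on plain Python lists with hypothetical field names, not the tool's implementation.

```python
def delete_by_keys(documents, input_keys, document_key_field="_id"):
    """Return (remaining documents, number deleted) for the given input keys."""
    # Duplicate input keys add no information, so collapse them first.
    unique_keys = set(input_keys)
    kept = [
        doc for doc in documents
        if doc.get(document_key_field) not in unique_keys
    ]
    return kept, len(documents) - len(kept)


# Hypothetical documents keyed by a non-unique field.
docs = [
    {"_id": 1, "zip": "80301"},
    {"_id": 2, "zip": "80301"},
    {"_id": 3, "zip": "80302"},
]

# One key value removes every document that shares it.
remaining, deleted = delete_by_keys(docs, ["80301", "80301"], "zip")
print(remaining, deleted)

# A type mismatch (integer key against string field) matches nothing.
_, deleted_with_mismatch = delete_by_keys(docs, [80301], "zip")
print(deleted_with_mismatch)
```

The second call shows why the field conversion options matter: a key that differs in type from the stored value does not match.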
Cosmos DB Updater
The Cosmos DB Updater tool alters existing documents in a Cosmos DB collection by replacing values with those from a Document field in the input records. Each update field must be specified separately.
Cosmos DB Updater tool configuration parameters
The Cosmos DB Updater tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Input document field | Select the field from the upstream connection that contains documents containing update values. |
Update key | Select the Update key, which is used to determine which records to update. |
Update fields | Enter a list of fields that will be copied from the input source to target documents. Select Find field names to generate a list of field names discovered in existing documents. |
Options
Parameter | Description |
---|---|
Include nulls in document | Select to add explicit null values to the documents. |
Block size | Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Enable trigger output |
Configure the Cosmos DB Updater tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Updater tool:
Select the Cosmos DB Updater tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Select an Input document field.
Enter an Update key, used to determine which records to update. To update the entire document with a new document, use the Cosmos DB Output tool with Processing mode set to Upsert.
Select one or more Update fields that will be copied from the source to target documents. You can select Find field names to populate the list of field names discovered in existing documents.
Select the Options tab to Include nulls in document, Enable Trigger output, or set Block size.
Optionally, go to the Execution tab, and then set Report options.
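The following Python sketch illustrates a field-level update of the kind the Updater performs: only the listed Update fields are copied from the input document to the target. The treatment of Include nulls in document shown here (skipping null values unless the option is selected) is an assumption about that option, and all names are hypothetical.

```python
def apply_update(target, source, field_names, include_nulls=False):
    """Copy the listed fields from source into a copy of target."""
    updated = dict(target)
    for name in field_names:
        if name not in source:
            continue
        value = source[name]
        if value is None and not include_nulls:
            # Assumed behavior: nulls are skipped unless explicitly included.
            continue
        updated[name] = value
    return updated


existing = {"_id": 7, "name": "Ana", "email": "old@example.com", "phone": "555-0100"}
incoming = {"_id": 7, "name": "ignored", "email": "new@example.com", "phone": None}

without_nulls = apply_update(existing, incoming, ["email", "phone"])
with_nulls = apply_update(existing, incoming, ["email", "phone"], include_nulls=True)
print(without_nulls)
print(with_nulls)
```

The name field is left untouched in both results because it is not in the list of update fields.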
Cosmos DB Array Updater
The Cosmos DB Array Updater tool alters existing documents in a Cosmos DB collection by adding data to an array as an atomic document database operation (Cosmos DB syntax $push). Enable the Unique result option to add a value to the target array only if the value does not already exist in the array (Cosmos DB syntax $addToSet).
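The difference between the two array operations can be emulated on an in-memory document. The function name and sample data are hypothetical; the real operations run atomically on the server, which this sketch does not model.

```python
def append_to_array(document, target_field, value, unique_result=False):
    """Emulate $push (default) or $addToSet (unique_result=True)."""
    updated = dict(document)
    current = list(updated.get(target_field, []))
    if not (unique_result and value in current):
        current.append(value)
    updated[target_field] = current
    return updated


doc = {"_id": 1, "tags": ["a", "b"]}

pushed = append_to_array(doc, "tags", "b")
added_existing = append_to_array(doc, "tags", "b", unique_result=True)
added_new = append_to_array(doc, "tags", "c", unique_result=True)

print(pushed["tags"])          # duplicate appended
print(added_existing["tags"])  # unchanged
print(added_new["tags"])       # new value appended
```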
Cosmos DB Array Updater tool configuration parameters
The Cosmos DB Array Updater tool has two sets of configuration parameters in addition to the standard execution options.
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Input document field | Select the array field from the upstream connection that contains documents containing update values. |
Update key | Select the Update key, which is used to determine which records to update. |
Target field | Enter one or more fields to update. Select Find field names to generate a list of field names discovered in existing documents. |
Options
Parameter | Description |
---|---|
Unique result | Select to add a value to the target array only if the value does not already exist in the array (Cosmos DB syntax $addToSet). |
Block size | Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Enable trigger output |
Configure the Cosmos DB Array Updater tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Array Updater tool:
Select the Cosmos DB Array Updater tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Select an Input document field.
Enter an Update key, used to determine which records to update.
Select one or more Target fields to update. You can select Find field names to populate the list of field names discovered in existing documents.
Select the Options tab to select Unique result, Enable Trigger output, or set Block size.
Optionally, go to the Execution tab, and then set Report options.
Cosmos DB Key Query
The Cosmos DB Key Query tool returns all documents from the source collection whose key fields match any of the input key field values.
Cosmos DB Key Query tool configuration parameters
The Cosmos DB Key Query tool has two sets of configuration parameters in addition to the standard execution options.
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Input key field | Select an Input key field and Document key field. Select Find field names to generate a list of field names discovered in existing documents. The tool will return all documents in the target collection whose Document key field value matches any of the Input key field values. |
Options
Parameter | Description |
---|---|
Block size | Controls the number of keys that are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Field conversion options |
Configure the Cosmos DB Key Query tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Key Query tool:
Select the Cosmos DB Key Query tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database and Collection.
Select an Input key field, and then choose a Document key field. You can select Find field names to generate a list of field names discovered in existing documents.
Optionally, select the Options tab to set Block size.
Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the conversion options do not match the target database, the key values will not match.
Optionally, select the Sample tab, and then choose Refresh Sample data to view a sample of the input data.
Optionally, go to the Execution tab, and then set Web service options.
The _id field is the natural unique primary key for Cosmos DB collections, and is the usual choice. However, choosing other key fields can support behavior like returning multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.
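One way to picture how Block size interacts with key lookups is to group the input keys into blocks and build one filter per block. The sketch below assumes each block becomes a MongoDB $in filter; how the tool actually forms its requests is not stated here, so treat this purely as an illustration with hypothetical names.

```python
import json


def key_query_filters(input_keys, document_key_field, block_size):
    """Yield one $in filter document per block of input keys."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    keys = list(input_keys)
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        yield {document_key_field: {"$in": block}}


# Hypothetical keys and field name; two keys per block.
filters = list(key_query_filters(["80301", "80302", "80303"], "zip", 2))
for query_filter in filters:
    print(json.dumps(query_filter))
```

A larger block size means fewer round trips to the server at the cost of more memory per request, which is the trade-off described for the Block size option.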
Cosmos DB Executor
The Cosmos DB Executor tool can perform the following commands on the configured database:
Drop collection
Clear collection
Drop index on collection
Create index on a collection, specifying field list, Unique, and Sparse options
Cosmos DB Executor tool configuration parameters
The Cosmos DB Executor tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Parameter | Description |
---|---|
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated. |
Operation | Select one or more Collection actions: DROP or CLEAR. Select one or more Index actions: CREATE or DROP. |
Collection | Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated. |
Index name | Name of the index to CREATE or DROP. Due to a bug in Cosmos DB, the index names you define may be changed to default names on CREATE. |
Sparse | Select to create a sparse index. This is good for seldom-used fields because it saves space. |
Unique | Select to create a unique index. This enforces uniqueness across the specified fields. Note that an index that is both sparse and unique prevents a collection from having documents with duplicate values for a field, but allows multiple documents that omit the key. |
Fields | Comma-separated list of field names on which to index. Indexes are ascending on all fields. |
Enable shell commands | Select to enter database commands in JSON form, one per line. You can use this to perform operations that are unavailable in Collection actions and Index actions. See MongoDB Database Commands for details. |
Options
Parameter | Description |
---|---|
Enable trigger input |
Configure the Cosmos DB Executor tool
Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.
To configure the Cosmos DB Executor tool
Select the Cosmos DB Executor tool.
Go to the Configuration tab on the Properties pane.
Optionally, override shared settings:
Open the Connection settings section.
Select Override.
Specify new values for the tool.
Select a Database.
Use the Collection actions and/or Index actions grids to configure commands. You may add as many commands as you like.
To drop a collection
In the Collection actions grid:
Select a grid cell under Operation and choose DROP.
Choose the corresponding grid cell under Collection.
Select the collection you want to drop.
To clear a collection
In the Collection actions grid:
Select a grid cell under Operation and choose CLEAR.
Choose the corresponding grid cell under Collection.
Select the collection you want to clear.
To create an index
In the Index actions grid:
Select a grid cell under Operation and choose CREATE.
Choose the corresponding grid cell under Collection.
Select the collection you want to index.
Select the corresponding grid cell under Index name and enter a name for the new index.
Due to a bug in Cosmos DB, the index names you define may be changed to default names on CREATE.
Optionally, select Sparse and/or Unique to select those index types.
Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) to be indexed.
To drop an index
In the Index actions grid:
Select a grid cell under Operation and choose DROP.
Choose the corresponding grid cell under Collection.
Select the collection that contains the index you want to drop.
Select the corresponding grid cell under Index name and enter a name for the index you want to drop.
Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) whose indexes will be dropped.
Optionally, select Enable shell commands and enter database commands in JSON form, one per line.
Select the Options tab to Enable Trigger input or output.
Optionally, go to the Execution tab, and then set Report options.
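For the Enable shell commands option, commands are entered in JSON form, one per line. The Python sketch below generates such lines using standard MongoDB database commands (drop, delete, createIndexes, dropIndexes). The collection and index names are hypothetical, and whether each command is accepted depends on your Cosmos DB API version, so verify against your environment before relying on them.

```python
import json


def drop_collection(collection):
    return {"drop": collection}


def clear_collection(collection):
    # An empty filter with limit 0 deletes every document in the collection.
    return {"delete": collection, "deletes": [{"q": {}, "limit": 0}]}


def create_index(collection, name, fields, unique=False, sparse=False):
    # All fields are indexed in ascending order (1).
    index = {
        "key": {field: 1 for field in fields},
        "name": name,
        "unique": unique,
        "sparse": sparse,
    }
    return {"createIndexes": collection, "indexes": [index]}


def drop_index(collection, name):
    return {"dropIndexes": collection, "index": name}


# Hypothetical collection and index names.
commands = [
    create_index("customers", "zip_idx", ["zip"], unique=True, sparse=True),
    drop_index("customers", "zip_idx"),
    clear_collection("staging"),
    drop_collection("staging"),
]

# One JSON command per line, as expected by the shell commands box.
shell_text = "\n".join(json.dumps(command) for command in commands)
print(shell_text)
```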