Cosmos DB tools

Overview

Azure Cosmos DB is Microsoft's proprietary NoSQL document database, deployed as a globally distributed, multi-model database service. Data Management uses the MongoDB API to access Cosmos DB. Like MongoDB documents, Cosmos DB documents are based on BSON, which can be thought of as binary JSON with added type support (including timestamp, decimal, and binary types). Cosmos DB supports unique indexes and indexing of array elements, but does not directly support joins.

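Data Management's support for Cosmos DB includes:

  • Cosmos DB Input

  • Cosmos DB Output

  • Cosmos DB Deleter

  • Cosmos DB Updater

  • Cosmos DB Array Updater

  • Cosmos DB Key Query

  • Cosmos DB Executor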

Cosmos DB tool connection settings

Data Management's Cosmos DB tools use shared settings, which allows you to define a single set of configuration properties (typically access credentials) to share across multiple tools in your Data Management Site. You can override these settings on a per-tool basis by opening the Connection settings section on the tool's Properties pane, selecting Override, and specifying values for that specific tool.

To define Cosmos DB shared tool settings

  1. Open the Tools folder under Settings in the repository.

  2. Select the Cosmos DB tab, and then configure the tool properties for your environment.

Property

Description

Connection URI

The Cosmos DB URI connection string (as defined in the MongoDB documentation).

Explicit username/password

If selected, you must specify an explicit Username and Password (or Key Vault reference). While the username and password can be embedded in the Connection URI string, it is often preferable for security reasons to specify these separately, since the password will be encrypted.

Block size limit

Controls how many Request Units (RUs) may be sent to the Cosmos DB server at one time. The recommended maximum is 50.

To configure default shared tool settings from a Cosmos DB tool's Properties pane, open the Connection settings section, and select Edit default settings.
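When the username and password are embedded in the Connection URI rather than entered separately, special characters in them must be percent-encoded. The following is a minimal sketch of assembling such a URI in Python. The host name, port, account name, key, and option values are hypothetical placeholders, not values taken from Data Management or from your Azure account.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_uri(host, port, username=None, password=None, options=None):
    """Assemble a MongoDB-style connection URI, escaping the credentials."""
    credentials = ""
    if username is not None and password is not None:
        # Keys often contain '/', '+', or '=', which must be percent-encoded.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}{host}:{port}/{query}"


# Hypothetical account name, host, and key, for illustration only.
uri = build_connection_uri(
    host="example-account.mongo.cosmos.azure.com",
    port=10255,
    username="example-account",
    password="abc/def+ghi==",
    options={"ssl": "true", "replicaSet": "globaldb"},
)
print(uri)
```

Selecting Explicit username/password avoids this encoding step, because the credentials are then stored separately from the URI.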

To override Cosmos DB tool shared settings

  1. Select the desired Cosmos DB tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Open the Connection settings section.

  4. Select Override.

  5. Specify new values for the tool.

Cosmos DB Input

The Cosmos DB input tool reads documents from a collection of a Cosmos DB database and sends those documents to its output connector. The tool makes no attempt to interpret the documents; instead the documents are stored in a field of type Document.

Cosmos DB Input tool configuration parameters

The Cosmos DB Input tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

JSON query

Optionally, enter a query in JSON form to filter the documents returned. The query string must correspond to MongoDB's specification. Some examples:

  • Equality on zip_code field: { "zip_code":"02134" }

  • Names starting with "Z": { "name": { "$regex": "^Z" } }

  • Salary greater than or equal to 10000: { "salary" : { "$gte" : 10000 } }

Select Find field names to generate a list of field names discovered in existing documents.

If you do not enter a query string, the tool will return all documents in the collection (unless a limit is specified on the Options tab).
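Because the JSON query must be well-formed JSON, it can help to check a query string before pasting it into the tool. The sketch below uses only Python's standard json module. The function name parse_query is hypothetical, and the check covers JSON syntax only, not whether the operators are valid for MongoDB or Cosmos DB.

```python
import json


def parse_query(text):
    """Parse a JSON query string; return a dict, or raise ValueError with a hint."""
    try:
        query = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON query at column {exc.colno}: {exc.msg}") from exc
    if not isinstance(query, dict):
        raise ValueError("A query must be a JSON object, such as {\"field\": \"value\"}")
    return query


examples = [
    '{ "zip_code": "02134" }',
    '{ "name": { "$regex": "^Z" } }',
    '{ "salary": { "$gte": 10000 } }',
]
for text in examples:
    print(parse_query(text))
```

A query with an unbalanced brace fails this check with a ValueError instead of reaching the server.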

Options

Parameter

Description

Sort by
Sort field
Sort descending

If Sort by is selected, the Cosmos DB server will sort documents on the specified Sort field before returning them. Select Find field names to generate a list of field names discovered in existing documents. Optionally, you may select Sort descending to sort values in descending order.

Cosmos DB will not sort more than 100 MB of data. To query and sort a larger amount of data, use Data Management's Sort tool downstream from the Cosmos DB Input tool.

Limit records
Process only the first

If selected, enter the maximum number of records to be returned. For test runs on large databases, this can significantly reduce run-time.

Enable trigger input

See Trigger input and output.

Only return selected fields

If selected, specify a list of field names to be returned. This will reduce the size and complexity of the returned documents and improve performance. Select Find field names to generate a list of field names discovered in existing documents.
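For readers familiar with MongoDB drivers, the options above correspond roughly to the sort, limit, and projection arguments of a find operation. The sketch below shows one plausible mapping. How the tool actually builds its request is not documented here, so the function and its parameter names are assumptions made for illustration.

```python
def build_find_options(sort_field=None, descending=False, limit=None, fields=None):
    """Translate the Options described above into MongoDB-style find arguments."""
    options = {}
    if sort_field:
        # MongoDB uses 1 for ascending and -1 for descending order.
        options["sort"] = [(sort_field, -1 if descending else 1)]
    if limit:
        options["limit"] = int(limit)
    if fields:
        # A projection lists the fields to include in each returned document.
        options["projection"] = {name: 1 for name in fields}
    return options


print(build_find_options(sort_field="salary", descending=True, limit=100,
                         fields=["name", "salary"]))
```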

Configure the Cosmos DB Input tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Input tool:

  1. Select the Cosmos DB Input tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Optionally, enter a JSON query to filter the documents returned, or leave blank to return all documents in the Collection. You can select Find field names to generate a list of field names discovered in existing documents.

  6. Select the Options tab to configure how document fields are returned, or Enable Trigger input.

  7. Optionally, select the Sample tab.

  8. Select Refresh Sample data to view a sample of the input data.

  9. Optionally, go to the Execution tab, and then set Web service options.

Cosmos DB Output

The Cosmos DB Output tool inserts new documents into a collection, or optionally updates documents that exist and inserts them otherwise (upsert behavior).

Cosmos DB Output tool configuration parameters

The Cosmos DB Output tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Input document field

Select the field from the upstream connection that contains documents to insert. This field must exist and be of type “Document”.

Processing mode

Select processing mode:

  • Create: inserts new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the operation fails and the project aborts.

  • Upsert: attempts to insert new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the existing document is replaced by the new document. If Target key field is not the default _id, it should be indexed; otherwise the update operation will be inefficient.

Target key field

Key field for documents in the target Database and Collection. In Cosmos DB, the default primary key field is _id. We suggest that you use this rather than creating another key field.
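The difference between the two processing modes can be illustrated with a small in-memory model. This is only a sketch of the documented behavior using a Python dict as a stand-in for a collection. The names write_document and DuplicateKeyError are hypothetical, and unique index violations on other fields are not modeled.

```python
class DuplicateKeyError(Exception):
    """Raised when Create mode meets an existing key (hypothetical name)."""


def write_document(collection, document, mode="create", key_field="_id"):
    """Apply one document to an in-memory collection keyed by key_field."""
    key = document[key_field]
    if mode == "create":
        if key in collection:
            # Create mode fails when the target key already exists.
            raise DuplicateKeyError(f"{key_field}={key!r} already exists")
        collection[key] = document
    elif mode == "upsert":
        # Upsert mode replaces the existing document, or inserts if absent.
        collection[key] = document
    else:
        raise ValueError(f"Unknown processing mode: {mode}")


store = {}
write_document(store, {"_id": 1, "name": "Ada"}, mode="create")
write_document(store, {"_id": 1, "name": "Grace"}, mode="upsert")
print(store)
```

Note that in this model, as in the description above, Upsert replaces the entire document rather than merging individual fields.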

Options

Parameter

Description

Enable trigger output

See Trigger input and output.

Block size

Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory.

Configure the Cosmos DB Output tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Output tool:

  1. Select the Cosmos DB Output tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Select Processing mode and choose either Create or Upsert:

    • Create: inserts new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the operation fails and the project aborts.

    • Upsert: attempts to insert new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the existing document is replaced by the new document. If Target key field is not the default _id, it should be indexed; otherwise the update operation will be inefficient.

  6. Optionally, select Target key field and specify the key field for documents in the target Database and Collection. In Cosmos DB, the default primary key field is _id. If you do not explicitly set an _id field in your documents, Cosmos DB will synthesize a new one for you.

  7. Select the Options tab to Enable Trigger output or set Block size.

  8. Optionally, go to the Execution tab, and then set Report options.

Cosmos DB Deleter

The Cosmos DB Deleter tool deletes documents in the target collection that match keys read from the input connection.

Cosmos DB Deleter tool configuration parameters

The Cosmos DB Deleter tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Input key field
Document key field

Select an Input key field and a Document key field. Input key field values will be read from source documents. The tool will delete all documents in the target collection with Document key field values matching any of the input values.

The _id field is the default unique primary key for Cosmos DB collections, and is the usual choice. Choose other key fields if you need more flexibility (for example, deleting multiple documents per key, such as "all documents in this ZIP code"). You may also need to choose another key if _id is not the natural key available in the source data.

Reducing duplicate key fields by using an upstream Unique tool may improve performance if there are a large number of duplicate keys.
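The matching rule and the effect of removing duplicate keys can be pictured with a short in-memory example. This is a sketch of the semantics only, with a Python list standing in for the collection. The function names are hypothetical and no server calls are made.

```python
def unique_keys(keys):
    """Drop duplicate keys while keeping first-seen order."""
    return list(dict.fromkeys(keys))


def delete_matching(documents, key_field, keys):
    """Return (remaining documents, number deleted) for the given key values."""
    wanted = set(keys)
    remaining = [doc for doc in documents if doc.get(key_field) not in wanted]
    return remaining, len(documents) - len(remaining)


docs = [
    {"_id": 1, "zip_code": "02134"},
    {"_id": 2, "zip_code": "02134"},
    {"_id": 3, "zip_code": "10001"},
]
input_keys = ["02134", "02134", "02134"]
remaining, deleted = delete_matching(docs, "zip_code", unique_keys(input_keys))
print(remaining, deleted)
```

Here three identical input keys collapse to one, and that single key removes both documents that share the ZIP code.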

Options

Parameter

Description

Enable trigger output

See Trigger input and output.

Block size

Controls how many delete operations are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory.

Field conversion options

See Document database field conversion options.

Configure the Cosmos DB Deleter tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Deleter tool:

  1. Select the Cosmos DB Deleter tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Select an Input key field.

  6. Enter a Document key field. The tool will delete all documents in the target collection whose Document key field matches any of the input values.

  7. Select the Options tab to Enable Trigger output or set Block size.

  8. Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the conversion options do not match the target database, the key values will not match.

  9. Optionally, go to the Execution tab, and then set Report options.

The _id field is the natural unique primary key for Cosmos DB collections, and is the usual choice. However, choosing other key fields can support behavior like deleting multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.

If there are a great number of duplicate keys, you may improve performance by reducing duplicate key fields with an upstream Unique tool.

Cosmos DB Updater

The Cosmos DB Updater tool alters existing documents in a Cosmos DB collection by replacing values with those from a Document field in the input records. Each update field must be specified separately.

Cosmos DB Updater tool configuration parameters

The Cosmos DB Updater tool has two sets of configuration parameters in addition to the standard execution options:

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Input document field

Select the field from the upstream connection that holds the documents with the update values.

Update key

Select the Update key, which is used to determine which records to update. The _id field is the default unique primary key for Cosmos DB collections, and is the usual choice. Choose a different update key if you need more flexibility (for example, updating multiple documents at once). The update key must exist in the input documents, or the project will abort with an error.

Update fields

Enter a list of fields that will be copied from the input source to target documents. Select Find field names to generate a list of field names discovered in existing documents.

Options

Parameter

Description

Include nulls in document

Select to add explicit null values to the documents.

Block size

Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory.

Enable trigger output

See Trigger input and output.
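In MongoDB terms, a field-level update of this kind is commonly expressed as a filter on the update key plus a $set document for the update fields. The sketch below assumes that shape. Whether the tool issues exactly this request, and how Include nulls in document treats fields that are absent from the input, are assumptions made for illustration.

```python
def build_update(input_doc, update_key, update_fields, include_nulls=False):
    """Return a (filter, update) pair in MongoDB update syntax."""
    if update_key not in input_doc:
        # Mirrors the documented behavior: a missing update key is an error.
        raise KeyError(f"Update key {update_key!r} missing from input document")
    changes = {}
    for field in update_fields:
        value = input_doc.get(field)
        if value is None and not include_nulls:
            continue  # skip missing or null values unless nulls are requested
        # Assumption: with include_nulls, absent fields become explicit nulls.
        changes[field] = value
    return {update_key: input_doc[update_key]}, {"$set": changes}


flt, update = build_update(
    {"_id": 7, "city": "Boston", "phone": None},
    update_key="_id",
    update_fields=["city", "phone"],
)
print(flt, update)
```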

Configure the Cosmos DB Updater tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Updater tool:

  1. Select the Cosmos DB Updater tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Select an Input document field.

  6. Enter an Update key, used to determine which records to update. To update the entire document with a new document, use the Cosmos DB Output tool with Processing mode set to Upsert.

  7. Select one or more Update fields that will be copied from the source to target documents. You can select Find field names to populate the list of field names discovered in existing documents.

  8. Select the Options tab to Include nulls in document, Enable Trigger output, or set Block size.

  9. Optionally, go to the Execution tab, and then set Report options.

Cosmos DB Array Updater

The Cosmos DB Array Updater tool alters existing documents in a Cosmos DB collection by adding data to an array as an atomic document database operation (Cosmos DB syntax $push). Enable the Unique result option to add a value to the target array only if the value does not already exist in the array (Cosmos DB syntax $addToSet).
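The difference between the two operators can be shown with plain Python lists. This is an in-memory sketch of the semantics, not a call to the database, and the helper names push and add_to_set are hypothetical.

```python
def push(document, field, value):
    """Append value to the array field, allowing duplicates (like $push)."""
    document.setdefault(field, []).append(value)


def add_to_set(document, field, value):
    """Append value only if it is not already present (like $addToSet)."""
    values = document.setdefault(field, [])
    if value not in values:
        values.append(value)


doc = {"_id": 1, "tags": ["a"]}
push(doc, "tags", "a")        # duplicate is added
add_to_set(doc, "tags", "a")  # already present, so nothing changes
add_to_set(doc, "tags", "b")  # new value is added
print(doc)
```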

Cosmos DB Array Updater tool configuration parameters

The Cosmos DB Array Updater tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Input document field

Select the array field from the upstream connection that contains documents containing update values.

Update key

Select the update key, which is used to determine which records to update. The _id field is the default unique primary key for Cosmos DB collections, and is the usual choice. Choose a different update key if you need more flexibility (for example, updating multiple documents at once). The update key must exist in the input documents, or the project will abort with an error.

Target field

Enter one or more fields to update. Select Find field names to generate a list of field names discovered in existing documents.

Options

Parameter

Description

Unique result

Select to add a value to the target array only if the value does not already exist in the array (Cosmos DB syntax $addToSet). This ensures that there will be no duplicate items added by the update.

Block size

Controls how many records are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory.

Enable trigger output

See Trigger input and output.

Configure the Cosmos DB Array Updater tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Array Updater tool:

  1. Select the Cosmos DB Array Updater tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Select an Input document field.

  6. Enter an Update key, used to determine which records to update.

  7. Select one or more Target fields to update. You can select Find field names to populate the list of field names discovered in existing documents.

  8. Select the Options tab to select Unique result, Enable Trigger output, or set Block size.

  9. Optionally, go to the Execution tab, and then set Report options.

Cosmos DB Key Query

The Cosmos DB Key Query tool returns all documents from the source collection whose key fields match any of the input key field values.

Cosmos DB Key Query tool configuration parameters

The Cosmos DB Key Query tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Input key field
Document key field

Select an Input key field and Document key field. Select Find field names to generate a list of field names discovered in existing documents. The tool will return all documents in the target collection whose Document key field value matches any of the Input key field values.

Options

Parameter

Description

Block size

Controls the number of keys that are sent to the Cosmos DB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory.

Field conversion options

See Document database field conversion options.
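One way to picture the effect of Block size is as the number of keys grouped into each request. The sketch below splits a key list into blocks and expresses each block as a MongoDB $in filter. The tool's actual request format is not documented here, so treat the filter shape and the function name as assumptions.

```python
def build_key_filters(keys, document_key_field, block_size):
    """Return one $in filter per block of at most block_size keys."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [
        {document_key_field: {"$in": keys[start:start + block_size]}}
        for start in range(0, len(keys), block_size)
    ]


filters = build_key_filters([1, 2, 3, 4, 5], "_id", block_size=2)
for flt in filters:
    print(flt)
```

A larger block size produces fewer, larger requests, which is why it can help on high-latency connections at the cost of more memory.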

Configure the Cosmos DB Key Query tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Key Query tool:

  1. Select the Cosmos DB Key Query tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database and Collection.

  5. Select an Input key field, and then choose a Document key field. You can select Find field names to generate a list of field names discovered in existing documents.

  6. Optionally, select the Options tab to set Block size.

  7. Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the conversion options do not match the target database, the key values will not match.

  8. Optionally, select the Sample tab, and then choose Refresh Sample data to view a sample of the input data.

  9. Optionally, go to the Execution tab, and then set Web service options.

The _id field is the natural unique primary key for Cosmos DB collections, and is the usual choice. However, choosing other key fields can support behavior like returning multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.

Cosmos DB Executor

The Cosmos DB Executor tool can perform the following commands on the configured database:

  • Drop collection

  • Clear collection

  • Drop index on collection

  • Create index on a collection, specifying field list, Unique, and Sparse options

Cosmos DB Executor tool configuration parameters

The Cosmos DB Executor tool has two sets of configuration parameters in addition to the standard execution options:

Configuration

Parameter

Description

Override connection settings

See Cosmos DB tool shared settings.

Database

Select a database from the list of existing databases. You must have a valid Cosmos DB connection configured in order for this list to be populated.

Operation

Select one or more Collection actions:

  • CLEAR: remove all documents from the collection.

  • DROP: delete the collection and all associated configuration (indexes, validation, and options created outside Data Management).

Select one or more Index actions:

  • CREATE: create an index. No error is generated if the index already exists. However, if the index exists with an incompatible specification, the index will be dropped and re-created.

  • DROP: delete an index. No error is generated if the index does not exist.

Collection

Select a collection from the list of existing collections. You must have a valid Cosmos DB connection to an existing database in order for this list to be populated.

Index name

Name of the index to CREATE or DROP.

Due to a bug in Cosmos DB, the index names you define may be changed to default names on CREATE.

Sparse

Select to create a sparse index. This is good for seldom-used fields because it saves space.

Unique

Select to create a unique index. This enforces uniqueness across the specified fields. Note that the _id field always has a unique index.

An index that is both sparse and unique prevents the collection from having documents with duplicate values for a field, but allows multiple documents that omit the key.

Fields

Comma-separated list of field names on which to index. Indexes are ascending on all fields.

Enable shell commands
Enter shell commands

Select to enter database commands in JSON form, one per line. You can use this to perform operations that are unavailable in Collection actions and Index actions. See MongoDB Database Commands for details.
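The index settings above correspond to fields of MongoDB's createIndexes and dropIndexes database commands. The sketch below builds those command documents and prints each as one line of JSON. The collection and index names are hypothetical, and whether a given command is accepted by your Cosmos DB account, or by the shell commands box in this exact form, is not confirmed here.

```python
import json


def create_index_command(collection, name, fields, unique=False, sparse=False):
    """Build a MongoDB createIndexes command with ascending keys."""
    # fields is a comma-separated list, as in the Fields setting.
    index = {"key": {field.strip(): 1 for field in fields.split(",")}, "name": name}
    if unique:
        index["unique"] = True
    if sparse:
        index["sparse"] = True
    return {"createIndexes": collection, "indexes": [index]}


def drop_index_command(collection, name):
    """Build a MongoDB dropIndexes command for a single named index."""
    return {"dropIndexes": collection, "index": name}


# Hypothetical collection and index names, for illustration only.
commands = [
    create_index_command("customers", "zip_idx", "zip_code, name", unique=True),
    drop_index_command("customers", "old_idx"),
]
for command in commands:
    print(json.dumps(command))  # one command per line
```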

Options

Parameter

Description

Enable trigger input
Enable trigger output

See Trigger input and output.

Configure the Cosmos DB Executor tool

Before configuring a Cosmos DB tool, you should have a Cosmos DB connection defined in tool connection settings.

To configure the Cosmos DB Executor tool:

  1. Select the Cosmos DB Executor tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Optionally, override shared settings:

    1. Open the Connection settings section.

    2. Select Override.

    3. Specify new values for the tool.

  4. Select a Database.

  5. Use the Collection actions and/or Index actions grids to configure commands. You may add as many commands as you like.

To drop a collection

In the Collection actions grid:

  1. Select a grid cell under Operation and choose DROP.

  2. Choose the corresponding grid cell under Collection.

  3. Select the collection you want to drop.

To clear a collection

In the Collection actions grid:

  1. Select a grid cell under Operation and choose CLEAR.

  2. Choose the corresponding grid cell under Collection.

  3. Select the collection you want to clear.

To create an index
  1. In the Index actions grid:

    1. Select a grid cell under Operation and choose CREATE.

    2. Choose the corresponding grid cell under Collection.

    3. Select the collection you want to index.

  2. Select the corresponding grid cell under Index name and enter a name for the new index.

    • Due to a bug in Cosmos DB, the index names you define may be changed to default names on CREATE.

  3. Optionally, select Sparse and/or Unique to select those index types.

  4. Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) to be indexed.

To drop an index
  1. In the Index actions grid:

    1. Select a grid cell under Operation and choose DROP.

    2. Choose the corresponding grid cell under Collection.

    3. Select the collection that contains the index you want to drop.

  2. Select the corresponding grid cell under Index name and enter a name for the index you want to drop.

  3. Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) whose indexes will be dropped.

  4. Optionally, select Enable shell commands and enter database commands in JSON form, one per line.

  5. Select the Options tab to Enable Trigger input or output.

  6. Optionally, go to the Execution tab, and then set Report options.
