MongoDB tools
MongoDB is one of the best-known NoSQL document databases.
Data Management supports MongoDB versions 4.4, 5.x, and 6.x, including Atlas clusters.
MongoDB documents are based on BSON, which can be thought of as a binary JSON with added type support (including timestamp, decimal, and binary types). MongoDB supports unique indexes and indexing of array elements, and does not directly support joins.
Data Management's support for MongoDB includes:
MongoDB Input and MongoDB Output tools
MongoDB Deleter, MongoDB Updater, and MongoDB Array Updater tools
MongoDB Key Query and MongoDB Executor tools
MongoDB tool connection settings
Data Management's MongoDB tools use shared settings, which allows you to define a single set of configuration properties (typically access credentials) to share across multiple tools in your Data Management Site. You can override these settings on a per-tool basis by opening the Connection settings section on the tool's Properties pane, selecting Override, and specifying values for that specific tool.
To define MongoDB shared tool settings
Open the Tools folder under Settings in the repository.
Select the MongoDB tab, and then configure the tool properties for your environment:
Connection URI | The MongoDB URI connection string (as defined in the MongoDB documentation). |
Explicit username/password | If selected, you must specify an explicit Username and Password (or Key Vault reference). While the username and password can be embedded in the Connection URI string, it is often preferable for security reasons to specify these separately, since the password will be encrypted. |
To configure default shared tool settings from a MongoDB tool's Properties pane, open the Connection settings section, and select Edit default settings.
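As an illustration of the two ways to supply credentials described above, the following sketch builds a MongoDB connection URI with and without embedded credentials. The host name, database name, user name, and password are hypothetical, and the helper function `build_mongodb_uri` is not part of Data Management; it only shows that reserved characters in embedded credentials need percent-encoding.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, username=None, password=None, port=27017, database=""):
    """Assemble a mongodb:// URI; credentials are percent-encoded if given."""
    credentials = ""
    if username is not None and password is not None:
        # Characters such as '@' and '/' must be escaped inside the URI.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}/{database}"


# URI without credentials, suitable when Explicit username/password is selected.
uri_without_credentials = build_mongodb_uri("db.example.com", database="sales")

# URI with embedded credentials (hypothetical values).
uri_with_credentials = build_mongodb_uri(
    "db.example.com", "etl_user", "p@ss/word", database="sales"
)

print(uri_without_credentials)
print(uri_with_credentials)
```

Keeping the credentials out of the URI avoids this escaping step and lets the password be stored encrypted, as noted in the settings table.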
To override MongoDB tool shared settings
Select the desired MongoDB tool, and then go to the Configuration tab on the Properties pane.
Open the Connection settings section, select Override, and then specify new values for the tool.
MongoDB Input
The MongoDB Input tool reads documents from a collection in a MongoDB database and sends those documents to its output connector. The tool makes no attempt to interpret the documents; instead, the documents are stored in a field of type Document.
MongoDB Input tool configuration parameters
The MongoDB Input tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Query style | Select the type of query to perform: Form or JSON. |
Query | If Query style is Form, use the Edit filter terms grid to construct a query: select Field, Operation, and Type, and specify Values. If Query style is JSON, enter a query in JSON form to filter the documents returned. The query string must correspond to MongoDB's specification. Some examples: Equality on zip_code field: Names starting with "Z": Salary greater than or equal to 10000: Select Find field names to generate a list of field names discovered in existing documents. If you do not enter a query, the tool will return all documents in the collection (unless a limit is specified on the Options tab). |
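The three query types named in the Query row (equality, prefix match, and greater-than-or-equal) might look like the following in MongoDB's JSON query syntax. This is a sketch only: the field names `name` and `salary` and the sample ZIP Code value are assumptions, not values from this guide. The Python code simply prints each query as a JSON string that could be pasted into the Query box.

```python
import json

# Equality on the zip_code field (the value is a made-up example).
equality_query = {"zip_code": "80301"}

# Names starting with "Z", using MongoDB's $regex operator.
names_starting_with_z = {"name": {"$regex": "^Z"}}

# Salary greater than or equal to 10000, using the $gte operator.
salary_at_least_10000 = {"salary": {"$gte": 10000}}

for query in (equality_query, names_starting_with_z, salary_at_least_10000):
    print(json.dumps(query))
```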
Options
Sort by | If Sort by is selected, the MongoDB server will sort documents on the specified Sort field before returning them. Select Find field names to generate a list of field names discovered in existing documents. Optionally, you may select Sort descending to sort values in descending order. Note that MongoDB will not sort more than 100MB of data. To query and sort a large amount of data, use Data Management's Sort tool downstream from the MongoDB Input tool. |
Limit records | If selected, enter the maximum number of records to be returned. For test runs on large databases, this can significantly reduce run-time. |
Enable trigger input | |
Only return selected fields | If selected, specify a list of field names to be returned. This will reduce the size and complexity of the returned documents and improve performance. Select Find field names to generate a list of field names discovered in existing documents. |
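To make the effect of these options concrete, the sketch below shows how Sort by, Limit records, and Only return selected fields could map onto the fields of a standard MongoDB `find` database command. How the tool actually issues its request is not documented here, so treat the mapping as an assumption; the function name `build_find_command` and the sample collection and field names are hypothetical.

```python
def build_find_command(collection, query=None, sort_field=None,
                       descending=False, limit=None, fields=None):
    """Build a MongoDB `find` command document from tool-style options."""
    command = {"find": collection, "filter": query or {}}
    if sort_field is not None:
        # 1 sorts ascending, -1 sorts descending (server-side sort).
        command["sort"] = {sort_field: -1 if descending else 1}
    if limit is not None:
        command["limit"] = limit
    if fields:
        # A projection of 1 means "include this field in the result".
        command["projection"] = {name: 1 for name in fields}
    return command


find_command = build_find_command(
    "customers",
    query={"state": "CO"},
    sort_field="last_name",
    descending=True,
    limit=100,
    fields=["last_name", "zip_code"],
)
print(find_command)
```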
Configure the MongoDB Input tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Input tool:
Select the MongoDB Input tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select Query style and select Form or JSON:
If Query style is Form, use the Edit filter terms grid to construct a query: select Field, Operation, and Type, and specify Values.
If Query style is JSON, enter a JSON query to filter the documents returned, or leave blank to return all documents in the Collection. You can select Find field names to generate a list of field names discovered in existing documents.
Select the Options tab to configure how document fields are returned or Enable Trigger input.
Optionally, select the Sample tab, and then select Refresh Sample data to view a sample of the input data.
Optionally, go to the Execution tab, and then set Web service options.
MongoDB Output
The MongoDB Output tool inserts new documents into a collection, or optionally updates documents that exist and inserts them otherwise (upsert behavior).
MongoDB Output tool configuration parameters
The MongoDB Output tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Input document field | Select the field from the upstream connection that contains documents to insert. This field must exist and be of type Document. |
Processing mode | Select processing mode: Create or Upsert. |
Target key field | Key field for documents in the target Database and Collection. In MongoDB, the default primary key field is _id. |
Options
Enable trigger output | |
Block size | Controls how many records are sent to the MongoDB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
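The trade-off behind Block size can be pictured as simple batching: records are grouped into blocks, and each block costs one round trip to the server but must be held in memory until it is sent. The generator below is a minimal sketch of that idea, not the tool's implementation; the name `blocks` is hypothetical.

```python
from itertools import islice


def blocks(records, block_size):
    """Yield successive lists of at most block_size records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


# Seven records with a block size of three produce three round trips.
batches = list(blocks(range(7), 3))
print(len(batches), batches)
```

A larger block size means fewer round trips over a high-latency connection, at the cost of more records buffered at once.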
Configure the MongoDB Output tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Output tool:
Select the MongoDB Output tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select Processing mode and select either Create or Upsert:
Create inserts new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the operation fails and the project aborts.
Upsert attempts to insert new documents into the collection. If a document with the same Target key field value already exists (or if there is another unique index violation), the existing document is replaced by the new document. If Target key field is not the default _id, it should be indexed; otherwise the update operation will be inefficient.
Optionally, select Target key field and specify the key field for documents in the target Database and Collection. In MongoDB, the default primary key field is _id. If you do not explicitly set an _id field in your documents, MongoDB will synthesize a new one for you.
Select the Options tab to Enable Trigger output or set Block size.
Optionally, go to the Execution tab, and then set Report options.
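The difference between the Create and Upsert processing modes can be modeled in memory. The sketch below uses a plain Python dictionary keyed by the target key value as a stand-in for a collection; it is an illustrative model of the documented behavior, not how the tool or MongoDB is implemented, and all names in it are hypothetical.

```python
class DuplicateKeyError(Exception):
    """Raised when Create meets an existing key (models a unique index violation)."""


def create(collection, document, key_field="_id"):
    """Insert only; fail if a document with the same key already exists."""
    key = document[key_field]
    if key in collection:
        raise DuplicateKeyError(f"duplicate {key_field}: {key!r}")
    collection[key] = document


def upsert(collection, document, key_field="_id"):
    """Insert, or replace the whole existing document with the same key."""
    collection[document[key_field]] = document


customers = {}
create(customers, {"_id": 1, "name": "Ann", "city": "Boulder"})
upsert(customers, {"_id": 1, "name": "Anna"})  # replaces; "city" is gone
print(customers)
```

Note that in this model, as in the Upsert description above, the existing document is replaced as a whole rather than merged field by field.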
MongoDB Deleter
The MongoDB Deleter tool deletes documents in the target collection that match keys read from the input connection.
MongoDB Deleter tool configuration parameters
The MongoDB Deleter tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Input key field | Select an Input key field and a Document key field. Input key field values will be read from source documents. The tool will delete all documents in the target collection with Document key field values matching any of the input values. Reducing duplicate keys with an upstream Unique tool may improve performance if there are a large number of duplicate keys. |
Options
Enable trigger output | |
Block size | Controls how many delete operations are sent to the MongoDB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Field conversion options |
Configure the MongoDB Deleter tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Deleter tool:
Select the MongoDB Deleter tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select an Input key field, and then enter a Document key field. The tool will delete all documents in the target collection whose Document key field matches any of the input values.
Select the Options tab to Enable Trigger output or set Block size.
Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the conversion options do not match the target database, the key values will not match.
Optionally, go to the Execution tab, and then set Report options.
The _id field is the natural unique primary key for MongoDB collections, and is the usual choice. However, choosing other key fields can support behavior like deleting multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.
If there are a great number of duplicate keys, you may improve performance by reducing duplicate key fields with an upstream Unique tool.
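One way to picture the Deleter's behavior is as a standard MongoDB `delete` database command whose filter matches any of the input keys. The sketch below is an assumption about how such a request could be formed, not a description of the tool's internals; the function name and sample values are hypothetical. It also removes duplicate keys first, which is the effect an upstream Unique tool would have.

```python
def build_delete_command(collection, document_key_field, input_keys):
    """Build a `delete` command that removes every document matching any key."""
    # Drop duplicate keys while keeping their original order.
    unique_keys = list(dict.fromkeys(input_keys))
    return {
        "delete": collection,
        "deletes": [
            # limit 0 means "delete all matching documents", not just one.
            {"q": {document_key_field: {"$in": unique_keys}}, "limit": 0}
        ],
    }


delete_command = build_delete_command(
    "customers", "zip_code", ["80301", "80302", "80301"]
)
print(delete_command)
```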
MongoDB Updater
The MongoDB Updater tool alters existing documents in a MongoDB collection by replacing values with those from a Document field in the input records. Each update field must be specified separately.
MongoDB Updater tool configuration parameters
The MongoDB Updater tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Input document field | Select the field from the upstream connection that contains documents containing update values. |
Update key | Select the update key, which is used to determine which records to update. |
Update fields | Enter a list of fields that will be copied from the input source to target documents. Select Find field names to generate a list of field names discovered in existing documents. |
Options
Include nulls in document | Select to add explicit null values to the documents. |
Block size | Controls how many records are sent to the MongoDB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Enable trigger output |
Configure the MongoDB Updater tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Updater tool:
Select the MongoDB Updater tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select an Input document field, and then enter an Update key, which is used to determine which records to update.
To update the entire document with a new document, use the MongoDB Output tool with Processing mode set to Upsert.
Select one or more Update fields that will be copied from the source to target documents. You can select Find field names to populate the list of field names discovered in existing documents.
Select the Options tab to select Include nulls in document, Enable Trigger output, or set Block size.
Optionally, go to the Execution tab, and then set Report options.
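Updating selected fields, as opposed to replacing the whole document, corresponds to MongoDB's `$set` update operator. The sketch below builds one update statement from an input document, copying only the listed Update fields. The function name is hypothetical, and the way `include_nulls` is interpreted here (skip fields whose input value is null unless the option is on) is an assumption about the Include nulls in document option.

```python
def build_field_update(input_document, update_key, update_fields, include_nulls=False):
    """Build one update statement that sets only the chosen fields."""
    values = {}
    for field in update_fields:
        if field not in input_document:
            continue
        value = input_document[field]
        # Assumed behavior: null values are copied only when include_nulls is set.
        if value is not None or include_nulls:
            values[field] = value
    return {
        "q": {update_key: input_document[update_key]},  # which documents to update
        "u": {"$set": values},                          # which fields to overwrite
    }


source = {"_id": 42, "email": "ann@example.com", "phone": None, "name": "Ann"}
update = build_field_update(source, "_id", ["email", "phone"])
print(update)
```

Fields not listed (here, `name`) are left untouched in the target document, which is the key difference from Upsert in the MongoDB Output tool.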
MongoDB Array Updater
The MongoDB Array Updater tool alters existing documents in a MongoDB collection by adding data to an array as an atomic document database operation (MongoDB syntax $push). Enable the Unique result option to add a value to the target array only if the value does not already exist in the array (MongoDB syntax $addToSet).
MongoDB Array Updater tool configuration parameters
The MongoDB Array Updater tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Input document field | Select the array field from the upstream connection that contains documents containing update values. |
Update key | Select the update key, which is used to determine which records to update. |
Target field | Enter one or more fields to update. Select Find field names to generate a list of field names discovered in existing documents. |
Options
Unique result | Select to add a value to the target array only if the value does not already exist in the array (MongoDB syntax $addToSet). |
Block size | Controls how many records are sent to the MongoDB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Enable trigger output |
Configure the MongoDB Array Updater tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Array Updater tool:
Select the MongoDB Array Updater tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select an Input document field, and then enter an Update key, which is used to determine which records to update.
Select one or more Target fields to update. You can select Find field names to populate the list of field names discovered in existing documents.
Select the Options tab to select Unique result, Enable Trigger output, or set Block size.
Optionally, go to the Execution tab, and then set Report options.
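The choice between `$push` and `$addToSet` can be sketched as a single switch on the Unique result option. The helper below builds a MongoDB update statement for one array value; the function name, key field, and sample values are hypothetical, and the code only illustrates the two operators rather than the tool's actual request format.

```python
def build_array_update(key_field, key_value, target_field, value, unique_result=False):
    """Build an update statement that appends a value to an array field."""
    # $addToSet appends only if the value is not already in the array;
    # $push always appends, so duplicates are possible.
    operator = "$addToSet" if unique_result else "$push"
    return {
        "q": {key_field: key_value},
        "u": {operator: {target_field: value}},
    }


push_update = build_array_update("_id", 42, "tags", "priority")
unique_update = build_array_update("_id", 42, "tags", "priority", unique_result=True)
print(push_update)
print(unique_update)
```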
MongoDB Key Query
The MongoDB Key Query tool returns all documents from the source collection whose key fields match any of the input key field values.
MongoDB Key Query tool configuration parameters
The MongoDB Key Query tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Input key field | Select an Input key field and Document key field. Select Find field names to generate a list of field names discovered in existing documents. The tool will return all documents in the target collection whose Document key field value matches any of the Input key field values. |
Options
Block size | Controls the number of keys that are sent to the MongoDB server at one time. If the connection to the server has some latency, you may achieve increased performance by increasing this value. However, higher settings will use more memory. |
Field conversion options |
Configure the MongoDB Key Query tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Key Query tool:
Select the MongoDB Key Query tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database and Collection.
Select an Input key field, and then select a Document key field. You can select Find field names to generate a list of field names discovered in existing documents.
Optionally, select the Options tab to set Block size.
Optionally, open the Field conversion section on the Options tab and configure Document database field conversion options. If the defined conversion options do not match the target database, the key values will not match.
Optionally, select the Sample tab, and then select Refresh Sample data to view a sample of the input data.
Optionally, go to the Execution tab, and then set Report options and Web service options.
The _id field is the natural unique primary key for MongoDB collections, and is the usual choice. However, choosing other key fields can support behavior like returning multiple documents per key (for example, "all documents in this ZIP Code"). You may also need to choose another key if _id is not the natural key available in the source data.
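A key query of this kind can be expressed with MongoDB's `$in` operator, sending the input keys in blocks. The sketch below generates one filter per block; it assumes, without confirmation from this guide, that Block size groups keys in this way, and the function name and sample values are hypothetical.

```python
def key_query_filters(document_key_field, input_keys, block_size):
    """Yield one $in filter per block of input keys."""
    for start in range(0, len(input_keys), block_size):
        block = input_keys[start:start + block_size]
        yield {document_key_field: {"$in": block}}


filters = list(key_query_filters("zip_code", ["80301", "80302", "80303"], 2))
for query_filter in filters:
    print(query_filter)
```

Because the filter matches on any field, a non-unique key such as `zip_code` can return several documents per input key.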
MongoDB Executor
The MongoDB Executor tool can perform the following commands on the configured database:
Drop collection
Clear collection
Drop index on collection
Create index on a collection, specifying field list, Unique, and Sparse options
MongoDB Executor tool configuration parameters
The MongoDB Executor tool has two sets of configuration parameters in addition to the standard execution options:
Configuration
Override connection settings | |
Database | Select a database from the list of existing databases. You must have a valid MongoDB connection configured in order for this list to be populated. |
Operation | Select one or more Collection actions: DROP or CLEAR. Select one or more Index actions: CREATE or DROP. |
Collection | Select a collection from the list of existing collections. You must have a valid MongoDB connection to an existing database in order for this list to be populated. |
Index name | Name of the index to CREATE or DROP. |
Sparse | Select to create a sparse index. This is good for seldom-used fields because it saves space. |
Unique | Select to create a unique index. This enforces uniqueness across the specified fields. Note that an index that is both sparse and unique prevents a collection from having documents with duplicate values for a field, but allows multiple documents that omit the key. |
Fields | Comma-separated list of field names on which to index. Indexes are ascending on all fields. |
Enable shell commands | Select to enter database commands in JSON form, one per line. You can use this to perform operations that are unavailable in Collection actions and Index actions. See MongoDB Database Commands for details. |
Options
Enable trigger input |
Configure the MongoDB Executor tool
Before configuring a MongoDB tool, you should have a MongoDB connection defined in tool connection settings.
To configure the MongoDB Executor tool
Select the MongoDB Executor tool, and then go to the Configuration tab on the Properties pane.
Optionally, override shared settings: open the Connection settings section, select Override, and then specify new values for the tool.
Select a Database.
Use the Collection actions and/or Index actions grids to configure commands. You may add as many commands as you like.
To drop a collection
In the Collection actions grid: select a grid cell under Operation and select DROP, and then select the corresponding grid cell under Collection and select the collection you want to drop.
To clear a collection
In the Collection actions grid: select a grid cell under Operation and select CLEAR, and then select the corresponding grid cell under Collection and select the collection you want to clear.
To create an index
In the Index actions grid: select a grid cell under Operation and select CREATE, and then select the corresponding grid cell under Collection and select the collection you want to index.
Select the corresponding grid cell under Index name and enter a name for the new index.
Optionally, select Sparse and/or Unique to select those index types.
Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) to be indexed.
To drop an index
In the Index actions grid: select a grid cell under Operation and select DROP, and then select the corresponding grid cell under Collection and select the collection that contains the index you want to drop.
Select the corresponding grid cell under Index name and enter a name for the index you want to drop.
Select the corresponding grid cell under Fields and enter one or more field names (separated by commas) whose indexes will be dropped.
Optionally, select Enable shell commands and enter database commands in JSON form, one per line.
Select the Options tab to Enable Trigger input or output.
Optionally, go to the Execution tab, and then set Report options.
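For reference, the four Executor operations have close counterparts among the standard MongoDB database commands, which is also the JSON form that Enable shell commands expects, one command per line. The sketch below generates such lines. The collection, index, and field names are hypothetical, and whether the tool issues exactly these commands for its grid actions is an assumption.

```python
import json


def create_index_command(collection, index_name, fields, unique=False, sparse=False):
    """Build a createIndexes command from a comma-separated field list."""
    # Every field is indexed ascending (1), matching the Fields description.
    index = {"key": {field.strip(): 1 for field in fields.split(",")}, "name": index_name}
    if unique:
        index["unique"] = True
    if sparse:
        index["sparse"] = True
    return {"createIndexes": collection, "indexes": [index]}


commands = [
    create_index_command("customers", "email_unique", "email", unique=True, sparse=True),
    {"dropIndexes": "customers", "index": "email_unique"},        # drop an index
    {"delete": "customers", "deletes": [{"q": {}, "limit": 0}]},  # clear a collection
    {"drop": "customers"},                                        # drop a collection
]

# One JSON command per line.
shell_lines = "\n".join(json.dumps(command) for command in commands)
print(shell_lines)
```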