RDBMS database providers
Overview
The Database selector is located on the Source tab of the RDBMS Input, RDBMS Output, RDBMS Execute, and Data Connection Properties pane.
Use this control to select an RDBMS provider, which consists of:
A database type (such as Oracle, SQL Server, or DB2).
A connection method (JDBC preferred; OleDB, ODBC, and Oracle OLI are deprecated and will not be supported in future Data Management releases).
A set of read-only parameters that further refine the database strategy.
A set of options appropriate for the database strategy. Default options are displayed on the Options tab but can be overridden.
A set of parameters to be defined in that typically specify the database server, database name, and initial catalog.
Database providers simplify the task of connecting to different databases by predefining parameters, selecting appropriate options, and setting up a fill-in-the-blank grid for defining necessary parameters. Data Management ships with a selection of database providers organized by database type.
RDBMS providers: Access
Connects to Microsoft Access databases. These are databases contained in a single file. Note that Microsoft Jet OLE DB.4.0 is not supported for the 64 bit OS.
Access, standard security (oledb). This is the simplest and most common provider.
Access, password security (oledb). Only use this for Access databases that have a password protection.
Data Management version 10 will end support for reading and writing to Microsoft Access via RDBMS tools, and the Access database providers are deprecated as of version 9.3.1. Support for Microsoft Access will end after version 9.4.1.
RDBMS providers: AWS
Data Management's RDBMS tools can connect to AWS (Amazon Web Services) Aurora databases and Redshift data warehouse services.
Database name | Database API | Reconnect option available | Notes |
---|---|---|---|
Aurora MySQL | JDBC | MySQL 5.7. Can be used with AWS Identity and Access Management (IAM) by configuring these options:
| |
Aurora PostgreSQL | JDBC | N/A | |
Aurora PostgreSQL Bulk loader | JDBC | N/A | |
AWS Redshift | ODBC native | The AWS Redshift JDBC and ODBC drivers are adequate for inserting small amounts of data into tables with fewer than 50 columns, but they are slow. | |
AWS Redshift | JDBC | Defaults to operating in SSL mode, which may cause connection failures on systems not configured for SSL. If this happens, do one of the following:
| |
AWS Redshift Bulk | JDBC | Use with RDBMS Output tool for bulk inserts of an input record stream into a Amazon Redshift Database table. If you specify this provider, you may optionally configure additional properties. |
ODBC providers have been deprecated in Data Management and ODBC drivers may not be supported in future versions.
RDBMS providers: Azure
The following table summarizes the database providers that can connect to Azure database servers and describes their variants.
Many providers offer both "allow reconnect" and "no reconnect" variants. Select the "allow reconnect" option to enable database reconnect for long-running projects that may be susceptible to database disconnects due to cloud-based environmental factors or other issues.
"Allow reconnect" strategies will cause the RDBMS Input tool to spool records until the data set is complete and then move the entire data set at once, rather than streaming records as soon as they are available.
Database name | Database API | Reconnect option available |
---|---|---|
Azure Data Warehouse Bulk | JDBC | No. If you specify this provider, you may optionally configure additional properties |
Azure Data Warehouse | JDBC | No |
Azure Data Warehouse | JDBC | Yes |
Azure Data Warehouse | ODBC | No |
Azure Data Warehouse | OLE DB Auth, Trusted | Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher |
Azure | JDBC | Yes |
Azure | Native ODBC | Yes |
Azure SQL MI | OLE DB Auth, Trusted | Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher |
Azure SQL | OLE DB Auth, Trusted | Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher |
ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC and OLE DB drivers may not be supported in future versions.
Azure Data Warehouse OLE DB connections require the MSOLEDBSQL driver introduced by Microsoft in April 2018. This driver replaces the SQLNCI, SQLOLEDB, and SNAC drivers, which will not work with our OLE DB data provider.
Azure Data Warehouse JDBC connections require the Microsoft JDBC Driver 8.2 for SQL Server introduced by Microsoft in March 2020.
RDBMS providers: DB2
Connects to IBM DB2 databases (version 10 or above).
DB2, JDBC, IBM provider, TCP/IP connection, (jdbc). See information below.
DB2, IBM provider, TCP/IP connection, ODBC native (odbc native).
ODBC providers have been deprecated in Data Management and this driver may not be supported in future versions.
When configuring DB2 via JDBC, you must issue GRANT SELECT ON SYSCAT.DBAUTH
to the User ID (Uid) used by the Data Management database connection. This User ID must also have SELECT
privileges for the following tables:
SYSIBM.SQLTABLES
SYSIBM.SYSTABLES
SYSIBM.SQLTABLETYPES
SYSIBM.SQLTABLEPRIVILEGES
SYSIBM.SYSCOLUMNS
SYSIBM.SQLSPECIALCOLUMNS
SYSIBM.SQLCOLPRIVILEGES
SYSIBM.SYSDUMMY1
SYSIBM.SQLPRIMARYKEYS
SYSIBM.SQLSTATISTICS
SYSIBM.FOREIGNKEYS
SYSIBM.SYSINDEXES
SYSIBM.SQLPROCEDURES
SYSIBM.SQLPROCEDURECOLS
SYSIBM.SYSROUTINES
SYSIBM.SYSROUTINEPARMS
SYSIBM.SQLTYPEINFO
SYSIBM.SQLUDTS
SYSIBM.SQLSCHEMAS
SYSIBM.SYSRELS
SYSIBM.SYSKEYCOLUSE
SYSIBM.SYSTABCONST
SYSIBM.SYSDATATYPES
RDBMS providers: Excel
Connects to Microsoft Excel files. You must have Access Database Engine drivers installed.
Excel ACE OLEDB (oledb): connects to Excel using a simple file selection.
Excel using Excel Files DSN (odbc via oledb): connects to Excel using a defined ODBC DSN. Not recommended, but may be required by some organizational policies.
Data Management version 10 will end support for reading and writing to Microsoft Excel via RDBMS tools, and the Excel database providers are deprecated as of version 9.3.1. Version 9.4.1 will introduce a tool that reads Excel workbooks.
RDBMS providers: Generic
Connects to arbitrary OLE DB or ODBC databases.
Generic OLEDB (oledb): use this for OLE DB providers that are not explicitly supported by Data Management. Requires selection of Provider name and manual entry of connection parameters.
ODBC Native Generic (odbc_native): useful when you have a driver that is not supported by any other database provider. Available ODBC drivers are displayed on a list. When you select a driver, Data Management will attempt to query the driver for connection parameters. Only some drivers support this functionality.
ODBC Native DSN (odbc_native): useful when you have a driver that is not supported by any other database provider. Requires selection of DSN and manual entry of connection parameters.
ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC and OLE DB drivers may not be supported in future versions.
RDBMS providers: Google BigQuery
Connects to Google BigQuery databases.
Google BigQuery Bulk Load - Application Default Credentials (gbq)
Google BigQuery Bulk Load - Service Account Auth (gbq)
BigQuery does not support the INTO
SQL syntax. If you have an SQL query like the following:SELECT * INTO <tablename>
you must rewrite it like this:CREATE <tablename> AS SELECT *
BigQuery and the RDBMS Execute tool
In cases where a result set is expected by a downstream tool, note that BigQuery providers will not emit an error if the query defined in the RDBMS Execute tool returns no results.
BigQuery is slow when temporary tables aren't used for update and delete.
An RDBMS Output tool performing update or delete operations on a Google BigQuery target will run very slowly unless it is configured to use temp tables:
BigQuery data types
The following table describes how Data Management translates between its native data types and Google BigQuery data types. Note that not every data type has a Data Management equivalent.
Data Management data type | BigQuery data type | Notes |
---|---|---|
N/A |
| Data Management provides no analogous type, and will fail with an error |
|
| N/A |
|
| N/A |
|
| N/A |
|
| N/A |
|
|
|
|
| N/A |
|
| N/A |
|
| BigQuery provides no analogous type. Data Management will coerce this type to a |
N/A |
| Data Management provides no analogous type, and will fail with an error |
|
| N/A |
|
| Only reads are supported |
|
| N/A |
RDBMS providers: Greenplum
Connects to Greenplum database servers.
Greenplum (PostgreSQL, JDBCx64) (jdbc): standard connection method.
Greenplum (PostgreSQL, ODBCx64) (odbc native): standard connection method. Requires installation of ODBC drivers available from Greenplum web site.
Greenplum Gpfdist Loader (PostgreSQL, ODBCx64) (odbc native)
ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.
RDBMS providers: MySQL
Connects to MySQL database servers.
MySQL, JDBC, MySQL 8.0. Standard security (jdbc): specifies that SSL is not to be used. If you want to use SSL in conjunction with MySQL 8.0, configure the data connection, changing the values of parameters
useSSL
(and possiblyverifyServerCertificate
andrequireSSL
) toTrue
.MySQL, ODBC, MySQL 8.0, Standard security (odbc_native): connects to MySQL using native ODBC driver.
MySQL, JDBC, MySQL 5.7. Standard security (jdbc): does not require ODBC drivers, but is much slower than the ODBC interface. Specifies that SSL is not to be used. If you want to use SSL in conjunction with MySQL 5.7, configure the data connection, changing the values of parameters
useSSL
(and possiblyverifyServerCertificate
andrequireSSL
) toTrue
.MySQL, ODBC, MySQL 5.7, Standard security (odbc_native): connects to MySQL using native ODBC driver.
ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.
RDBMS providers: Netezza
Connects to Netezza data warehouse appliances.
Netezza, ODBC driver (odbc native): standard connection method. Requires installation of ODBC drivers available from Netezza.
Netezza 7.2, JDBC driver (jdbc native): standard connection method. Requires installation of JDBC drivers available from Netezza. This driver supports SSL for encrypting communication between the Data Management client and Netezza hosts.
ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.
To enable encrypted connections with the Netezza JDBC driver:
Copy the Netezza server’s CA certificate file (typically
server-cert.pem
) to a location accessible to all Data Management Execution Servers.Configure the Netezza JDBC connection with a Security level of
onlySecured
, and specify the file you copied in step 2 as the Server CA certificate file.
RDBMS providers: Oracle
Connects to Oracle 19c database servers.
Oracle, JDBC, Thin client (jdbc). The Oracle "thin" JDBC 4 interface communicates with the Oracle server directly.
When using JDBC to execute scripts, Oracle's JDBC driver usually requires that statements are not terminated with a semicolon, and that long commands (such as a CREATE TABLE script with many column descriptions) contain no internal newlines. This applies to preload and postload scripts in the RDBMS Output tool, as well as scripts in the RDBMS Execute and RDBMS In-Database Processor tools. Data Management enforces these rules programmatically.
However, there are cases where Oracle requires embedded newlines and a semicolon after the final command (especially if this command is END;
).
If you are using Oracle's JDBC driver and must use an embedded newline, precede it with a single backslash, the standard escape character (\
). Similarly, if you must use a semicolon at the end of a command, precede it with a single backslash (\
), or double the semicolon (;;
).
RDBMS providers: PostgreSQL
Connects to PostgreSQL database servers.
PostgreSQL, JDBC (jdbc): this driver is slower than the ODBC driver. It requires the JDBC driver JARr for PostgreSQL, available from the PostgreSQL web site.
PostgreSQL, JDBC, Bulk-loader (jdbc): use this driver for high-volume inserts into the PostgreSQL database; it is faster at inserting rows than the ODBC driver. However, on all other types of operations it has the same characteristics as the JDBC driver listed above. This driver requires the JDBC driver JAR for PostgreSQL, available from the PostgreSQL web site.
PostgreSQL, ODBC Native x64 (odbc native): this method can be used to access either Postgres or Greenplum databases, although load speeds will be slower on Greenplum than the Gpfdist driver. Requires installation of ODBC drivers available from PostgreSQL web site.
The default settings for PostgreSQL ODBC Native x64 specifyTextAsLongVarchar=yes
andB1=8190
. This means that if aCREATE TABLE
statement is issued using this data connection, any columns of typeTEXT
will be re-interpreted asVARCHAR(8190)
, and values written to those columns will be truncated to8190
characters if necessary. Since theTEXT
type normally has no meaningful limit on the number of characters, this behavior may be undesirable. To turn off this option or change the size limit, edit the values ofTextAsLongVarchar
andB1
in a data connection. Note that the JDBC provider does not process theTextAsLongVarchar
or theB1
parameters, so it will not exhibit this behavior.PostgreSQL, ODBC Native (odbc native): standard connection method. Requires installation of ODBC drivers available from PostgreSQL web site.
ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.
Internal deadlocks occur if multiple PostgreSQL bulk loaders try to run within a single project. Therefore, a project can only include one PostgreSQL bulk loader tool. If you need to run multiple bulk loads as a unit, you must use automation to connect several individual loads.
RDBMS providers: SAP HANA
Connects to a SAP HANA database.
SAP HANA JDBC (jdbc): connects to SAP HANA using JDBC driver.
RDBMS providers: Snowflake
Connects to a Snowflake database.
You must download the latest Snowflake JDBC driver (version 3.14.1 or above) from the Maven Central Repository and install it.
You also you need a Snowflake account and credentials.
Database name | Database API | Reconnect option available |
---|---|---|
Snowflake | JDBC | No. |
Snowflake Bulk | JDBC | No. If you specify this provider, you may optionally configure additional properties. |
Snowflake authentication
Data Management supports the following Snowflake authentication methods:
Authentication method | Description |
---|---|
Snowflake internal authenticator | The default authentication method. |
Single Sign On (SSO) and Federated Authentication via Okta | To use this functionality:
|
Service user | Data Management v9.5.3 introduces support for Snowflake When Service user is selected, the Private key attribute must be set to the private portion of the service user’s RSA key-pair. Private key is only applicable for the Service user authentication method. This authentication method does not use the Password attribute. Refer to Snowflake Documentation: Create User for more information. |
RDBMS providers: SQL Server
This section summarizes the database providers that can connect to Microsoft SQL Server database servers and describes their variants.
Data Management v9.5.3 introduces the following connection attributes (refer to the corresponding properties in Setting the connection properties for more information).
Connection attribute | Description | Default value |
---|---|---|
Encrypt | Set to |
|
Trust server certificate | If Set to |
|
Host name in certificate | The host name to be used to validate the SQL Server TLS/SSL certificate. | null |
Many providers offer authentication variants:
Trusted connections use implicit operating system security based on the login used by the Data Management services.
SQL Auth or Standard Security both refer to SQL Server Authentication.
Many providers offer both "allow reconnect" and "no reconnect" variants. Select the "allow reconnect" option to enable database reconnect for long-running projects that may be susceptible to database disconnects due to cloud-based environmental factors or other issues.
"Allow reconnect" strategies will cause the RDBMS Input tool to spool records until the data set is complete and then move the entire data set at once, rather than streaming records as soon as they are available.
Database name | Database API | Authentication options available | Reconnect option available | Notes |
---|---|---|---|---|
SQL Server | JDBC | SQL Auth, Trusted | Yes | Requires |
ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC/OLE DB drivers may not be supported in future versions.
RDBMS providers: Sybase
Connects to Sybase database servers.
Sybase, ASE provider, TC/IP connection, ODBC native (odbc native): connects to Sybase using ODBC driver.
ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.
Sybase, JDBC (jdbc): connects to Sybase using JDBC driver.
RDBMS providers: Teradata
Connects to Teradata 16.20 and 17.10 database servers.
Teradata, ODBC Native (odbc native): connects to Teradata using ODBC driver.
ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.
Teradata, JDBC (jdbc): connects to Teradata using JDBC driver. The JDBC strategy uses the Java
DatabaseMetaData
class, which requiresSELECT
access to theDBC.UDTInfo
table.
RDBMS providers: Vertica
Connects to Vertica databases.
Vertica Native ODBC (odbc_native): connects to Vertica using ODBC driver.
ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.
Vertica JDBC (jdbc)