Skip to main content
Skip table of contents

RDBMS database providers

Overview

The Database selector is located on the Source tab of the RDBMS Input, RDBMS Output, RDBMS Execute, and Data Connection Properties pane.

image-20240327-151923.png

Use this control to select an RDBMS provider, which consists of:

  • A database type (such as Oracle, SQL Server, or DB2).

  • A connection method (JDBC preferred; OleDB, ODBC, and Oracle OLI are deprecated and will not be supported in future Data Management releases).

  • A set of read-only parameters that further refine the database strategy.

  • A set of options appropriate for the database strategy. Default options are displayed on the Options tab but can be overridden.

  • A set of parameters to be defined in that typically specify the database server, database name, and initial catalog.

Database providers simplify the task of connecting to different databases by predefining parameters, selecting appropriate options, and setting up a fill-in-the-blank grid for defining necessary parameters. Data Management ships with a selection of database providers organized by database type.

RDBMS providers: Access

Connects to Microsoft Access databases. These are databases contained in a single file. Note that Microsoft Jet OLE DB.4.0 is not supported for the 64 bit OS.

  • Access, standard security (oledb). This is the simplest and most common provider.

  • Access, password security (oledb). Only use this for Access databases that have a password protection.

Data Management version 10 will end support for reading and writing to Microsoft Access via RDBMS tools, and the Access database providers are deprecated as of version 9.3.1. Support for Microsoft Access will end after version 9.4.1.

RDBMS providers: AWS

Data Management's RDBMS tools can connect to AWS (Amazon Web Services) Aurora databases and Redshift data warehouse services.

Database name

Database API

Reconnect option available

Notes

Aurora MySQL

JDBC

(tick)

MySQL 5.7. Can be used with AWS Identity and Access Management (IAM) by configuring these options:

  • UseAwsIAM: if true, indicates that AWS IAM tokens should be used for this connection.

    • Default is false.

  • AwsRegion: AWS Region name.

  • AwsAccessKeyId: AWS Access Key for IAM user.

  • AwsSecretKey: AWS Secret Key for IAM user.

Aurora PostgreSQL

JDBC

(tick)

N/A

Aurora PostgreSQL Bulk loader

JDBC

(tick)

N/A

AWS Redshift

ODBC native

(minus)

The AWS Redshift JDBC and ODBC drivers are adequate for inserting small amounts of data into tables with fewer than 50 columns, but they are slow.

AWS Redshift

JDBC

(minus)

Defaults to operating in SSL mode, which may cause connection failures on systems not configured for SSL. If this happens, do one of the following:

  • Disable SSL by adding a parameter named ssl to your data connection, and setting its value to false.

  • Leave SSL enabled, but disable validation by adding a parameter named sslfactory to your data connection, and setting its value to org.postgresql.ssl.NonValidatingFactory.

AWS Redshift Bulk

JDBC

(tick)

Use with RDBMS Output tool for bulk inserts of an input record stream into a Amazon Redshift Database table. If you specify this provider, you may optionally configure additional properties.

ODBC providers have been deprecated in Data Management and ODBC drivers may not be supported in future versions.

RDBMS providers: Azure

The following table summarizes the database providers that can connect to Azure database servers and describes their variants.

Many providers offer both "allow reconnect" and "no reconnect" variants. Select the "allow reconnect" option to enable database reconnect for long-running projects that may be susceptible to database disconnects due to cloud-based environmental factors or other issues.

"Allow reconnect" strategies will cause the RDBMS Input tool to spool records until the data set is complete and then move the entire data set at once, rather than streaming records as soon as they are available.

Database name

Database API

Reconnect option available

Azure Data Warehouse Bulk

JDBC

No. If you specify this provider, you may optionally configure additional properties

Azure Data Warehouse

JDBC

No

Azure Data Warehouse

JDBC

Yes

Azure Data Warehouse

ODBC

No

Azure Data Warehouse

OLE DB Auth, Trusted

Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher

Azure

JDBC

Yes

Azure

Native ODBC

Yes

Azure SQL MI

OLE DB Auth, Trusted

Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher

Azure SQL

OLE DB Auth, Trusted

Reconnect-only. Requires MSOLEDBSQL Driver 18 or higher

ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC and OLE DB drivers may not be supported in future versions.

Azure Data Warehouse OLE DB connections require the MSOLEDBSQL driver introduced by Microsoft in April 2018. This driver replaces the SQLNCI, SQLOLEDB, and SNAC drivers, which will not work with our OLE DB data provider.

Azure Data Warehouse JDBC connections require the Microsoft JDBC Driver 8.2 for SQL Server introduced by Microsoft in March 2020.

RDBMS providers: DB2

Connects to IBM DB2 databases (version 10 or above).

  • DB2, JDBC, IBM provider, TCP/IP connection, (jdbc). See information below.

  • DB2, IBM provider, TCP/IP connection, ODBC native (odbc native).
    ODBC providers have been deprecated in Data Management and this driver may not be supported in future versions.

When configuring DB2 via JDBC, you must issue GRANT SELECT ON SYSCAT.DBAUTH to the User ID (Uid) used by the Data Management database connection. This User ID must also have SELECT privileges for the following tables:

  • SYSIBM.SQLTABLES

  • SYSIBM.SYSTABLES

  • SYSIBM.SQLTABLETYPES

  • SYSIBM.SQLTABLEPRIVILEGES

  • SYSIBM.SYSCOLUMNS

  • SYSIBM.SQLSPECIALCOLUMNS

  • SYSIBM.SQLCOLPRIVILEGES

  • SYSIBM.SYSDUMMY1

  • SYSIBM.SQLPRIMARYKEYS

  • SYSIBM.SQLSTATISTICS

  • SYSIBM.FOREIGNKEYS

  • SYSIBM.SYSINDEXES

  • SYSIBM.SQLPROCEDURES

  • SYSIBM.SQLPROCEDURECOLS

  • SYSIBM.SYSROUTINES

  • SYSIBM.SYSROUTINEPARMS

  • SYSIBM.SQLTYPEINFO

  • SYSIBM.SQLUDTS

  • SYSIBM.SQLSCHEMAS

  • SYSIBM.SYSRELS

  • SYSIBM.SYSKEYCOLUSE

  • SYSIBM.SYSTABCONST

  • SYSIBM.SYSDATATYPES

RDBMS providers: Excel

Connects to Microsoft Excel files. You must have Access Database Engine drivers installed.

  • Excel ACE OLEDB (oledb): connects to Excel using a simple file selection.

  • Excel using Excel Files DSN (odbc via oledb): connects to Excel using a defined ODBC DSN. Not recommended, but may be required by some organizational policies.

Data Management version 10 will end support for reading and writing to Microsoft Excel via RDBMS tools, and the Excel database providers are deprecated as of version 9.3.1. Version 9.4.1 will introduce a tool that reads Excel workbooks.

RDBMS providers: Generic

Connects to arbitrary OLE DB or ODBC databases.

  • Generic OLEDB (oledb): use this for OLE DB providers that are not explicitly supported by Data Management. Requires selection of Provider name and manual entry of connection parameters.

  • ODBC Native Generic (odbc_native): useful when you have a driver that is not supported by any other database provider. Available ODBC drivers are displayed on a list. When you select a driver, Data Management will attempt to query the driver for connection parameters. Only some drivers support this functionality.

  • ODBC Native DSN (odbc_native): useful when you have a driver that is not supported by any other database provider. Requires selection of DSN and manual entry of connection parameters.

ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC and OLE DB drivers may not be supported in future versions.

RDBMS providers: Google BigQuery

Connects to Google BigQuery databases.

  • Google BigQuery Bulk Load - Application Default Credentials (gbq)

  • Google BigQuery Bulk Load - Service Account Auth (gbq)

BigQuery does not support the INTO SQL syntax. If you have an SQL query like the following:
SELECT * INTO <tablename>
you must rewrite it like this:
CREATE <tablename> AS SELECT *

BigQuery and the RDBMS Execute tool

In cases where a result set is expected by a downstream tool, note that BigQuery providers will not emit an error if the query defined in the RDBMS Execute tool returns no results.

BigQuery is slow when temporary tables aren't used for update and delete.

An RDBMS Output tool performing update or delete operations on a Google BigQuery target will run very slowly unless it is configured to use temp tables:

image-20240327-153411.png

BigQuery data types

The following table describes how Data Management translates between its native data types and Google BigQuery data types. Note that not every data type has a Data Management equivalent.

Data Management data type

BigQuery data type

Notes

N/A

ARRAY

Data Management provides no analogous type, and will fail with an error

Binary

BYTES

N/A

Boolean

BOOL

N/A

Date

DATE

N/A

DateTime

DATETIME

N/A

Decimal

NUMERIC or BIGNUMERIC

  • Decimals with precision less than or equal to 38 map to NUMERIC

  • Decimals with precision greater than 38 map to BIGNUMERIC

Float

FLOAT64

N/A

Integer

INT64

N/A

Spatial

STRING

BigQuery provides no analogous type. Data Management will coerce this type to a STRING

N/A

STRUCT

Data Management provides no analogous type, and will fail with an error

Text

STRING

N/A

DateTime

TIMESTAMP

Only reads are supported

Time

TIME

N/A

RDBMS providers: Greenplum

Connects to Greenplum database servers.

  • Greenplum (PostgreSQL, JDBCx64) (jdbc): standard connection method.

  • Greenplum (PostgreSQL, ODBCx64) (odbc native): standard connection method. Requires installation of ODBC drivers available from Greenplum web site.

  • Greenplum Gpfdist Loader (PostgreSQL, ODBCx64) (odbc native)

ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.

RDBMS providers: MySQL

Connects to MySQL database servers.

  • MySQL, JDBC, MySQL 8.0. Standard security (jdbc): specifies that SSL is not to be used. If you want to use SSL in conjunction with MySQL 8.0, configure the data connection, changing the values of parameters useSSL (and possibly verifyServerCertificate and requireSSL) to True.

  • MySQL, ODBC, MySQL 8.0, Standard security (odbc_native): connects to MySQL using native ODBC driver.

  • MySQL, JDBC, MySQL 5.7. Standard security (jdbc): does not require ODBC drivers, but is much slower than the ODBC interface. Specifies that SSL is not to be used. If you want to use SSL in conjunction with MySQL 5.7, configure the data connection, changing the values of parameters useSSL (and possibly verifyServerCertificate and requireSSL) to True.

  • MySQL, ODBC, MySQL 5.7, Standard security (odbc_native): connects to MySQL using native ODBC driver.

ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.

RDBMS providers: Netezza

Connects to Netezza data warehouse appliances.

  • Netezza, ODBC driver (odbc native): standard connection method. Requires installation of ODBC drivers available from Netezza.

  • Netezza 7.2, JDBC driver (jdbc native): standard connection method. Requires installation of JDBC drivers available from Netezza. This driver supports SSL for encrypting communication between the Data Management client and Netezza hosts.

ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.

To enable encrypted connections with the Netezza JDBC driver:

  1. Configure the SSL certificate on the Netezza system.

  2. Copy the Netezza server’s CA certificate file (typically server-cert.pem) to a location accessible to all Data Management Execution Servers.

  3. Configure the Netezza JDBC connection with a Security level of onlySecured, and specify the file you copied in step 2 as the Server CA certificate file.

RDBMS providers: Oracle

Connects to Oracle 19c database servers.

Oracle, JDBC, Thin client (jdbc). The Oracle "thin" JDBC 4 interface communicates with the Oracle server directly.

When using JDBC to execute scripts, Oracle's JDBC driver usually requires that statements are not terminated with a semicolon, and that long commands (such as a CREATE TABLE script with many column descriptions) contain no internal newlines. This applies to preload and postload scripts in the RDBMS Output tool, as well as scripts in the RDBMS Execute and RDBMS In-Database Processor tools. Data Management enforces these rules programmatically.

However, there are cases where Oracle requires embedded newlines and a semicolon after the final command (especially if this command is END;).

If you are using Oracle's JDBC driver and must use an embedded newline, precede it with a single backslash, the standard escape character (\ ). Similarly, if you must use a semicolon at the end of a command, precede it with a single backslash (\ ), or double the semicolon (;;).

RDBMS providers: PostgreSQL

Connects to PostgreSQL database servers.

  • PostgreSQL, JDBC (jdbc): this driver is slower than the ODBC driver. It requires the JDBC driver JARr for PostgreSQL, available from the PostgreSQL web site.

  • PostgreSQL, JDBC, Bulk-loader (jdbc): use this driver for high-volume inserts into the PostgreSQL database; it is faster at inserting rows than the ODBC driver. However, on all other types of operations it has the same characteristics as the JDBC driver listed above. This driver requires the JDBC driver JAR for PostgreSQL, available from the PostgreSQL web site.

  • PostgreSQL, ODBC Native x64 (odbc native): this method can be used to access either Postgres or Greenplum databases, although load speeds will be slower on Greenplum than the Gpfdist driver. Requires installation of ODBC drivers available from PostgreSQL web site.
    The default settings for PostgreSQL ODBC Native x64 specify TextAsLongVarchar=yes and B1=8190. This means that if a CREATE TABLE statement is issued using this data connection, any columns of type TEXT will be re-interpreted as VARCHAR(8190), and values written to those columns will be truncated to 8190 characters if necessary. Since the TEXT type normally has no meaningful limit on the number of characters, this behavior may be undesirable. To turn off this option or change the size limit, edit the values of TextAsLongVarchar and B1 in a data connection. Note that the JDBC provider does not process the TextAsLongVarchar or the B1 parameters, so it will not exhibit this behavior.

  • PostgreSQL, ODBC Native (odbc native): standard connection method. Requires installation of ODBC drivers available from PostgreSQL web site.

ODBC providers have been deprecated in Data Management, and the ODBC drivers may not be supported in future versions.

Internal deadlocks occur if multiple PostgreSQL bulk loaders try to run within a single project. Therefore, a project can only include one PostgreSQL bulk loader tool. If you need to run multiple bulk loads as a unit, you must use automation to connect several individual loads.

RDBMS providers: SAP HANA

Connects to a SAP HANA database.

  • SAP HANA JDBC (jdbc): connects to SAP HANA using JDBC driver.

RDBMS providers: Snowflake

Connects to a Snowflake database.

  • You must download the latest Snowflake JDBC driver (version 3.14.1 or above) from the Maven Central Repository and install it.

  • You also you need a Snowflake account and credentials.

Database name

Database API

Reconnect option available

Snowflake

JDBC

No.

Snowflake Bulk

JDBC

No. If you specify this provider, you may optionally configure additional properties.

Snowflake authentication

Data Management supports the following Snowflake authentication methods:

Authentication method

Description

Snowflake internal authenticator

The default authentication method.

Single Sign On (SSO) and Federated Authentication via Okta

To use this functionality:

  1. Establish an Okta account.

  2. Obtain an Okta authentication URL.

  3. Enter the Okta authentication URL (of the form https://your-okta-account-name.okta.com/) as Authenticator in your Snowflake data connection.

Service user

Data Management v9.5.3 introduces support for Snowflake SERVICE user TYPE to improve the security of non-interactive (i.e., service) use cases.

When Service user is selected, the Private key attribute must be set to the private portion of the service user’s RSA key-pair.

Private key is only applicable for the Service user authentication method.

This authentication method does not use the Password attribute.

Refer to Snowflake Documentation: Create User for more information.

RDBMS providers: SQL Server

This section summarizes the database providers that can connect to Microsoft SQL Server database servers and describes their variants.

Data Management v9.5.3 introduces the following connection attributes (refer to the corresponding properties in Setting the connection properties for more information).

Connection attribute

Description

Default value

Encrypt

Set to "true" to specify that the SQL Server uses TLS encryption for all the data sent between the client and the server if the server has a certificate installed.

"true"

Trust server certificate

If false, the driver validates the server TLS/SSL certificate. If the server certificate validation fails, the driver raises an error and closes the connection.

Set to true to specify that the driver doesn't validate the server TLS/SSL certificate. If true, the server TLS/SSL certificate is automatically trusted when the communication layer is encrypted using TLS.

false

Host name in certificate

The host name to be used to validate the SQL Server TLS/SSL certificate.

null

Many providers offer authentication variants:

  • Trusted connections use implicit operating system security based on the login used by the Data Management services.

  • SQL Auth or Standard Security both refer to SQL Server Authentication.

Many providers offer both "allow reconnect" and "no reconnect" variants. Select the "allow reconnect" option to enable database reconnect for long-running projects that may be susceptible to database disconnects due to cloud-based environmental factors or other issues.

"Allow reconnect" strategies will cause the RDBMS Input tool to spool records until the data set is complete and then move the entire data set at once, rather than streaming records as soon as they are available.

Database name

Database API

Authentication options available

Reconnect option available

Notes

SQL Server

JDBC

SQL Auth, Trusted

Yes

Requires mssql-jdbc-12.8.1.jre8.jar.

ODBC and OLE DB providers have been deprecated in Data Management, and the ODBC/OLE DB drivers may not be supported in future versions.

RDBMS providers: Sybase

Connects to Sybase database servers.

  • Sybase, ASE provider, TC/IP connection, ODBC native (odbc native): connects to Sybase using ODBC driver.

ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.

  • Sybase, JDBC (jdbc): connects to Sybase using JDBC driver.

RDBMS providers: Teradata

Connects to Teradata 16.20 and 17.10 database servers.

  • Teradata, ODBC Native (odbc native): connects to Teradata using ODBC driver.

ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.

  • Teradata, JDBC (jdbc): connects to Teradata using JDBC driver. The JDBC strategy uses the Java DatabaseMetaData class, which requires SELECT access to the DBC.UDTInfo table.

RDBMS providers: Vertica

Connects to Vertica databases.

  • Vertica Native ODBC (odbc_native): connects to Vertica using ODBC driver.

ODBC providers have been deprecated in Data Management, and this ODBC driver may not be supported in future versions.

  • Vertica JDBC (jdbc)

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.