Databricks

Overview

Databricks is a cloud-based data engineering and data science platform that provides a unified analytics solution for processing, storing, and collaborating on large datasets. By integrating RPI with Databricks, you can unlock the full potential of your customer data, enabling you to create highly personalized and effective customer interactions.

Prerequisites

  • UID and password of the Databricks instance that will be configured in RPI

  • Download the latest version of the Simba Spark ODBC driver

  • The name of the Unity Catalog that will be used for configuring the Auxiliary database connection in RPI

Step 1: Install Databricks ODBC driver for Linux

  1. Go to Databricks Download Page (https://www.databricks.com/spark/odbc-drivers-download).

  2. Download ODBC for Linux (RPM) installer.

  3. Copy the downloaded file (SimbaSparkODBC-<version>-LinuxRPM-64bit.zip) to a working directory (e.g., on a Windows machine, D:\Redpoint\Docker-v7-Dev\RPI-Services\config\temp).

  4. Extract the zip file SimbaSparkODBC-<version>-LinuxRPM-64bit.zip.

  5. Inside the extracted folder, extract the file simbaspark-2.7.7.1016-1.x86_64.rpm (we recommend using 7zip).

  6. Inside the extracted folder, extract the file simbaspark-2.7.7.1016-1.x86_64.cpio (we recommend using 7zip).

  7. Copy the spark folder (found under /opt/spark in the extracted folder) to the execution service Docker container folder /app/odbc-lib.
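
To confirm the copy before moving on, the driver's shared library can be loaded directly. The snippet below is a minimal sketch that assumes Python is available inside the execution service container (run it via docker exec, for example); it reports an error if the .so file is missing or one of its system dependencies is not present in the image.

CODE
import ctypes

# Path the spark folder was copied to in step 7.
DRIVER = "/app/odbc-lib/spark/lib/64/libsparkodbc_sb64.so"

try:
    ctypes.CDLL(DRIVER)
    print(f"Driver library loaded: {DRIVER}")
except OSError as err:
    # Typically a wrong copy location or a missing shared-library dependency.
    print(f"Driver library failed to load: {err}")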

Step 2: Configure ODBC driver

Update the following three .ini files:

simba.sparkodbc.ini

  1. In /app/odbc-lib/spark/lib/64/, open simba.sparkodbc.ini.

  2. Update the ErrorMessagesPath entry to /app/odbc-lib/spark/ErrorMessages:

    CODE
    [Driver]
    ErrorMessagesPath=/app/odbc-lib/spark/ErrorMessages
    LogLevel=0
    LogPath=
    SwapFilePath=/tmp
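
As a quick check (a sketch, assuming Python is available in the container), the updated file can be read back to confirm that ErrorMessagesPath points at a directory that exists after the copy in Step 1; a wrong path can make driver errors unhelpfully terse.

CODE
import configparser
import os

INI = "/app/odbc-lib/spark/lib/64/simba.sparkodbc.ini"

parser = configparser.ConfigParser()
parser.optionxform = str  # preserve the key casing used by the driver
parser.read(INI)

path = parser["Driver"]["ErrorMessagesPath"]
print("ErrorMessagesPath =", path)
print("Directory exists:", os.path.isdir(path))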

odbcinst.ini file (Driver Registration)

In /app/odbc-config/odbcinst.ini, add the following entries:

CODE
[ODBC Drivers]
Simba Apache Spark ODBC Connector=Installed

[Simba Apache Spark ODBC Connector]
Description=Simba Apache Spark ODBC Connector
Driver=/app/odbc-lib/spark/lib/64/libsparkodbc_sb64.so
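
Once these entries are in place, the registration can be verified with pyodbc. This is a sketch; it assumes pyodbc is installed and that the container's ODBCSYSINI environment variable points at /app/odbc-config so unixODBC picks up this odbcinst.ini.

CODE
import pyodbc

# Lists the driver names unixODBC reads from odbcinst.ini.
drivers = pyodbc.drivers()
print(drivers)

if "Simba Apache Spark ODBC Connector" not in drivers:
    raise SystemExit("Driver not registered - check /app/odbc-config/odbcinst.ini")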

odbc.ini file (DSN Entries)

In /app/odbc-config/odbc.ini, add the connection details and configure the parameter values:

CODE
[ODBC]
Trace=no
[ODBC Data Sources]
databricks=Simba Spark ODBC Driver 64-bit
[databricks]
# Description: DSN Description.

# This key is not necessary and is only to give a description of the data source.
Description=Simba Spark ODBC Driver (64-bit) DSN

# Driver: The location where the ODBC driver is installed to.
Driver=/app/odbc-lib/spark/lib/64/libsparkodbc_sb64.so
# The host name or IP of the Thrift server.
HOST=<Databricks HOST URL>
# The HTTP path of the Databricks SQL warehouse.
HTTPPath=/sql/1.0/warehouses/<warehouse ID>
# The TCP port Thrift server is listening.
PORT=443
# The name of the database schema to use when a schema is not explicitly specified in a query.
Schema=<schema>
# The Spark Server Type
# 1 - Shark Server 1 for Shark 0.8.1 and earlier
# 2 - Shark Server 2 for Shark 0.9.*
# 3 - Spark Thrift Server for Shark 1.1 and later 
SparkServerType=3
# The authentication mechanism to use for the connection.
#   Set to 0 for No Authentication
#   Set to 1 for Kerberos
#   Set to 2 for User Name
#   Set to 3 for User Name and Password
# Note only No Authentication is supported when connecting to Shark Server 1.
AuthMech=3
# The Thrift transport to use for the connection.
#	Set to 0 for Binary
#	Set to 1 for SASL
#	Set to 2 for HTTP
# Note for Shark Server 1 only Binary can be used.
ThriftTransport=2
# When this option is enabled (1), the driver does not transform the queries emitted by an 
# application, so the native query is used.
# When this option is disabled (0), the driver transforms the queries emitted by an application and 
# converts them into an equivalent form in Spark SQL.
# UseNativeQuery=0
# Set the UID with the user name to use to access Spark when using AuthMech 2 or 3.
UID=<username>
PWD=<password>
# Set to 1 to enable SSL. Set to 0 to disable.
SSL=1
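
Because the UID and PWD are stored in the DSN, the connection can be smoke-tested outside RPI before continuing. The sketch below assumes pyodbc is installed and that the ODBCINI environment variable points at /app/odbc-config/odbc.ini so the [databricks] DSN defined above is visible.

CODE
import pyodbc

# Connects through the DSN configured above; credentials come from odbc.ini.
conn = pyodbc.connect("DSN=databricks", autocommit=True)
row = conn.cursor().execute("SELECT 1").fetchone()
print("Connection OK:", row[0])
conn.close()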

Step 3: Configure AuxDB in RPI Tenant

  1. Log into RPI and go to Configuration > Tenant.

  2. Scroll to the bottom and click Manage Auxiliary Databases.

  3. Create a new Auxiliary Database:

    1. Click the Add button.

    2. Enter a name for the Auxiliary database, e.g., “Databricks”.

    3. Choose Databricks as the database type.

    4. Enter the connection string as DSN=<name of ODBC data source configured in the odbc.ini file>.

    5. Enter the schema name in the following format: <unity catalog name>.<schema>.

    6. Click the Save button.

    7. Click the Test database connection button to validate that RPI can connect to the Databricks instance.

(Screenshot: databricks.png)
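
In Unity Catalog, tables use three-level names, so with the schema entered as <unity catalog name>.<schema>, a table is addressed as <unity catalog name>.<schema>.<table>. The sketch below illustrates that qualification over the same ODBC DSN; "main", "rpi_schema", and "customers" are hypothetical names used only for illustration, and pyodbc is assumed to be installed.

CODE
import pyodbc

conn = pyodbc.connect("DSN=databricks", autocommit=True)
cur = conn.cursor()

# Tables resolve as <unity catalog name>.<schema>.<table>.
cur.execute("SELECT COUNT(*) FROM main.rpi_schema.customers")
print(cur.fetchone()[0])
conn.close()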

Step 4: Sync the Catalog

  1. Log into RPI and go to Configuration > Catalog.

  2. In the Database drop-down, choose the Databricks Auxiliary Database you configured in the previous step.

  3. Click the Synchronize Catalog button.

  4. Once the catalog sync is complete, click the refresh button on the far left. The list of tables in the Unity Catalog should be displayed:

(Screenshot: databricks2.png)
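
The same table list can be cross-checked directly over the ODBC DSN (a sketch, assuming pyodbc; replace the hypothetical "main" and "rpi_schema" with the Unity Catalog and schema configured in Step 3):

CODE
import pyodbc

conn = pyodbc.connect("DSN=databricks", autocommit=True)
cur = conn.cursor()

# Lists the tables the catalog sync should have picked up.
cur.execute("SHOW TABLES IN main.rpi_schema")
for row in cur.fetchall():
    print(row)  # typically the schema, table name, and a temporary flag
conn.close()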
