Integrating Databricks

Capabilities

DataGrail's Databricks integration provides the following capabilities:

Product          Capability
Live Data Map    Data Discovery

Before You Start

To successfully configure this integration, please ensure you have sufficient privileges:

  • DataGrail User Role: Super Admin, Connections Manager
  • Databricks User Role: Admin
  • Secrets Vault: Write Access

Connection Instructions (RDD Agent)

The Responsible Data Discovery (RDD) Agent lets you securely perform data classification by connecting to internal systems within your network, without requiring ingress from the public network.

For the Agent to scan your Databricks instance, read-only credentials are created and stored in a vault on your network. When configuring the Databricks integration in DataGrail, only the location of that vault entry will be referenced (e.g. AWS Secrets Manager ARN), which ensures that no secrets are shared directly with DataGrail.

Before Connecting

In order to start scanning Databricks, ensure the following:

  • RDD Agent is deployed and connected in DataGrail.
  • Network is configured to allow the Agent to connect with the Databricks instance.
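
One way to confirm the network prerequisite is a plain TCP reachability check, run from the host (or container network) where the Agent is deployed. The sketch below is a minimal example using only the Python standard library; the function name can_reach and the default port 443 (HTTPS) are assumptions for illustration, not part of DataGrail's tooling.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_reach with your Databricks instance host name should return True when egress from the Agent's network is allowed. A False result points to DNS, firewall, or routing configuration rather than to credentials.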

Create and Store Credentials

  1. In Databricks, create a new user named datagrail-scanner with read-only privileges. Consult the Databricks documentation as needed.

  2. Configure the following JSON key-value pairs:

    {
      "host": "<Databricks instance host name>",
      "http_path": "<HTTP path to DBSQL endpoint (e.g. /sql/1.0/endpoints/...) or to a DBR interactive cluster (e.g. /sql/protocolv1/o/...)>",
      "access_token": "<HTTP Bearer access token, e.g. Databricks Personal Access Token>"
    }
  3. Store the JSON value in your vault with an entry name like datagrail-databricks-rdd.
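
The steps above can be sketched in Python as follows. This is an illustrative example, not a DataGrail-provided script: the helper names build_secret_payload and store_in_aws_secrets_manager are hypothetical, and the storage helper assumes that your vault is AWS Secrets Manager and that the third-party boto3 package is installed with credentials that allow creating secrets.

```python
import json

# The three keys listed in step 2.
REQUIRED_KEYS = ("host", "http_path", "access_token")


def build_secret_payload(host: str, http_path: str, access_token: str) -> str:
    """Build the JSON credentials string, rejecting empty values."""
    payload = {"host": host, "http_path": http_path, "access_token": access_token}
    missing = [key for key in REQUIRED_KEYS if not payload[key].strip()]
    if missing:
        raise ValueError(f"empty credential fields: {', '.join(missing)}")
    return json.dumps(payload)


def store_in_aws_secrets_manager(name: str, payload: str) -> str:
    """Store the payload as a new secret and return its ARN.

    Assumption: the vault is AWS Secrets Manager. boto3 is imported here
    so that the payload helper above works without it.
    """
    import boto3  # third-party: pip install boto3

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=name, SecretString=payload)
    return response["ARN"]
```

If AWS Secrets Manager is your vault, the ARN returned by store_in_aws_secrets_manager("datagrail-databricks-rdd", payload) is the kind of value you would later enter as the Credentials Location.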

Add the Agent Integration

  1. In DataGrail, navigate to Agents under Integration network.
  2. Select your Agent.
  3. In the top right, select Add New Integration.
  4. Search for Databricks, then select Configure.
  5. Enter an Integration Name, and only enable the Data Discovery capability.
  6. Enter the Credentials Location (e.g. AWS Secrets Manager ARN).
  7. (optional) Choose the Business Processes, Region, and System Location.
  8. Finally, select Configure Integration. Wait a few moments to ensure that the connection is successful. For failed connections, review the Agent container logs for additional details.
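
If the connection fails, it can help to rule out a credentials problem by testing the stored JSON directly against Databricks. The sketch below assumes the third-party databricks-sql-connector package, whose connect parameters happen to correspond to the three JSON keys. It is not a description of how the Agent itself connects, and the helper names connect_kwargs and check_credentials are hypothetical.

```python
import json


def connect_kwargs(secret_json: str) -> dict:
    """Map the stored credential JSON onto databricks-sql-connector arguments."""
    creds = json.loads(secret_json)
    return {
        "server_hostname": creds["host"],
        "http_path": creds["http_path"],
        "access_token": creds["access_token"],
    }


def check_credentials(secret_json: str) -> bool:
    """Run a trivial query; return True if a row comes back.

    Assumption: databricks-sql-connector is installed
    (pip install databricks-sql-connector).
    """
    from databricks import sql  # third-party, imported only when needed

    with sql.connect(**connect_kwargs(secret_json)) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone() is not None
```

An exception raised by check_credentials usually indicates an invalid token, a wrong http_path, or a network issue. A True result suggests the credentials themselves are usable, so the Agent's vault permissions are the next thing to examine.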

Troubleshooting

If you are unable to successfully connect the integration, review these common troubleshooting steps:

Agent Unable to Connect to Databricks

  1. Verify that the network is configured to allow the Agent to connect with the Databricks instance.
  2. Verify the Agent has permissions to access the Databricks credentials stored in your vault.

Agent is Not Connected in DataGrail

Review the setup guide, and ensure that:

  1. The DataGrail API Key is valid and has not expired.
  2. The Agent has permissions to access the DataGrail API Key stored in your vault.
  3. Network egress is permitted from the Agent to your DataGrail domain.

 

Need help?
If you have any questions, please reach out to your dedicated CSM or contact us at support@datagrail.io.

Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.