Integrating BigQuery
Capabilities
DataGrail's BigQuery integration provides the following capabilities:
Product | Capability |
---|---|
Live Data Map | Data Discovery |
Before You Start
To successfully configure this integration, please ensure you have sufficient privileges:
- DataGrail User Role: Super Admin, Connections Manager
- BigQuery User Role: Admin
- Secrets Vault: Write Access
- IAM Roles: Write Access
Connection Instructions (RDD Agent)
The Responsible Data Discovery (RDD) Agent lets you perform data classification securely by connecting to internal systems from within your network, without requiring ingress from the public network.
The Agent uses IAM roles to connect to BigQuery and perform the necessary operations. Additional connection details are stored securely in a vault on your network. When you configure the BigQuery integration in DataGrail, you reference only the location of that vault entry (e.g. Secret Manager resource name), so no secrets are shared directly with DataGrail.
The Agent uses BigQuery TABLESAMPLE to randomly sample blocks that are each roughly 1GB in size.
In an effort to manage costs, the Agent by default only uses up to 5 blocks and then randomly samples within those blocks.
Given BigQuery pricing, scans cost up to roughly $100 per 2,000 tables (assuming every table is at least 5 GB in size).
To further manage the block sampling, set the optional environment variable MAX_BIGQUERY_BLOCKS. We recommend no more than 10.
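The arithmetic behind that estimate can be sketched as follows. This is a rough illustration, not the Agent's implementation. The per-GB rate is an assumption back-derived from the figures above ($100 for 2,000 tables of 5 GB each, or about $0.01 per GB scanned), and the function names are hypothetical. Check current BigQuery pricing for your region and billing model before relying on the result.

```python
# Assumption: each sampled block is roughly 1 GB, as described above.
BLOCK_SIZE_GB = 1.0
# Assumption: rate implied by "$100 per 2,000 tables of 5 GB"; not an official price.
ASSUMED_USD_PER_GB = 100 / (2000 * 5)


def scanned_gb(table_size_gb, max_blocks=5):
    """Data read for one table: at most max_blocks blocks, never more than the table."""
    return min(table_size_gb, max_blocks * BLOCK_SIZE_GB)


def estimate_scan_cost(table_sizes_gb, max_blocks=5, usd_per_gb=ASSUMED_USD_PER_GB):
    """Upper-bound cost estimate (USD) for one scan over the given tables."""
    total_gb = sum(scanned_gb(size, max_blocks) for size in table_sizes_gb)
    return total_gb * usd_per_gb


if __name__ == "__main__":
    # 2,000 tables of 5 GB each with the default of 5 blocks.
    print(round(estimate_scan_cost([5.0] * 2000), 2))  # 100.0
```

Under these assumptions, raising MAX_BIGQUERY_BLOCKS only increases cost for tables larger than the current block limit, because smaller tables are already read in full.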
In order to start scanning BigQuery, ensure the following:
- RDD Agent is deployed and connected in DataGrail.
- Network is configured to allow the Agent to connect with the BigQuery instance.
Store Connection Details in Vault
- To specify the target project, configure the following JSON key-value pairs:
{
"project_id": "<project ID>"
}
- Store the JSON value in your vault with a name like datagrail-rdd-bigquery.
- Ensure that the Agent has the necessary permissions to access this vault entry.
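As a minimal sketch, the JSON value can be generated rather than typed by hand, which avoids quoting mistakes. The helper name and the placeholder project ID below are hypothetical. The only key used is project_id, as shown above.

```python
import json


def build_connection_details(project_id):
    """Return the JSON document to store as the vault entry's value."""
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("project_id must not be empty")
    return json.dumps({"project_id": project_id}, indent=2)


if __name__ == "__main__":
    # "my-gcp-project" is a placeholder, not a real project.
    print(build_connection_details("my-gcp-project"))
```

The printed string is what you would paste or upload as the secret value. How it is uploaded depends on your vault tooling.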
Add Service Account Roles
To grant the Agent access to BigQuery, add the following roles to the Agent service account:
- BigQuery Data Viewer
- BigQuery Job User
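If you manage IAM from the command line, the two grants might look like the gcloud commands generated below. This is a sketch under assumptions: it grants both roles at the project level (your organization may prefer dataset-level grants for Data Viewer), and the project ID and service account email are placeholders. The role IDs roles/bigquery.dataViewer and roles/bigquery.jobUser correspond to the two role names above.

```python
import shlex

# Role IDs for "BigQuery Data Viewer" and "BigQuery Job User".
AGENT_ROLES = ("roles/bigquery.dataViewer", "roles/bigquery.jobUser")


def iam_binding_commands(project_id, service_account_email):
    """Build one gcloud command per role for the Agent service account."""
    member = f"serviceAccount:{service_account_email}"
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
        for role in AGENT_ROLES
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own project and service account.
    for cmd in iam_binding_commands(
        "my-gcp-project",
        "rdd-agent@my-gcp-project.iam.gserviceaccount.com",
    ):
        print(cmd)
```

The script only prints the commands so they can be reviewed before anyone runs them.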
Add the Agent Integration
- In DataGrail, navigate to Agents under Integration network.
- Select your Agent.
- In the top right, select Add New Integration.
- Search for BigQuery, then select Configure.
- Enter an Integration Name, and only enable the Data Discovery capability.
- Enter the Connection Details Location (e.g. Secret Manager resource name).
- (optional) Choose the Business Processes, Region, and System Location.
- Finally, select Configure Integration. Wait a few moments to ensure that the connection is successful. For failed connections, review the Agent container logs for additional details.
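For the Connection Details Location step, a Google Secret Manager resource name follows the pattern projects/<project>/secrets/<secret>, optionally followed by /versions/<version>. The helper below is a hypothetical sketch for assembling that string. Whether DataGrail expects the version suffix is not stated here, so treat that part as an assumption and confirm it against your Agent configuration.

```python
from typing import Optional


def secret_resource_name(project, secret_id, version: Optional[str] = None):
    """Assemble a Secret Manager resource name from its parts."""
    name = f"projects/{project}/secrets/{secret_id}"
    if version is not None:
        # Assumption: a version suffix is only needed if your setup pins one.
        name += f"/versions/{version}"
    return name


if __name__ == "__main__":
    # Placeholder project; the secret name matches the example used earlier.
    print(secret_resource_name("my-gcp-project", "datagrail-rdd-bigquery"))
```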
Troubleshooting
If you are unable to successfully connect the integration, review these common troubleshooting steps:
Agent Unable to Connect to BigQuery
- Verify that the network is configured to allow the Agent to connect with the BigQuery instance.
- Verify the Agent has permissions to access the BigQuery credentials stored in your vault.
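A quick way to test the network path is a TCP connection check run from the same network as the Agent. The sketch below assumes the Agent reaches BigQuery through the public API endpoint bigquery.googleapis.com on port 443. If your environment uses a private or restricted endpoint, substitute that host. The function name is hypothetical, and a successful connection confirms reachability only, not IAM permissions.

```python
import socket

# Assumption: BigQuery is reached via the public API endpoint over HTTPS.
BIGQUERY_HOST = "bigquery.googleapis.com"


def can_reach(host=BIGQUERY_HOST, port=443, timeout=5.0,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print("reachable" if can_reach() else "not reachable")
```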
Agent is Not Connected in DataGrail
Review the setup guide, and ensure that:
- The DataGrail API Key is valid and has not expired.
- The Agent has permissions to access the DataGrail API Key stored in your vault.
- Network egress is permitted from the Agent to your DataGrail domain.
Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.