
Integrating Amazon Athena

Capabilities

DataGrail's Athena integration provides the following capabilities:

Product          Capability
Live Data Map    Data Discovery

Before You Start

To successfully configure this integration, please ensure you have sufficient privileges:

  • DataGrail User Role: Super Admin, Connections Manager
  • Athena User Role: Admin
  • Secrets Vault: Write Access
  • IAM Policies: Write Access

Connection Instructions (RDD Agent)

The Responsible Data Discovery (RDD) Agent lets you perform data classification securely by connecting to internal systems from inside your network, without requiring ingress from the public network.

For the Agent to scan Athena, an IAM Policy is used to allow the Agent to connect and perform the necessary operations. Additional connection details will be stored securely in a vault on your network. When configuring the Athena integration in DataGrail, only the location of that vault entry will be referenced (e.g. AWS Secrets Manager ARN), which ensures that no secrets are shared directly with DataGrail.

Before Connecting

In order to start scanning Athena, ensure the following:

  • RDD Agent is deployed and connected in DataGrail.
  • Network is configured to allow the Agent to connect with the Athena instance.

Create IAM Policy

To connect to Athena and scan a database cataloged by AWS Glue, you will first need to create an IAM policy that will be attached to the Fargate ECS Task Role. The following instructions provide an example policy to configure, but it's recommended to consult your Athena admin.

  1. Using the following example, configure the IAM Policy:
Example IAM Policy JSON

To use this policy, make the following updates:

  1. Replace the placeholder values in <angle brackets> with your values (e.g. region, account ID, database).
  2. To support multiple databases, buckets, regions, etc., you may wish to list those separately or use wildcards.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Athena",
      "Effect": "Allow",
      "Action": [
        "athena:ListDataCatalogs",
        "athena:GetTableMetadata",
        "athena:StartQueryExecution",
        "athena:GetQueryResultsStream",
        "athena:GetQueryResults",
        "athena:GetDatabase",
        "athena:GetDataCatalog",
        "athena:GetQueryRuntimeStatistics",
        "athena:ListDatabases",
        "athena:StopQueryExecution",
        "athena:GetQueryExecution",
        "athena:BatchGetNamedQuery",
        "athena:ListTableMetadata",
        "athena:BatchGetQueryExecution",
        "athena:GetWorkGroup"
      ],
      "Resource": [
        "arn:aws:athena:*:<ACCOUNT_ID>:workgroup/*",
        "arn:aws:athena:*:<ACCOUNT_ID>:datacatalog/*"
      ]
    },
    {
      "Sid": "QueryResults",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:CreateBucket",
        "s3:ListBucket",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation"
      ],
      "Resource": ["arn:aws:s3:::aws-athena-query-results-*"]
    },
    {
      "Sid": "Metadata",
      "Effect": "Allow",
      "Action": [
        "glue:GetTable",
        "glue:GetDatabases",
        "glue:GetDatabase",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:GetSchema",
        "glue:SearchTables"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog"
      ]
    },
    {
      "Sid": "TempTables",
      "Effect": "Allow",
      "Action": ["glue:DeleteTable", "glue:CreateTable"],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/temp_table_*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog"
      ]
    },
    {
      "Sid": "Bucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<BUCKET_NAME>*"
    }
  ]
}
  2. Create the IAM policy.
  3. Attach the new policy to the Fargate ECS Task Role.
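Filling in the placeholders by hand is error-prone, so the substitution step can be sketched in Python. This is an illustration only, not part of the DataGrail tooling: the `render_policy` helper, the trimmed template (one Glue statement from the example policy), and the region, account ID, and database values are all hypothetical.

```python
import json

# Trimmed template: a subset of the "Metadata" statement from the example policy.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Metadata",
      "Effect": "Allow",
      "Action": ["glue:GetTable", "glue:GetTables", "glue:GetDatabase"],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog"
      ]
    }
  ]
}
"""


def render_policy(template: str, values: dict) -> dict:
    """Replace each <PLACEHOLDER> and return the parsed policy document."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace(f"<{name}>", value)
    # Any angle bracket left over means a placeholder was not filled in.
    if "<" in rendered or ">" in rendered:
        raise ValueError("unreplaced placeholder remains in policy")
    # json.loads also confirms the result is still valid JSON.
    return json.loads(rendered)


# Hypothetical values; substitute your own region, account ID, and database.
policy = render_policy(
    POLICY_TEMPLATE,
    {"REGION": "us-west-2", "ACCOUNT_ID": "123456789012", "DATABASE_NAME": "sales"},
)
print(json.dumps(policy, indent=2))
```

The printed document can then be used as the policy JSON when creating the IAM policy. Failing on a leftover `<` or `>` is a deliberate choice here, since a policy with an unfilled placeholder would otherwise be created with resource ARNs that match nothing.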
Temp Table Permissions

The Athena client uses CREATE TABLE AS SELECT (CTAS) to improve performance, which requires permission to create and delete temporary tables. These tables are named temp_table_ followed by a random ID, which is why the TempTables statement scopes its permissions to the temp_table_* pattern.
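To see which table names the temp_table_* pattern from the TempTables statement would cover, a quick check can be sketched with Python's standard glob matching. This only approximates IAM wildcard matching (`fnmatch` also understands `[...]` character sets, which IAM does not), and the sample table names are hypothetical.

```python
from fnmatch import fnmatchcase

# Table-name portion of the resource ARN in the TempTables statement.
TEMP_TABLE_PATTERN = "temp_table_*"


def is_temp_table(table_name: str) -> bool:
    """Return True if the name falls under the temp-table wildcard."""
    return fnmatchcase(table_name, TEMP_TABLE_PATTERN)


# Hypothetical names: one temp table with a random suffix, one regular table.
for name in ("temp_table_8f3a2c", "customers"):
    print(name, is_temp_table(name))
```

Under this pattern, regular tables such as `customers` fall outside the create and delete permissions, so the Agent can only create or drop tables it names with the temp_table_ prefix.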


Store Connection Details in Vault

  1. To specify the target database, configure the following JSON key-value pairs:
{
  "database": "<database name>",
  "region": "<region>"
}
  2. Store the JSON value in your vault with a name like datagrail-rdd-athena.
  3. Ensure that the Agent has the necessary permissions to access this vault entry.
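The steps above can be sketched in Python, assuming the vault is AWS Secrets Manager and that `boto3` is installed and has credentials allowed to create secrets. The helper names and the example database and region are hypothetical. Only `build_connection_secret` runs here; the upload function is defined but not called.

```python
import json


def build_connection_secret(database: str, region: str) -> str:
    """Serialize the two connection keys the Athena integration expects."""
    if not database or not region:
        raise ValueError("both database and region are required")
    return json.dumps({"database": database, "region": region})


def store_in_secrets_manager(name: str, secret_string: str) -> str:
    """Create the secret and return its ARN (assumes AWS Secrets Manager)."""
    import boto3  # third-party; imported lazily so the rest runs without it

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=name, SecretString=secret_string)
    return response["ARN"]


# Hypothetical values; replace with your Glue database and AWS region.
secret_value = build_connection_secret("sales", "us-west-2")
print(secret_value)
# To upload: store_in_secrets_manager("datagrail-rdd-athena", secret_value)
```

The ARN returned by `store_in_secrets_manager` is the value to reference as the connection details location when configuring the integration in DataGrail.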

Add the Agent Integration

  1. In DataGrail, navigate to Agents under Integration Network.
  2. Select your Agent.
  3. In the top right, select Add New Integration.
  4. Search for Athena, then select Configure.
  5. Enter an Integration Name, and only enable the Data Discovery capability.
  6. Enter the Connection Details Location (e.g. AWS Secrets Manager ARN).
  7. (optional) Choose the Business Processes, Region, and System Location.
  8. Finally, select Configure Integration. Wait a few moments to ensure that the connection is successful. For failed connections, review the Agent container logs for additional details.
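If the Connection Details Location is an AWS Secrets Manager ARN, a quick sanity check before pasting it in can catch copy errors, such as pasting an IAM or S3 ARN by mistake. The sketch below relies only on the standard ARN layout (`arn:partition:service:region:account:resource`). The `parse_secret_arn` helper and the example ARN are hypothetical.

```python
def parse_secret_arn(arn: str) -> dict:
    """Split a Secrets Manager ARN into its parts, or raise ValueError."""
    # Limit the split so colons inside the secret name stay intact.
    parts = arn.split(":", 6)
    if (
        len(parts) != 7
        or parts[0] != "arn"
        or parts[2] != "secretsmanager"
        or parts[5] != "secret"
    ):
        raise ValueError(f"not a Secrets Manager secret ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "secret_id": parts[6],
    }


# Hypothetical ARN for a secret named datagrail-rdd-athena.
example = "arn:aws:secretsmanager:us-west-2:123456789012:secret:datagrail-rdd-athena-AbCdEf"
print(parse_secret_arn(example))
```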

Troubleshooting

If you are unable to successfully connect the integration, review these common troubleshooting steps:

Agent Unable to Connect to Athena
  1. Verify that the network is configured to allow the Agent to connect with the Athena instance.
  2. Verify the Agent has permissions to access the Athena credentials stored in your vault.
Agent is Not Connected in DataGrail

Review the setup guide, and ensure that:

  1. The DataGrail API Key is valid and has not expired.
  2. The Agent has permissions to access the DataGrail API Key stored in your vault.
  3. Network egress is permitted from the Agent to your DataGrail domain.
Agent Fails to Retrieve Metadata

The Athena client may fail to retrieve metadata about tables and schemas if thousands of them exist.

To resolve this issue, try limiting the scope of the IAM policy to:

  • one catalog
  • one database
  • a set of tables with a predetermined pattern
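One way to narrow the scope is to generate the Glue resource ARNs for a single database and table-name pattern, then use them in place of the broader `Resource` lists in the policy. The following is a minimal sketch under that assumption. The `scoped_glue_resources` helper, the `crm_*` table prefix, and the account details are hypothetical.

```python
def scoped_glue_resources(
    region: str, account_id: str, database: str, table_pattern: str = "*"
) -> list:
    """Build Glue ARNs limited to one catalog, one database, and a table pattern."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return [
        f"{base}:catalog",
        f"{base}:database/{database}",
        f"{base}:table/{database}/{table_pattern}",
    ]


# Hypothetical scope: only tables in "sales" whose names start with "crm_".
for arn in scoped_glue_resources("us-west-2", "123456789012", "sales", "crm_*"):
    print(arn)
```

With fewer tables visible through the policy, the Agent has less metadata to enumerate in a single scan.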

 

Need help?
If you have any questions, please reach out to your dedicated Account Manager or contact us at support@datagrail.io.

Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.