
Responsible Data Discovery - Agent Setup

Overview

Responsible Data Discovery is built to give you a more accurate view of your organization's privacy risk. It gives privacy managers automated, up-to-date inventory reports and informed impact assessments. It powers the fulfillment of comprehensive access and deletion requests. And it makes your risk-reduction initiatives more proactive through better-informed policies and controls around your data processing.

To achieve this, DataGrail uses one or more containerized agents that you can run in your network to connect with your data sources, scan them, and then pre-process data for classification.

To get started, your team will need to accomplish the following tasks (more details below):

  • Source the Docker image provided by DataGrail.
  • Configure the agent.
  • Create and run the containerized agent service.

1. Sourcing the Docker Image

The Docker image for the data discovery agent is hosted in the DataGrail ECR repository, to which you will be granted access (ask your engagement manager about other registries). When retrieving your image, use the ARN with the version specified. Ask your engagement manager or CSM for this information.
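As a sketch, retrieving the image typically looks like the following. The account ID, region, repository name, and tag below are placeholders, not DataGrail-provided values; substitute the exact image reference your engagement manager gives you.

```shell
# Authenticate Docker to the ECR registry (account ID and region are placeholders)
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Pull the agent image, pinning an explicit version tag rather than "latest"
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/datagrail-rdd-agent:<version>
```

Pinning an explicit version tag keeps deployments reproducible and makes upgrades a deliberate step.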


2. Agent Configuration

In this section, you will configure the agent in the following steps:

  • Create a new agent in DataGrail.
  • Create a DataGrail API key and store it in your secrets vault.
  • Configure the container template.

Add a New DataGrail Agent

The first step in enabling RDD is to create an agent within DataGrail. Each instance of an agent will enable you to add and configure systems you’d like to connect and scan.

  1. In the left-hand navigation bar under Integration Network, select Agents.
  2. Select Add New Agent.
  3. Enter a name for the agent that is easy to remember and that can be associated with a private network, region, types of systems, etc.
  4. For Agent Type, select Data Discovery.
  5. Select Add New Agent.

Create a DataGrail API Key

For an agent to securely communicate with DataGrail, an API key is required. This key should be stored in your secrets vault. For more information on how to configure various credentials managers, see the Agent Platform CredentialsManager.

  1. Within the newly created agent’s page, select Generate API Key.
  2. Give the key a convenient name.
  3. Copy and securely store the API key in your secrets vault using the following format:
Secret Configuration Format

The key/value pair must be formatted as token: <API key>; all other values apply only to some vaults and help identify the secret.

Secret Type: Other type of secret
Key/value pairs:
token: <API Key copied above>
Secret name: datagrail-api-key
Description: <description for the secret>
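For vaults such as AWS Secrets Manager, where key/value secrets are stored as a JSON string, the payload can be built as below. This is a minimal sketch under that assumption; the helper name is ours, not part of any DataGrail tooling.

```python
import json

def build_datagrail_secret(api_key: str) -> str:
    """Build the secret payload in the format described above:
    a key/value pair mapping 'token' to the DataGrail API key."""
    return json.dumps({"token": api_key})

# Storing it in AWS Secrets Manager might then look like this
# (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# boto3.client("secretsmanager").create_secret(
#     Name="datagrail-api-key",
#     SecretString=build_datagrail_secret("<API Key copied above>"),
#     Description="DataGrail RDD agent API key",
# )
```

For vaults that take literal key/value pairs through a UI, enter the `token` key and the API key value directly instead.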
API Keys

Once saved, API keys cannot be viewed or copied. If a key is lost, you can generate a new one.

DataGrail recommends regularly rotating API keys in accordance with your policies.

It is strongly recommended that one API key be used per containerized service for security and debuggability purposes.

Configure the Container Template

With the agent created in DataGrail, it’s now time to configure the container that will run within your network.

DataGrail recommends running the agent with your preferred container orchestration platform, such as Kubernetes or serverless via AWS ECS, Google Cloud Run, etc. Terraform configuration scripts are available for some cloud platforms. Ask your DataGrail engagement manager for information.

Each container requires a DATAGRAIL_AGENT_CONFIG environment variable, which is set to a JSON object:

{
  "customer_domain": "yoursubdomain.datagrail.io",
  "datagrail_credentials_location": "<secret location>",
  "platform": {
    "credentials_manager": {
      "provider": "<AWSSSMParameterStore|AWSSecretsManager|JSONFile|GCP|AzureKeyVault>",
      "options": {
        "optional": "some modules may have additional required fields"
      }
    }
  }
}

Below is a detailed explanation of the JSON fields above.

customer_domain: string (required)
Your DataGrail domain. This is the hostname to which the agent posts back results via the RDD API.

datagrail_credentials_location: string (required)
The vault location of the DataGrail API key generated above. Used to authorize requests, obtain system configurations, post back scan results, and more.

platform: object (required)
The following platforms are supported for credentials: AWS Parameter Store, AWS Secrets Manager, GCP Secret Manager, JSON files via attached volumes, and Azure Key Vault. For more information, see the Agent Platform CredentialsManager.
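As a quick sanity check before deployment, you could validate the JSON you plan to place in DATAGRAIL_AGENT_CONFIG against the required fields above. This sketch is our own illustration, not part of the DataGrail tooling:

```python
import json

REQUIRED_TOP_LEVEL = ("customer_domain", "datagrail_credentials_location", "platform")
KNOWN_PROVIDERS = {"AWSSSMParameterStore", "AWSSecretsManager", "JSONFile", "GCP", "AzureKeyVault"}

def validate_agent_config(raw: str) -> dict:
    """Parse a candidate DATAGRAIL_AGENT_CONFIG value and check required fields."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    provider = config["platform"]["credentials_manager"]["provider"]
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown credentials provider: {provider}")
    return config

# Example: a well-formed config passes validation
raw = json.dumps({
    "customer_domain": "yoursubdomain.datagrail.io",
    "datagrail_credentials_location": "datagrail-api-key",
    "platform": {"credentials_manager": {"provider": "AWSSecretsManager", "options": {}}},
})
config = validate_agent_config(raw)
```

Catching a malformed config this way is cheaper than discovering it from container startup failures.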

3. Running the Agent

After you’ve configured the agent, you’re ready to run the service. Below are general guidelines to ensure that the service runs with the proper roles and permissions. Your onboarding specialist will support you with recommendations tailored to your needs.

Network Settings

DataGrail designed data discovery agents with flexibility and security in mind, allowing you to deploy agents where your data sources are hosted to minimize security and privacy risks.

Recommended Configuration:

  1. Run one or more agents in each virtual private network or data center where data sources are directly reachable, so that no bridged or proxied connections are necessary.
  2. Run in a private subnet with no ingress.
  3. Set egress rules to allow connections only to your DataGrail domain and the source of the Docker image.

DataGrail has Terraform configuration files to help you get started. Please reach out to your engagement manager for more information.

Minimum System Requirements

Each agent should be configured with at least the following requirements:

  • Cores: 4+
  • Memory: 8 GB+
  • Disk storage: 20 GB

Variable Scan Duration

The duration of each scan can range from minutes to a few hours depending on the volume of your data sources, the number of data elements (columns), and more. Please consult with DataGrail support for more tailored recommendations as needed.

Logging

Persisting logs for up to 30 days is strongly recommended for debugging purposes. You can use logs to identify configuration issues during setup. For other errors or exceptions, reach out to DataGrail support with detailed error messages and stack traces.

Running the Service

Once the containerized service is running, you should see a green check mark in DataGrail confirming a successful agent connection.
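For a quick local smoke test outside your orchestration platform, launching the container directly with Docker might look like the following. The image reference is a placeholder, and the config shown assumes the AWS Secrets Manager provider from the example above:

```shell
# Launch the agent with its required configuration environment variable
# (image name, tag, and secret location are placeholders)
docker run --detach \
  --env DATAGRAIL_AGENT_CONFIG='{"customer_domain": "yoursubdomain.datagrail.io", "datagrail_credentials_location": "datagrail-api-key", "platform": {"credentials_manager": {"provider": "AWSSecretsManager", "options": {}}}}' \
  <your-registry>/datagrail-rdd-agent:<version>
```

In Kubernetes or ECS, the same JSON would instead be supplied through the pod spec or task definition's environment configuration.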

[Image: Successful Connection]

Need help?
If you have any questions, please reach out to your dedicated CSM or contact us at support@datagrail.io.

Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.