Technical Overview

Responsible Data Discovery gives you a more accurate view of your organization's privacy risk. It provides privacy managers with automated, up-to-date inventory reports and better-informed impact assessments. It powers the fulfillment of comprehensive access and deletion requests. And it makes your risk reduction more proactive, with better-informed policies and controls around your data processing.

DataGrail’s Approach

Detect data in structured or semi-structured data systems

DataGrail helps you protect sensitive data from compromise by identifying and categorizing it in structured data sources (relational databases) and schema-less systems (NoSQL stores).

Always up-to-date so you don’t have to be

To keep up with evolving compliance laws and data governance regulations, DataGrail classifies data from your company’s data sources and maps it to smart categories. That way, you can help auto-populate common privacy deliverables like DPIAs, RoPAs, subject requests, and more.

Grow the understanding of your tech stack

Apps you use every day hold personal data, from Salesforce and Zendesk to internal systems and databases. DataGrail employs a smart taxonomy to systematize consumer-sensitive data held in those systems so you can deliver on data protection and privacy requirements quickly.

Data Discovery API Architecture and Workflow

Secure by Design

DataGrail does not want to add to your privacy and security risk. As such, we’ve designed data discovery to prevent direct connection to your data systems from outside your network. Stated in concrete terms, if DataGrail were to be compromised, our systems are designed so a bad actor would not be able to access your internal systems or network.

Minimize Risk

To protect your data, we’ve implemented an agent that you deploy securely within your private network, where it is unreachable from the public internet. The agent can be configured to connect securely to your data systems with read-only privileges, without sharing secrets or credentials with DataGrail. It retrieves schemas and other metadata, then scans and processes data. When processing is done, the agent shares metadata, classification features, and anonymized data with a DataGrail API service that classifies the data and updates reports accordingly.
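The agent workflow described above can be sketched in a few lines. This is a minimal illustration, not DataGrail's implementation: an in-memory SQLite database stands in for your data system, the `scan` and `mask` helpers are hypothetical names, and masking characters is only one assumed way of anonymizing samples.

```python
import sqlite3


def mask(value: str) -> str:
    """Replace letters with 'a' and digits with '9', keeping only the value's shape."""
    return "".join(
        "9" if ch.isdigit() else "a" if ch.isalpha() else ch for ch in value
    )


def scan(conn: sqlite3.Connection, sample_size: int = 2) -> list[dict]:
    """Return one record per column: table, column, declared type, masked samples."""
    records = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _, column, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" LIMIT ?', (sample_size,)
            ).fetchall()
            records.append({
                "table": table,
                "column": column,
                "type": col_type,
                "samples": [mask(str(row[0])) for row in rows],
            })
    return records


# Stand-in for a customer database; a real agent would use read-only credentials.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann@example.com')")
conn.commit()
conn.execute("PRAGMA query_only = ON")  # refuse writes for the rest of the session

payload = scan(conn)
print(payload)
```

Only the masked samples and metadata in `payload` would leave the network in this sketch; the raw values never do.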

DataGrail Agent - Scanner

Built atop the DataGrail Internal Systems Agent, the scanner is a containerized application that can run on your private or public cloud. If you’re hosted on Amazon Web Services (AWS), for example, we recommend running the agent on ECS Fargate (serverless).

Instructions detail how to enable connections to out-of-the-box database systems like PostgreSQL, MySQL, and Snowflake. New connectors can be added as needed, typically within two weeks.

Secrets and passwords are stored securely in secret managers in your cloud, which provide logging and auditability.

The agent is statically configured (no dynamic configuration is supported) before being executed as a task. These tasks can be scheduled using your preferred containerization platform. Execution can take anywhere from minutes to hours depending on the number of databases and tables that need to be scanned.
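To make "statically configured" concrete, here is a minimal sketch of a configuration that is parsed once at startup and cannot change during a run. The field names (`name`, `engine`, `host`, `credentials_secret`) and the JSON layout are assumptions for illustration only, not the agent's actual configuration schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionConfig:
    name: str
    engine: str
    host: str
    # Name of an entry in your secret manager, never the password itself.
    credentials_secret: str


def load_config(raw: str) -> tuple[ConnectionConfig, ...]:
    """Parse the configuration once; the frozen result cannot be modified mid-run."""
    return tuple(ConnectionConfig(**entry) for entry in json.loads(raw)["connections"])


# Hypothetical configuration, supplied before the task is executed.
RAW = """
{"connections": [
  {"name": "orders-db", "engine": "postgres", "host": "orders.internal",
   "credentials_secret": "prod/orders-db/readonly"}
]}
"""

config = load_config(RAW)
print(config[0].name, config[0].credentials_secret)
```

Referencing a secret by name keeps credentials in your secret manager, so access to them stays logged and auditable there.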

Responsible Data Discovery API

Once metadata and anonymized sample data have been retrieved, the agent securely posts the data over HTTPS to the RDD API, authenticated by a token provided by DataGrail. Here, the data is classified and associated with your internal systems reports, which in turn can inform records of processing activities (RoPAs), privacy impact assessments, and more. DataGrail aggregates this information across all systems to give you a holistic view of your inherent privacy risk.

The API is designed with flexibility in mind, and detailed specs can be provided upon request so that proprietary clients can be built.
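As a rough sketch of what a token-authenticated HTTPS post could look like from a custom client: the endpoint URL, the payload fields, and the use of a `Bearer` authorization header are all assumptions here, since the actual API specs are provided on request.

```python
import json
import urllib.request


def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request, refusing any URL that is not HTTPS."""
    if not url.startswith("https://"):
        raise ValueError("refusing to send scan results over a non-HTTPS URL")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed header scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "https://rdd.example.invalid/v1/scans",  # hypothetical endpoint
    token="example-token",                   # placeholder, not a real token
    payload={"table": "users", "column": "email", "samples": ["aaa@aaaaaaa.aaa"]},
)
# urllib.request.urlopen(request) would send it; omitted because the endpoint is fictional.
print(request.get_method(), request.full_url)
```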

Data Classification

Key to a better understanding of your privacy footprint are DataGrail’s proprietary classification models, which are able to map thousands of data elements to a few dozen categories (see personal data taxonomy).

Classification accuracy will vary, but DataGrail optimizes for recall (i.e., as few false negatives as possible), because the cost of failing to identify personal data, especially sensitive data, far outweighs the cost of misclassifying some data elements as containing personal data (false positives).

To achieve this, we take a two-phased approach. In phase one, we classify data in a small set of canonical data systems and then review the results together with you. Data from sandboxes is recommended where possible. Once the classification is fine-tuned, phase two begins: DataGrail classifies new data on an ongoing basis according to your configuration. We recommend configuring a monthly or quarterly cadence, depending on your capacity to review reports.
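The trade-off can be illustrated with the standard definitions of recall and precision. The counts below are invented for illustration and are not DataGrail measurements; they compare two hypothetical classifier settings over the same 100 columns that truly contain personal data.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of real personal-data elements that were flagged."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged elements that really contain personal data."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Invented counts: tp = correctly flagged, fp = flagged in error, fn = missed.
settings = {
    "strict": {"tp": 80, "fp": 5, "fn": 20},
    "permissive": {"tp": 98, "fp": 30, "fn": 2},
}

for name, c in settings.items():
    print(
        f"{name}: recall={recall(c['tp'], c['fn']):.2f} "
        f"precision={precision(c['tp'], c['fp']):.2f}"
    )
```

In this example the permissive setting misses only 2 personal-data columns instead of 20, at the price of 30 false positives for a reviewer to dismiss. That is the direction a recall-first approach leans.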

Supported Database Criteria

DataGrail is able to support the following types of systems:

Operational SQL systems (e.g., PostgreSQL, MySQL, Microsoft SQL Server)
NoSQL systems (e.g., DynamoDB, S3, MongoDB)
SQL-based analytical systems (e.g., Redshift, BigQuery, Snowflake)

We recommend starting with canonical systems where data is collected. These tend to fall under the first two categories above and are typically considered critical to running customer-facing applications. By starting with these systems, you’ll gain immediate visibility into your inherent privacy risk without tackling the more time-consuming and resource-intensive task of scanning analytical systems, which are likely to have multiple variations of the same data, especially if you have well-established ETL pipelines.

Following up with analytical systems will ensure that all inferred or predicted personal data (e.g., if you are predicting gender from data captured in upstream canonical systems) is also captured and reported accurately.

Up-to-date Reporting

Automated Data Category Updates

As new categories are detected, privacy managers are alerted automatically, so system owners no longer need to report them manually via surveys or questionnaires.
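Conceptually, detecting new categories amounts to comparing the categories found in the latest scan with those already on record for each system. The sketch below shows that comparison with plain sets; the function name, system names, and data are hypothetical, and this is not a description of DataGrail's alerting internals.

```python
def new_categories(
    previous: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Return, per system, the categories present now but absent from the last scan."""
    alerts = {}
    for system, categories in current.items():
        added = categories - previous.get(system, set())
        if added:
            alerts[system] = sorted(added)
    return alerts


last_scan = {"orders-db": {"Email Address", "Address (postal, billing)"}}
this_scan = {
    "orders-db": {"Email Address", "Address (postal, billing)", "IP Address"},
    "support-db": {"Email Address"},  # a system scanned for the first time
}

alerts = new_categories(last_scan, this_scan)
print(alerts)
```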

[Figure: Automated Data Category Updates list]

Similarly, system reports are updated automatically with new categories, and privacy managers can review them quickly.

Sensible Reviews

Reviews can be done at the category level, without having to paginate through thousands of data elements. Once approved, these reports will automatically inform RoPAs, privacy impact assessments and more.
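A category-level review is essentially a roll-up of data elements by their assigned category. The following sketch shows the idea with hypothetical element names; the category labels are taken from the taxonomy in this document.

```python
from collections import defaultdict

# Hypothetical (data element, category) pairs produced by classification.
elements = [
    ("users.email", "Email Address"),
    ("users.backup_email", "Email Address"),
    ("orders.ship_to", "Address (postal, billing)"),
]


def summarize(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Count data elements per category so a reviewer sees categories, not columns."""
    grouped = defaultdict(list)
    for element, category in pairs:
        grouped[category].append(element)
    return {category: len(items) for category, items in sorted(grouped.items())}


summary = summarize(elements)
print(summary)
```

Three elements collapse into two categories here; at the scale of thousands of elements and a few dozen categories, the reduction is what makes review manageable.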

Detailed Reporting

If you want a more detailed view, reports with all data elements are available for review in-app and via export.

Personal Data Taxonomy

Note: Categories bolded below have better support.

Contact Information

Email Address

Name (full, first or last)

Phone Number (landline, mobile or fax)

Address (postal, billing)

Username, Social Handle or Alias (addressable)

Employment & Business Information

Application Number

Beneficiaries

Benefits Information

Company Name

Criminal Convictions

Dietary Preferences

Emergency Contacts

Employee Identification Number

Employment Decision Record

Employment History or Status

Individual Quotas

Job Role, Title or Position

Payroll Information

Performance Information

Bio or Profile

Salary Information

Sponsorship Information

Travel Data

Education Information

Assessment or Score

Degree or Certification

Education Status or History

Graduation or Attainment Date

School or Accreditation Body

Government Identification

Driver’s License or Other State ID

Taxpayer Identification Number (TIN)

Immigration or Naturalization Number

Other Government Identifier (Military Identification, Known Traveler, Registro Geral (BR), My Number (JP) etc)

Passport Number

Professional License Number

National Insurance Number (SIN, SSC, SNAP, GHIC etc)

Social Security Number (SSN)

Vehicle or License Plate Number (VIN)

Demographics & Psychographics

Age, Birthday or Range

Audience Segment

Birthplace or Hometown

Citizenship or Naturalization

Education Level

Family and Lifestyle

Gender

Geography

Immigration Status

Income or Range

Interests, Favorites, Possessions

Marital Status

Military Status

Nationality

Political Opinions or Affiliation

Preferred Language

Presence of Children

Racial or Ethnic Origin

Religious or Philosophical Beliefs

Sex Life or Sexual Orientation

Trade Union Membership

Veteran Status

Online & Mobile Data

Ad Engagement (views, clicks etc)

App or Site Usage (visits, sessions, downloads etc)

Inferred or Derived Data

Browsing History

Consents, Opt-Outs and Preferences

Electronic Signature

Email Engagement (views, opens, clicks, clickthrough etc)

Personal Directory Information (calendar, address book, call/text log, files etc)

Communication Contents (mail, email, messages etc)

Social Profile

Requests, Posts, Comments, Reviews and Ratings

Search History

Online & Mobile Identifiers

Beacon ID

Browser or Device Profile (type, OS, language, resolution, apps etc)

Device ID (MAC, Apple ID, Android ID, Ad ID, serial etc)

Hashed Email or Phone

Household ID

IoT Device ID

IP Address

User ID

Website Visitor ID (cookies, pixels, strings, ad browser fingerprint)

Security & Diagnostics Data

Access and Change Logs

Crash and Event Logs

Credentials (usernames with passwords)

Network Logs

Security Logs

Activation, Recovery or Verification Information

Audiovisual & Sensor Data

Audio

Photos

Sensors

Video

Location Information

Coarse Location (geo, ZIP, radio tower, public beacon etc)

Precise Location (GPS, lat/long, personal beacon, location over time)

Biometric Data

Fingerprint

Facial Patterns

Iris Patterns

Voice Patterns

Handwriting

Genetic Data

DNA

Family Genomics

Ethnographics

Health & Medical Data

Fitness, Diet and Wellness

Heart Rate

Condition

Treatment

Medical History

Medical Record Number

Insurance, Claims and Billing Information

Prescription Information

Height and Weight

Payment & Financial Information

Account Balance

Bank or Financial Account

Bank or Financial Institution

Commercial Decision Record

Credit Score or History

Customer ID

CV2/CVV2/Visual Cryptogram

Know Your Customer (KYC) Information

Payment Information (credit, debit, pay service)

Payment Service Code

Personal PIN or Access Code

Purchase, Order or Transaction Details

Tax and Filing Information

Other

Custom personal data categories and elements directed by DataGrail Customers

Need help?

If you have any questions, please reach out to your dedicated CSM or contact us at support@datagrail.io.

Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.
