
Internal Systems API Specification

Abstract

This document defines a versioned, standard specification for building consistent, scalable, and secure REST APIs over an organization's internal systems, such as databases, data warehouses, unstructured data stores, or homegrown applications. These APIs give the DataGrail Platform an interface for processing privacy requests.

Introduction

This document establishes the guidelines the REST API should follow so that interfaces are implemented consistently, using the common endpoints and methods designed by DataGrail. It also relies on state-of-the-art security practices and communication patterns to minimize the risk of exposing internal data.

Organizations may use any programming language and framework to build their APIs. All operations ultimately boil down to HTTP requests that DataGrail uses to access authorized resources via the API standard defined in this specification.

Organizations should note that this specification is designed to be asynchronous for all data access and deletion calls when processing privacy requests in order to ensure the efficient management of multiple simultaneous and potentially long-running requests. Identifier lookups are synchronous due to workflow considerations.

Finally, by following this specification, we aim to achieve the following:

  • Define consistent and secure practices and patterns for all API endpoints interacting with DataGrail.
  • Put organizations in control of the interfaces to critical data infrastructure.
  • Make connecting and using DataGrail via REST API for internal systems as easy as any third-party SaaS without compromising data security and integrity.
  • Allow organizations to leverage prior work to implement REST endpoints defined consistently.
  • Provide optionality and extensibility by default.

Connecting with DataGrail

Security

Given the sensitive nature of the data exchanged to process privacy requests, all interactions with DataGrail must be secured using HTTPS via TLS v1.2 or above.

Authorization

DataGrail supports two authentication methods for granting limited, authorized access to resources exposed via the API: OAuth 2.0 Client Credentials Grant and Token Based.

OAuth 2.0 Client Credentials Grant

DataGrail recommends organizations choose this authentication method because it is the most secure option. Organizations should provide the standard endpoint for token retrieval required when connecting DataGrail to the implemented API. Specifically, you will need to provide the following data when connecting your API with DataGrail:

  • Client ID: A unique ID used to identify requests from DataGrail.
  • Client Secret: The secret used to authorize requests from DataGrail.
  • API Base URL: The base URL where requests will be sent.
  • API Token Endpoint URL: The endpoint where DataGrail will initiate the OAuth flow (e.g. /oauth/token).

Example OAuth Authorization Request

When initially connecting using the OAuth authorization method, DataGrail will initiate a client_credentials grant request to your API Base URL plus your API Token Endpoint URL, e.g. https://my_company_name.com/oauth/token.

Example Headers

The authorization header of the request from DataGrail will carry the base64-encoded string client_id:client_secret, built from the Client ID and Client Secret. You can decode this value and verify it against the Client ID and Client Secret you configured during the initial connection.

{
"content_type": "application/x-www-form-urlencoded",
"authorization": "Basic <base64 encoded client_id:client_secret>",
"X-Dg-Authtype": "OAuth",
"user_agent": "dgclient/1.0"
}

Example Body

The body will contain the grant_type with a value of client_credentials.

{
"grant_type": "client_credentials"
}

Expected Response

The access_token you return will be used as the authorization bearer token for all other internal systems requests.

{
"access_token": "<your_access_token>",
"token_type": "Bearer",
"expires_in": "<expiry_time>"
}
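
As a minimal sketch of the token endpoint's logic, assuming Python and no particular web framework, the function below decodes the Basic authorization header, compares the credentials in constant time, and returns a body shaped like the expected response. CLIENT_ID, CLIENT_SECRET, parse_basic_auth, and issue_token are hypothetical names. The error bodies borrow error codes from the OAuth 2.0 specification (RFC 6749), and a numeric expires_in in seconds is an assumption rather than something this specification states.

```python
import base64
import hmac
import secrets

# Hypothetical credentials; in practice these would come from a secret store.
CLIENT_ID = "datagrail-client"
CLIENT_SECRET = "s3cr3t"


def parse_basic_auth(header_value):
    """Return (client_id, client_secret) from a Basic header, or None if malformed."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    client_id, separator, client_secret = decoded.partition(":")
    if not separator:
        return None
    return client_id, client_secret


def issue_token(authorization_header, grant_type, expires_in=3600):
    """Return (http_status, body) for a client_credentials token request."""
    if grant_type != "client_credentials":
        # Other grant types are not supported by this specification.
        return 400, {"error": "unsupported_grant_type"}
    credentials = parse_basic_auth(authorization_header)
    if credentials is None:
        return 401, {"error": "invalid_client"}
    # compare_digest avoids leaking information through comparison timing.
    id_ok = hmac.compare_digest(credentials[0].encode(), CLIENT_ID.encode())
    secret_ok = hmac.compare_digest(credentials[1].encode(), CLIENT_SECRET.encode())
    if not (id_ok and secret_ok):
        return 401, {"error": "invalid_client"}
    return 200, {
        "access_token": secrets.token_urlsafe(32),
        "token_type": "Bearer",
        "expires_in": expires_in,
    }
```

A real implementation would also need to store the issued token and its expiry so that later requests can be checked against it.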

Important Considerations

Follow these considerations when connecting to the API and managing your own OAuth tokens:

  1. Other OAuth 2.0 grant types are not supported at the moment.
  2. DataGrail does not require implementation of authorization scopes at the moment. If you implement them, ensure that the appropriate scopes are attached to the token grant.

Token Based

As an alternative to OAuth 2.0 Client Credentials Grant, and because we acknowledge that building such capabilities may be an operational burden for some organizations, DataGrail will also accept the less secure option of a static token-based authentication.

The static token you set here will be used as the authorization bearer token for all other internal systems requests.

Authorized Requests From DataGrail

Depending on the authorization method, we will use the access_token or the static token as the bearer token for all other internal systems requests. You can verify the bearer token to authorize the requests. The header will look as follows.

{
"content_type": "json",
"authorization": "Bearer <some_token>",
"X-Dg-Authtype": "Bearer",
"user_agent": "dgclient/1.0"
}
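
The check on the receiving side can be sketched as follows, assuming Python. authorize_request is a hypothetical helper, and expected_token stands for either the access_token you issued or the static token you configured. It returns only an HTTP status code and leaves the response body to the surrounding framework.

```python
import hmac


def authorize_request(headers, expected_token):
    """Return 200 if the bearer token matches, otherwise 401."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401
    # Constant-time comparison of the presented and expected tokens.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 401
    return 200
```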

Endpoints

DataGrail requires organizations to define at least one Production environment, accessed via a single base URL that is common to all endpoints implemented in accordance with the rest of this guide.

DataGrail also recommends organizations define at least one environment for Development and/or Testing to be able to test their implementation without interacting with real production data.

DataGrail recommends using the following pattern:

  • Production URL: https://datagrail_prod.my_company_name.com/
  • Development URL: https://datagrail_dev.my_company_name.com/

API Overview

Workflow

The following diagram gives an overview of a standard privacy data request through an Internal Systems API connection:

[Figure: Internal Systems Workflow]

Versioning

This specification will be released in major versions, starting with v1. Because it is designed with optionality and extensibility by default, we expect to be able to add functionality and make non-breaking changes within the existing version, limiting the need to update the version specified.

All endpoints must include the version number embedded in the path of the request URL, at the end of the service root, following this pattern: <base-url>/api/v1

HTTP Response Codes

The various HTTP status codes that are supported include:

| HTTP Code | Description |
| --- | --- |
| 200 OK | The request was successful. |
| 201 Created | The request was successful and resulted in the creation of a resource. |
| 400 Bad Request | The request was submitted with invalid formatting or parameters, and the server will not process the request. |
| 401 Unauthorized | An invalid authentication signature was provided or the authentication signature was missing. |
| 403 Forbidden | An authorization signature was provided without permission to access a resource. |
| 405 Method Not Allowed | The request is not valid for the provided connection. |
| 409 Conflict | Indicates a request conflict with the current state of the target resource. |
| 429 Too Many Requests | The client (DataGrail) should slow down and try again later. |
| 500 Internal Server Error | An internal and unexpected error condition occurred. |
| 501 Not Implemented | The request is not supported by this API implementation. |
| 502 Bad Gateway | The server received an invalid response from the upstream server. |
| 503 Service Unavailable | The server cannot handle the request. |
| 504 Gateway Timeout | Server was acting as a gateway or proxy and did not receive a timely response from the upstream server. |

Status

Every request must have an associated status so that its progress can be tracked and, if necessary, troubleshot. Organizations must implement the following request statuses supported by DataGrail:

| Name | Description |
| --- | --- |
| requested | Indicates that a well-formed request has been received. |
| processing | Indicates that a request is currently being acted on. |
| completed | Indicates that a request has been fulfilled. |
| failed | Indicates that a request has failed to complete. The response should provide detailed information regarding why this happened. |

Error Responses

Error responses should include the appropriate HTTP status and detailed information about what went wrong, encoded in the response as JSON to facilitate troubleshooting. Any errors should be placed in a JSON array named "errors" at the root of the response object. An associated message may be included in the "message" field. An example error response could look like:

{
"status": "failed",
"errors": [
{"error": "info"}
],
"message": "Request is invalid."
}
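
A small helper can keep error bodies consistent across endpoints. This is only a sketch, assuming Python. error_response is a hypothetical name, and the shape of each entry in "errors" is left to the implementer, as in the example above.

```python
import json


def error_response(http_status, message, errors=None):
    """Return (http_status, json_body) for a failed request."""
    body = {
        "status": "failed",
        # "errors" sits at the root of the response object.
        "errors": list(errors or []),
        "message": message,
    }
    return http_status, json.dumps(body)
```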

Pagination

Pagination is supported by DataGrail using a standard format. Endpoints that support pagination are noted as such.

All endpoints which support pagination should accept a numeric page parameter. Page indexing should begin at 1, and calling a paginated endpoint without this parameter should be equivalent to calling the endpoint with page=1. For example: GET /sample_endpoint should return the same result as: GET /sample_endpoint?page=1.

Responses should also include fully-qualified URL links to the "next" and "previous" pages. On the first page there is no "previous" page, and on the last page there is no "next" page; in those cases the value should be set to either null or an empty string.

For example, a request for page=3 may result in:

{
"count": 1,
"next": "<base-url>/api/v1/connections/list?page=4",
"previous": "<base-url>/api/v1/connections/list?page=2"
}
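
The page arithmetic can be sketched as below, assuming Python. paginate, page_size, and list_url are hypothetical names. The specification does not fix a page size, and treating "count" as the total number of items across all pages is an assumption.

```python
import math


def paginate(items, page, page_size, list_url):
    """Build a paginated response body; pages are indexed from 1."""
    total_pages = max(1, math.ceil(len(items) / page_size))
    if page < 1 or page > total_pages:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    return {
        "count": len(items),
        # No link is returned past the last page or before the first.
        "next": f"{list_url}?page={page + 1}" if page < total_pages else None,
        "previous": f"{list_url}?page={page - 1}" if page > 1 else None,
        "results": items[start:start + page_size],
    }
```

A handler would read the page query parameter, default it to 1 when absent, and pass it to this function.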

General Endpoints

Test Connection

GET /api/v1/hc

DataGrail will call this endpoint to test that the credentials provided for your API are valid and the service is healthy.

Parameters

No parameters will be sent for this endpoint.

Expected Responses

Status Code: 200 OK

{
"status": "completed",
"version": "v1"
}

Status Code: 401 Unauthorized / 403 Forbidden

{
"status": "error",
"message": "Message describing the reason for the error"
}

List Available Connections

GET /api/v1/connections/list

This endpoint should return a paginated list of internal connections available to process requests by DataGrail.

Capabilities

Individual connections should define the set of endpoints that they support in the "capabilities" response field. Organizations can configure the following capabilities supported by DataGrail:

| Name | Description |
| --- | --- |
| privacy/access | Supports processing of access requests. |
| privacy/delete | Supports processing of deletion requests. |
| privacy/optout | Supports processing of opt out requests. |
| privacy/identifiers | Supports retrieval of identifiers (e.g. lookup user_id by passing email). |
| capability/multiple-identifiers | Connection will be passed Identifiers in the multiple identifiers format. |

Note: At the moment, DataGrail will only support these capabilities. However, capabilities will expand in the future as support is added for further request types.

Mode

Organizations must support connection modes so that connections to their data systems can be adequately tested and validated without touching real data subject data or triggering privacy requests in the DataGrail platform. Organizations can configure the following modes supported by DataGrail:

| Name | Description |
| --- | --- |
| live | Connection can be used in production with real privacy requests from data subjects. |
| test | Connection can only be used for development. In this mode, no real privacy requests will be triggered within the DataGrail platform. |

Note: If no mode is provided, DataGrail will assume that the connection is in test mode by default.

Parameters

| Parameter Name | Description |
| --- | --- |
| page | Results page identifier |

Expected Success Response

Status Code: 200 OK

{
"count": 123,
"next": "<base-url>/api/v1/connections/list?page=4",
"previous": "<base-url>/api/v1/connections/list?page=2",
"results": [
{
"uuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"type": "PostgreSQL",
"name": "Accounts DB",
"mode": "live",
"capabilities": [
"privacy/access",
"privacy/delete",
"privacy/optout",
"capability/multiple-identifiers"
]
}
]
}
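
Before returning connection entries, it may help to check them against the capability and mode names listed above. The sketch below assumes Python. validate_connection is a hypothetical helper, and it treats a missing mode as "test", mirroring the default DataGrail is described as assuming.

```python
KNOWN_CAPABILITIES = {
    "privacy/access",
    "privacy/delete",
    "privacy/optout",
    "privacy/identifiers",
    "capability/multiple-identifiers",
}
KNOWN_MODES = {"live", "test"}


def validate_connection(connection):
    """Return a list of problems found in one connection entry (empty if none)."""
    problems = []
    unknown = set(connection.get("capabilities", [])) - KNOWN_CAPABILITIES
    for capability in sorted(unknown):
        problems.append(f"unknown capability: {capability}")
    # A missing mode is treated as "test".
    mode = connection.get("mode", "test")
    if mode not in KNOWN_MODES:
        problems.append(f"unknown mode: {mode}")
    return problems
```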

Privacy Request Endpoints

Submit an Identifier Retrieval Request

POST /api/v1/privacy/identifiers/<connection-uuid>

DataGrail will call this endpoint to initiate an identifier retrieval request for the passed connection. This endpoint is synchronous and MUST return results in the response.

Identifiers are verifiable keys that allow the API to uniquely identify a data subject within a specific context, such as a data system or an organization. Data subjects can be linked to multiple identifiers; therefore, the API must be able to accept an arbitrary number of identifiers of different types as input.

Additionally, identifiers are linked to categories, which determine which integrations can accept the identifier.

Identifier Categories

This table represents the various identifier categories that DataGrail supports.

Usage: "key" should be inserted into <identifier category key> in the "identifiers" parameter.

| Name | Key | Validation | Subcategory | Description |
| --- | --- | --- | --- | --- |
| Advertising ID | advertising_id | No | | Device ID used for Ad-based purposes. |
| iOS | ios_advertising_id | No | Yes | |
| Google | android_advertising_id | No | Yes | |
| Amazon | fire_advertising_id | No | Yes | |
| Microsoft | microsoft_advertising_id | No | Yes | |
| Roku | roku_advertising_id | No | Yes | |
| App ID | application_id | No | | System-assigned ID to a customer application. |
| Browser ID | browser_id | No | | ID provided via a browser cookie. |
| Email Address | email | Yes | | Personal, work, or other types of email address. |
| Phone Number | phone | Yes | | Personal, work, cell, or other types of phone numbers. |
| Service ID | service_id | No | | System-assigned ID for a data subject record. |
| Social Media ID | social_media_id | No | | ID provided via a social media profile. |
| Twitter | twitter_id | No | Yes | |
| Facebook | facebook_id | No | Yes | |
| Intercom | intercom_id | No | Yes | |
| Smooch | smooch_id | No | Yes | |
| User ID | user_id | No | | Custom customer assigned ID for a data subject record. |

Body Parameters (request)

Identifier object format

Identifiers are a JSON object that contains values used to identify the data subject’s personal data. If the connection has the "capability/multiple-identifiers" capability, the identifiers parameter will be structured as:

{
"<name of identifier>": [
{"<identifier category key>": "<identifier value>"},
...
]
}

Example

{
"email": [
{"email": "batman@cave.com"},
{"email": "robin@cave.com"}
]
}

If the connection does not have the "capability/multiple-identifiers" capability, the identifiers parameter will be structured as:

{
"email": [
"guinevere@camelotknights.com",
"queen@camelotknights.com"
]
}

| Parameter Name | Description | Required |
| --- | --- | --- |
| identifiers | See the Identifier Object Format above | Yes |
| request_uuid | The request-specific UUID mapping to the Portal Ticket stored in DataGrail. | Yes |
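
Because the identifiers parameter arrives in one of two shapes, an implementation may want to normalize it before querying a data system. The following is a minimal sketch, assuming Python. flatten_identifiers is a hypothetical helper, and it assumes that in the simpler format the outer key doubles as the identifier category key, as the "email" example suggests.

```python
def flatten_identifiers(identifiers):
    """Return a list of (name, category_key, value) tuples from either format."""
    flattened = []
    for name, entries in identifiers.items():
        for entry in entries:
            if isinstance(entry, dict):
                # Multiple-identifiers format: {"<category key>": "<value>"}.
                for category_key, value in entry.items():
                    flattened.append((name, category_key, value))
            else:
                # Simple format: a bare value listed under its category key.
                flattened.append((name, name, entry))
    return flattened
```

Downstream lookups can then iterate over one flat list regardless of whether the connection declares "capability/multiple-identifiers".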

Expected Success Response

Status Code: 200 OK

The request was accepted and the results are available in the response body.

{
"<name of identifier>": [
{"<identifier category key>": "<identifier value>"},
...
]
}

Example

{
"phone_number": [
{"phone": "+11234567890"}
],
"email": [
{"email": "batman@cave.com"},
{"email": "robin@cave.com"}
]
}

Submit an Access Request

POST /api/v1/privacy/access/<connection-uuid>

DataGrail will call this endpoint to initiate a privacy access request for the passed connection. This endpoint is asynchronous and should not return results.

The results should be returned through a webhook callback (detailed in the "Retrieve Results" section). DataGrail will trigger a request to the "Retrieve Results" endpoint periodically until we receive results.

Body Parameters

| Parameter Name | Description | Required |
| --- | --- | --- |
| identifiers | See the Identifier Object Format above | Yes |
| results_token | Hexadecimal string (length of 16) that will be used to retrieve the results of this request. | Yes |
| request_uuid | The request-specific UUID mapping to the Portal Ticket stored in DataGrail. | Yes |
| callback_path | The URL path to be used when triggering the results callback. To construct the full URL, the callback_path should be appended to your customer domain. The callback_path will lead with a forward slash. | Yes |

Expected Success Response

Status Code: 200 OK

The request was accepted and the process to respond to the request will be initiated.

{
"status": "processing"
}
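
Since the endpoint is asynchronous, the handler only needs to validate the body, hand the work to a background process, and acknowledge. A rough sketch in Python follows. accept_access_request is a hypothetical name, job_queue stands in for whatever queue or task runner you use (a plain list here), and rejecting malformed bodies with 400 is an assumption based on the HTTP codes listed earlier.

```python
import re
import uuid

RESULTS_TOKEN_PATTERN = re.compile(r"[0-9a-fA-F]{16}")
REQUIRED_FIELDS = ("identifiers", "results_token", "request_uuid", "callback_path")


def _invalid(errors):
    return 400, {"status": "failed", "errors": errors, "message": "Request is invalid."}


def accept_access_request(body, job_queue):
    """Validate the body, queue the work, and return (http_status, response)."""
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        return _invalid([{"missing": missing}])
    problems = []
    if not RESULTS_TOKEN_PATTERN.fullmatch(str(body["results_token"])):
        problems.append({"results_token": "expected 16 hexadecimal characters"})
    try:
        uuid.UUID(str(body["request_uuid"]))
    except ValueError:
        problems.append({"request_uuid": "expected a UUID"})
    if not str(body["callback_path"]).startswith("/"):
        problems.append({"callback_path": "expected a leading forward slash"})
    if problems:
        return _invalid(problems)
    # The actual data lookup happens later, outside the request cycle.
    job_queue.append(dict(body))
    return 200, {"status": "processing"}
```

The same validation would apply to deletion requests, which take the same body parameters.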

Submit a Deletion Request

POST /api/v1/privacy/delete/<connection-uuid>

DataGrail will call this endpoint to initiate a privacy deletion request for the passed connection. This endpoint is asynchronous and should not return results.

The results should be returned through a webhook callback (detailed in the "Retrieve Results" section). DataGrail will trigger a request to the "Retrieve Results" endpoint periodically until we receive results.

Body Parameters

| Parameter Name | Description | Required |
| --- | --- | --- |
| identifiers | See the Identifier Object Format above | Yes |
| results_token | Hexadecimal string (length of 16) that will be used to retrieve the results of this request. | Yes |
| request_uuid | The request-specific UUID mapping to the Portal Ticket stored in DataGrail. | Yes |
| callback_path | The URL path to be used when triggering the results callback. To construct the full URL, the callback_path should be appended to your customer domain. The callback_path will lead with a forward slash. | Yes |

Expected Success Response

Status Code: 200 OK

The request was accepted and the process to respond to the request will be initiated.

{
"status": "processing"
}

Submit an Opt-Out Request

POST /api/v1/privacy/optout/<connection-uuid>

DataGrail will call this endpoint to initiate an opt-out/Do Not Sell/Share request for the passed connection. This endpoint is synchronous and will return the success/failure of the operation.

Body Parameters

| Parameter Name | Description | Required |
| --- | --- | --- |
| identifiers | See the Identifier Object Format above | Yes |
| results_token | Ignored; Hexadecimal string (length of 16). | No |
| request_uuid | Ignored; Random UUID. | No |
| callback_path | Ignored; Opt-Outs are synchronous. | No |

Expected Success Response

Status Code: 200 OK

The request was completed.

{
"status": "completed"
}

Retrieve Results

POST /api/v1/results/retrieve

Triggers a webhook callback to DataGrail that includes the results of a data request, if the request has completed.

This endpoint is intended to ensure DataGrail can retrieve completed request data if there is an issue capturing the results via the webhook callback. DataGrail only guarantees polling this endpoint every 15 minutes for a default period of 3 days.

Webhook Callback

Requests to the DataGrail callback endpoint should include the following headers (See your DataGrail representative for a pre-registered authentication token):

{
"Content-Type": "application/json",
"Accept": "application/json",
"Authorization": "Bearer <token supplied by DataGrail>"
}

The data response structures for applicable endpoints should be issued as a POST webhook call to the callback path appended to your customer domain (e.g. https://<customer>.datagrail.io/api/v1/data-request-callback).

The request payload should match one of the response types indicated below. This webhook should be triggered immediately when the processing of the request has been completed to avoid unnecessary polling of the callback trigger endpoint.

This endpoint should result in a request to the DataGrail webhook endpoint consisting of one of the three valid response types:

  • Results (Inline Data Response): for any result set under 10MB in size, inline results may be provided using the structure defined below.
  • Results Locations (Remote Data Response): for cases where response data may be larger than 10MB, this response type should link to files stored in your DataGrail PII storage location.
  • Deletion Response: No results are required. The only purpose is to let DataGrail know the status of the deletion request.

Inline Data Response Example:

{
"status": "completed",
"results_token": "<hexadecimal string>",
"results": {
"<connection-uuid>": [
{"first_name": "Guinevere", "last_name": "Pendragon"},
{"first_name": "Arthur", "last_name": "Pendragon"}
]
}
}

Remote Data Response Example:

{
"status": "completed",
"results_token": "<hexadecimal string>",
"results_locations": [
"internal-results/<request-uuid>/<results-token>/<connection-uuid>.log",
"internal-results/<request-uuid>/<results-token>/<connection-uuid>.log"
]
}

Results files should be written in json-log format using a modified form of the inline results structure. Each record should be written as a JSON-serialized string on its own line, and files should be stored using UTF-8 encoding. For example:

{"<connection-uuid>": {"first_name": "Jane", "last_name": "Doe"}}
{"<connection-uuid>": {"first_name": "John", "last_name": "Doe"}}
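
Writing such a file could look like the sketch below, assuming Python. write_results_log is a hypothetical helper that writes to a local path, and uploading the file to your DataGrail PII storage location is outside its scope.

```python
import json


def write_results_log(path, connection_uuid, records):
    """Write one JSON-serialized record per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for record in records:
            # json.dumps emits no raw newlines, so each record stays on one line.
            handle.write(json.dumps({connection_uuid: record}, ensure_ascii=False))
            handle.write("\n")
```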

Deletion Response Example:

{
"status": "completed",
"results_token": "<hexadecimal string>"
}
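
Putting the callback together, the sketch below builds the POST request for any of the three payload types using only the Python standard library. build_callback_request is a hypothetical helper, customer_domain is assumed to be a host such as <customer>.datagrail.io, and token is the pre-registered token supplied by DataGrail. The function builds the request without sending it.

```python
import json
import urllib.request


def build_callback_request(customer_domain, callback_path, token, payload):
    """Build (but do not send) the POST request for the DataGrail callback."""
    # callback_path already leads with a forward slash.
    url = f"https://{customer_domain}{callback_path}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; retries and error handling are left out of this sketch.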

Request Parameters

| Parameter Name | Description | Required |
| --- | --- | --- |
| results_token | Hexadecimal string (length of 16) that will be used to retrieve the results of this request. | Yes |
| callback_path | The URL path to be used when triggering the results callback. To construct the full URL, the callback_path should be appended to your customer domain. The callback_path will lead with a forward slash. | Yes |

Expected Success Response

Status Code: 200 OK

{
"status": "completed"
}

 
