System Detection
DataGrail's System Detection allows organizations to continuously scan identity providers and other systems of record to automate data mapping, helping to proactively identify shadow IT and third-party software/services. With over 37 system-detection integrations, DataGrail offers robust support to make your data mapping easy.
System Detection is one of three ways that services and software can be added to your inventory. In addition to automated system detection, services are automatically added via integrations you have connected. Lastly, you can manually add services to your inventory.
Supported Integrations
System Detection is driven by integrating systems of record, like SSO providers, expense tooling, and other business systems, with DataGrail. These system-detection integrations securely extract objects and fields known to contain downstream systems on a daily cadence. This data is analyzed continuously using DataGrail's System Detection Model to surface shadow IT to your System Inventory.
To enable a supported integration for System Detection, simply enable the System Detection capability from the DataGrail integration page.
System-Detection Integrations
- Marketo
- Zendesk
- Salesforce
- Intercom
- Mixmax
- Dropbox
- Okta
- HubSpot
- Auth0
- Google Tag Manager
- Slack
- Fivetran
- Looker
- Duo
- ReadyCloud
- Stitch
- Gorgias
- JumpCloud
- OneLogin
- Segment Public
- Kustomer
- Mode
- ShipStation
- Xplenty
- Microsoft Entra ID
- Blueshift
- PingOne Enterprise
- Drata
- Amazon Web Services
- Expensify
- Crossbeam
- Unbounce
- Avalara
- Segment Config
- Microsoft Teams
- Google Apps
- Coupa
The System Detection Model
DataGrail uses a layered system detection approach to ensure accuracy, completeness, and results you can trust. The System Detection process runs daily to provide continuous detection as your IT landscape changes.
The goal of System Detection is to securely extract fields from connected integrations that may interact with or connect to third-party services and software. DataGrail's System Detection model then analyzes each field to surface a match, or a relationship between an extracted field and a known third-party service. Extracted fields vary dramatically across different integrations, and there may be many potential matches for a single field.
Example Fields That Can Detect Zoom Video Conferencing:
- CreateZoomWebinarQueue
- Zoom Web Conferencing
- ZoomUserTokenRefresh
DataGrail's model prioritizes accuracy and seeks to surface only high-confidence matches to your inventory. Low-confidence matches are continuously reprocessed to ensure completeness as more data is extracted and DataGrail's model continues to improve.
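As a rough illustration of how differently shaped field names can point at the same service, the sketch below splits a field name into lowercase tokens and looks each token up in a small catalog. It is not DataGrail's model: the `KNOWN_SERVICES` catalog and both function names are hypothetical and exist only for this example.

```python
import re

# Hypothetical catalog mapping a lowercase token to a service name.
KNOWN_SERVICES = {"zoom": "Zoom Video Conferencing"}

def tokenize(field_name: str) -> list[str]:
    """Split on camelCase boundaries, spaces, and underscores; lowercase the parts."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", field_name)
    return [word.lower() for word in words]

def candidate_services(field_name: str) -> set[str]:
    """Return every catalog service whose token appears in the field name."""
    return {KNOWN_SERVICES[t] for t in tokenize(field_name) if t in KNOWN_SERVICES}

for field in ["CreateZoomWebinarQueue", "Zoom Web Conferencing", "ZoomUserTokenRefresh"]:
    print(field, "->", candidate_services(field))
```

All three example fields yield the same candidate service. A token lookup like this produces candidates only; deciding whether a candidate is trustworthy is a separate step.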
Data Extraction
DataGrail will extract relevant objects and fields from connected integrations nightly. When the capability is enabled on a supported integration, the first run of the data extraction process will not occur until the next scheduled run the following day.
The objects and fields DataGrail extracts vary per integration, and the available fields and their complexity dictate the effectiveness of a particular integration. Identity Providers like Okta, for example, allow DataGrail to pull a list of applications directly, which results in a large volume of very high-confidence results. On the other hand, systems like Salesforce provide less structured data that requires more complex analysis.
When an integration is enabled for system detection, the first run will take the longest, since all relevant fields must be extracted and processed. On subsequent runs, DataGrail will only pull new or updated fields.
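The difference between the first run and later runs can be sketched as follows. This is a minimal illustration that assumes each extracted field carries a last-updated timestamp; the `FieldRecord` type and `fields_to_extract` function are hypothetical and do not describe DataGrail's internals.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class FieldRecord:
    name: str
    updated_at: datetime  # assumed to be available from the source system

def fields_to_extract(records: list[FieldRecord],
                      last_run: Optional[datetime]) -> list[FieldRecord]:
    """First run (no previous run recorded): take everything.
    Later runs: take only fields created or updated since the last run."""
    if last_run is None:
        return list(records)
    return [r for r in records if r.updated_at > last_run]
```

On the first run `last_run` is `None`, so every field is processed, which is why that run takes the longest.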
To understand what resources and endpoints DataGrail is accessing through System Detection, please reference the API Documentation for the particular integration. API Documentation is located at the bottom of the integration's Connection Instructions, which can be found on the Integrations Page within the DataGrail app.
Classification and Identification
The Classification and Identification process is run daily and follows data extraction. The goal of Classification and Identification is to take the extracted data, analyze it, and attempt to surface high-confidence matches between the raw data fields and known third-party systems.
Filtering Out Low-Confidence Matches
As a first step, known false positives and extremely low confidence fields/strings are filtered out of extracted data. The remaining data fields provide a concise set of potential matches for the next layer of review.
Low-Confidence Strings | High-Confidence Strings |
---|---|
ContactOwner | Lattice.com |
AccountName | JiraHelper |
Password Reset | SyncDataToNetSuite |
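A first-pass filter of this kind could look like the sketch below. It assumes a simple list of known generic strings, seeded here with the low-confidence examples from the table; the `GENERIC_STRINGS` set and the function names are hypothetical.

```python
# Hypothetical list of known generic strings, stored in normalized form.
GENERIC_STRINGS = {"contactowner", "accountname", "passwordreset"}

def normalize(value: str) -> str:
    """Lowercase the value and drop everything except letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def filter_low_confidence(values: list[str]) -> list[str]:
    """Keep only the values that are not known generic strings."""
    return [v for v in values if normalize(v) not in GENERIC_STRINGS]

extracted = ["ContactOwner", "Lattice.com", "AccountName",
             "JiraHelper", "Password Reset", "SyncDataToNetSuite"]
print(filter_low_confidence(extracted))
```

Only the three high-confidence strings from the table remain as input for the next layer.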
Identification of High-Confidence Services and Software
Some extracted fields are exact matches with DataGrail's internal catalog of services and software. These exact matches are identified and surfaced to your inventory immediately. Additionally, some Single Sign-On (SSO) integrations allow unique identifiers to be extracted, which can be used to perform lookups against known app catalogs. These matches are also identified immediately.
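The two immediate paths described above, an exact catalog match and a lookup by unique identifier, can be sketched like this. The catalogs, the `app-123` identifier, and the `identify` function are all made up for illustration and are not DataGrail's data or API.

```python
from typing import Optional

# Hypothetical catalogs; real catalogs would be far larger.
CATALOG_BY_NAME = {"lattice.com": "Lattice", "zoom": "Zoom Video Conferencing"}
CATALOG_BY_APP_ID = {"app-123": "Slack"}  # made-up identifier

def identify(field_value: Optional[str] = None,
             app_id: Optional[str] = None) -> Optional[str]:
    """Prefer a unique identifier when one is available, then try an exact name match."""
    if app_id is not None and app_id in CATALOG_BY_APP_ID:
        return CATALOG_BY_APP_ID[app_id]
    if field_value is not None:
        return CATALOG_BY_NAME.get(field_value.strip().lower())
    return None
```

Anything that returns `None` here is neither an exact match nor a known identifier and would be left for more complex analysis.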
Machine Learning and Analysis of Complex Matches
The remaining extracted fields are generally more complex and may have many potential matches with third-party services and software. DataGrail uses its patented system detection algorithm and Machine Learning to identify and score potential matches, using available fields and context from connected integrations. If a potential match meets DataGrail's confidence threshold, it will be surfaced to your inventory.
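DataGrail's algorithm is proprietary, so the sketch below is only a stand-in that shows the general shape of "score candidates, then apply a threshold". It swaps in Python's standard `difflib.SequenceMatcher` for string similarity; the catalog, the 0.85 threshold, and the function names are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

CATALOG = {"jira": "Jira", "netsuite": "NetSuite", "zoom": "Zoom"}  # hypothetical
CONFIDENCE_THRESHOLD = 0.85  # assumed value, chosen only for this sketch

def _ngrams(field_name: str) -> set[str]:
    """Join every run of adjacent words, so 'Net' + 'Suite' also yields 'netsuite'."""
    words = [w.lower() for w in
             re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", field_name)]
    return {"".join(words[i:j])
            for i in range(len(words)) for j in range(i + 1, len(words) + 1)}

def best_match(field_name: str) -> Optional[tuple[str, float]]:
    """Return (service, score) for the best candidate, or None below the threshold."""
    best: Optional[tuple[str, float]] = None
    for gram in _ngrams(field_name):
        for key, service in CATALOG.items():
            score = SequenceMatcher(None, gram, key).ratio()
            if best is None or score > best[1]:
                best = (service, score)
    if best is not None and best[1] >= CONFIDENCE_THRESHOLD:
        return best
    return None
```

With this toy catalog, `best_match("SyncDataToNetSuite")` surfaces NetSuite, while `best_match("ContactOwner")` returns `None` because no candidate reaches the threshold.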
Continuous Review
Remaining data fields that did not surface a match are reprocessed daily. As DataGrail's internal app catalog grows and the system detection algorithm becomes more capable and accurate, previously undetected fields may be able to surface a match.
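The effect of daily reprocessing can be sketched with a simple retry loop: fields that found no match are kept and tried again against whatever the catalog contains on the next run. The `reprocess` function and both catalogs are hypothetical, and exact lookup stands in for the real matching logic.

```python
def reprocess(unmatched: list[str],
              catalog: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Retry unmatched fields; return (newly matched, still unmatched)."""
    matched: dict[str, str] = {}
    still_unmatched: list[str] = []
    for field in unmatched:
        service = catalog.get(field.lower())
        if service is not None:
            matched[field] = service
        else:
            still_unmatched.append(field)
    return matched, still_unmatched

# Day 1: the catalog does not know "Lattice.com" yet, so the field stays queued.
day1_matched, queue = reprocess(["Lattice.com"], {"zoom": "Zoom"})
# Day 2: the catalog has grown, so the same field now surfaces a match.
day2_matched, queue = reprocess(queue, {"zoom": "Zoom", "lattice.com": "Lattice"})
```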
DataGrail's Machine Learning Engineers also review System Detection results regularly to monitor the effectiveness of the algorithm. Any system detection algorithm may surface false positives, and the human review of automated matches seeks to identify these false positives early, to correct them, and to improve the algorithm as a result.
Getting The Most Out of System Detection
While DataGrail's System Detection is largely an automated process, there are best practices for configuration and review to ensure your results are as productive as possible.
Connecting Integrations
When setting up System Detection for the first time, it is important to prioritize connecting Single Sign-On (SSO) Providers (e.g. Okta or Microsoft Entra ID) first. These integrations are always the most productive in surfacing connected systems.
SSO Providers generally hold the most complete and accurate list of third-party systems used by an organization. Data retrieved from an SSO provider can be easily matched to a known third-party service and the identified service is generally still in use. These SSO providers will surface the majority of your services and software.
For companies especially interested in detecting Shadow IT, we recommend connecting all additional system-detection integrations your organization utilizes. The data we retrieve from these integrations is generally a bit "noisier", since the intended purpose of these services is not strictly to store connected apps. Many of the fields we retrieve are not related to connected apps at all, and the ones that are generally contain additional characters or text that make matching more complex. As a result, it is more challenging to surface services and software from these integrations, and the services that are surfaced are more likely to be false positives.
Understanding False Positives
A False Positive is a service or software identified by DataGrail in your inventory that is not actually used by your organization. A false positive can be surfaced for a variety of reasons.
When a new system is surfaced in your inventory, we always recommend checking (1) if the system exists and (2) if the system is still in use. If neither is true, the system is likely a false positive and can be removed from your inventory.
Ambiguous Data Labels
Retrieved System Detection data can contain generic strings that can incorrectly be matched to a third-party service or software.
For example, if DataGrail retrieves the field `route_to_engineering_team` from your integrations, the string `route` can be matched to the service route.com. However, in this example, `route` is being used as a general term and not referring to a specific service or software.
The data retrieved from your integrations often lacks the context DataGrail needs to distinguish a generic term from a genuine third-party service. DataGrail's model is trained either to ignore these generic terms or to gather additional context for them, but it is still possible to surface false positives in some cases.
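One way to picture "gathering additional context" is to require corroborating evidence before an ambiguous token counts as a match. The sketch below is an assumption about how such a check could work, not a description of DataGrail's model; the `AMBIGUOUS_TERMS` map and the function names are hypothetical.

```python
import re

# Hypothetical map from a generic word to the service domain it could be confused with.
AMBIGUOUS_TERMS = {"route": "route.com"}

def tokens(field: str) -> list[str]:
    """Lowercase the field and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", field.lower()) if t]

def detect_with_context(field: str, other_fields: list[str]) -> set[str]:
    """Accept an ambiguous token only if the service's domain appears in another field."""
    found: set[str] = set()
    for token in tokens(field):
        domain = AMBIGUOUS_TERMS.get(token)
        if domain and any(domain in other.lower() for other in other_fields):
            found.add(domain)
    return found

# No corroborating evidence: the generic word is ignored.
print(detect_with_context("route_to_engineering_team", ["ContactOwner"]))
# A second field mentions the domain, so the match is accepted.
print(detect_with_context("route_to_engineering_team", ["https://route.com/webhook"]))
```

A check like this reduces false positives from generic words but cannot remove them entirely, since corroborating evidence can itself be misleading.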
Extracted Data For Deprecated Services and Software
If your organization is no longer using a third-party service, data from that service likely remains in your connected integrations. As a result, DataGrail may identify systems that are no longer in use by your organization.
Using Company Systems to Authenticate Personal Apps
Services like Google Apps allow internal users to authenticate additional third-party apps using their work email address. For example, a member of your organization may use their company email to authenticate a personal app, like Uber. Since DataGrail's Google Apps Integration pulls the last 3 days of authentications in Google, Uber will be surfaced in DataGrail, despite it being authenticated only for personal use.
Establishing Good Practices
While System Detection is a powerful tool that can be used to identify most services and software used by your organization, the data available from your integrations doesn't tell the whole story. System Detection alone can't always answer whether a service is still in use, who it is used by, and what data it holds. This functionality can certainly be used to guide those decisions (especially when coupled with Responsible Data Discovery), but it's important to keep this in mind as you review your results.
When a new service or software is surfaced by DataGrail, it's important to check in with your internal teams to determine:
- If the service is still in use. If you determine a detected service is no longer in use, just remove it from your System Inventory. It will not be surfaced again.
- What type of data is held in that service. Understanding the type of data a service or software holds is a critical part of your organization's privacy program. DataGrail provides additional information on known third-party systems to help guide this process, but we always recommend checking with your internal teams.
- The Business Process that the service supports. Defining a Business Process for a detected service is helpful in categorizing your System Inventory, so you can associate each system with a function of your business.
Establishing a regular review process will allow you to maintain the most accurate list of your organization's services and software.
Frequently Asked Questions
Why didn't DataGrail identify a service that I know is in my network?
There are a few reasons why a service might not be detected in DataGrail:
- The service is not referenced in your connected integrations. It is possible a service is not referenced in or connected to your system detection integrations. In these cases, there is no data that can be used to identify this system.
- The service does not exist in DataGrail's library. DataGrail has a broad library of third-party services and software that can be surfaced from your data. If a service does not exist in DataGrail's library, it cannot be surfaced in your System Inventory directly. In these cases, we will automatically attempt to detect it from popular third-party app catalogs. However, in order to detect a service in a third-party app catalog, the match confidence must be very high. Proprietary services generally don't exist in DataGrail's library or third-party app catalogs, so we recommend adding these manually.
- There is not a high-enough confidence match to surface the service. DataGrail's System Detection Algorithm prioritizes accuracy. As a result, we will only surface high-confidence matches to limit false positives as much as possible. While there may be data in your connected integrations for a certain service, if the data does not meet the confidence threshold, it will not be detected. However, system detection data is reprocessed daily, so the confidence threshold may be met over time.
Why hasn't the "Last Detected Date" updated on one of my detected systems?
The "Last Detected Date" in your System Inventory is the extraction date of the most recent piece of data that matched a service. Given the nature of system-detection integrations, new data that allows DataGrail to detect the same service again is not created often. An older "Last Detected Date" does not imply that a service is no longer in use; it only indicates that there is no new data for that service in the integration that detected it.
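Put differently, the date behaves like a maximum over the extraction dates of matching data, as in the sketch below. The data layout and the `last_detected` function are hypothetical and only illustrate the definition above.

```python
from datetime import date

def last_detected(matches: dict[str, list[date]]) -> dict[str, date]:
    """For each service, take the most recent extraction date of any matching data."""
    return {service: max(dates) for service, dates in matches.items() if dates}

# Illustrative data: extraction dates of the fields that matched each service.
matches = {
    "Zoom": [date(2023, 3, 1), date(2023, 9, 14)],
    "NetSuite": [date(2022, 11, 2)],
}
print(last_detected(matches))
```

Under this reading, a date only moves forward when new matching data is extracted, so a service in daily use can still show an old date.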
Disclaimer: The information contained in this message does not constitute legal advice. We advise seeking professional counsel before acting on or interpreting any material.