Google Data Loss Prevention (DLP) is a set of automated functions that monitor data for triggers (specific content defined by a domain administrator), detect that content and prevent it from being leaked or lost, whether maliciously or accidentally. It is designed to help organisations better manage sensitive data and personally identifiable information.
It provides fast, scalable classification and redaction of sensitive data elements such as credit card numbers, names, social security numbers, US and selected international identification numbers, phone numbers and GCP credentials. Cloud DLP classifies this data using more than 90 predefined detectors that identify patterns, formats and checksums, and it can take contextual clues into account as well. Optionally, it can also redact the data using techniques such as masking, secure hashing, tokenisation, bucketing and format-preserving encryption.
- Redaction and suppression remove entire values from a dataset.
- Partial masking hides parts of the data, leaving some data visible.
- Tokenisation or secure hashing replaces sensitive data with a surrogate value (a token or hash).
- Dynamic data masking applies de-identification and masking techniques in real time.
- Bucketing generalises values into ranges, while k-anonymity and l-diversity help businesses assess re-identification risk before they transform data.
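To make the de-identification techniques listed above concrete, here is a minimal local sketch in plain Python. It is not the Cloud DLP API: the helper names (`redact`, `partial_mask`, `secure_hash`, `bucket`) are hypothetical, and secure hashing is shown as a keyed HMAC-SHA-256, which is one common way to produce consistent pseudonymous tokens.

```python
import hashlib
import hmac


def redact(value: str) -> str:
    """Redaction/suppression: remove the value entirely."""
    return ""


def partial_mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Partial masking: hide everything except the last `visible` characters."""
    visible = max(0, min(visible, len(value)))
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


def secure_hash(value: str, key: bytes) -> str:
    """Secure hashing (hypothetical helper): HMAC-SHA-256 of the value.

    The same value and key always yield the same token, so records can still
    be joined, but the original cannot be read back without brute force.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def bucket(value: int, size: int = 10) -> str:
    """Bucketing: generalise a number into a fixed-width range such as '30-39'."""
    low = (value // size) * size
    return f"{low}-{low + size - 1}"


if __name__ == "__main__":
    print(partial_mask("4111111111111111"))       # ************1111
    print(secure_hash("123-45-6789", b"secret"))  # 64 hex characters
    print(bucket(37))                             # 30-39
```

Keeping the key secret is what separates secure hashing from plain hashing: without the key, an attacker cannot simply hash every possible nine-digit number to reverse the tokens.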
How does DLP work?
DLP draws on Google’s extensive machine learning capabilities, including image recognition and machine vision, natural language processing and context analysis, to find overlooked or unexpected sensitive data and redact it automatically.
For example, in Gmail, Google DLP scans messages for the configured triggers and, if one is detected, takes the action predefined by the administrator. Depending on company policy and the required level of prevention, the G Suite administrator can apply a DLP policy to one or more of the following types of messages:
- Emails received from outside the set of domains associated with the organisation.
- Emails sent outside the set of domains associated with the organisation.
- Emails received from within the set of domains associated with the organisation.
- Emails sent within the set of domains associated with the organisation.
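The four message types above depend only on whether the sender and each recipient belong to the organisation's domains. The sketch below shows that classification; the domain set and the scope labels (`received_externally`, `sent_externally` and so on) are hypothetical and are not Gmail setting names.

```python
ORG_DOMAINS = {"example.com", "example.org"}  # hypothetical organisation domains


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def policy_scopes(sender: str, recipients: list, org_domains=ORG_DOMAINS) -> set:
    """Return which of the four message types a message falls into.

    A single message can match several types, e.g. an internal sender
    writing to one colleague and one external partner.
    """
    internal_sender = domain_of(sender) in org_domains
    scopes = set()
    for rcpt in recipients:
        internal_rcpt = domain_of(rcpt) in org_domains
        if internal_sender and internal_rcpt:
            scopes |= {"sent_internally", "received_internally"}
        elif internal_sender:
            scopes.add("sent_externally")
        elif internal_rcpt:
            scopes.add("received_externally")
    return scopes


if __name__ == "__main__":
    print(policy_scopes("alice@example.com", ["bob@partner.net"]))
```

A policy scoped to outbound mail would then only run its triggers when `"sent_externally"` is in the returned set.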
There are three main types of triggers that can be set:
- Specific expressions – any words or phrases can be configured.
- Metadata attributes – such as the source IP, the message size, whether the message is authenticated and whether the connection is TLS-encrypted.
- Predefined content match – a wide range of country-specific and international detectors is available, such as credit card number (CCN), passport number, Social Security Number, IBAN, etc.
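The three trigger types can be illustrated with a small local sketch. Everything here is hypothetical (the `Message` fields, the size limit and the helper names); the credit card detector pairs a digit pattern with the Luhn checksum, which is the kind of checksum validation mentioned earlier, but it is not Google's actual detector.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message with a body and a few metadata attributes."""
    body: str
    size_bytes: int
    tls: bool
    authenticated: bool


def expression_trigger(msg: Message, phrases) -> bool:
    """Trigger 1: any administrator-defined word or phrase (case-insensitive)."""
    text = msg.body.lower()
    return any(p.lower() in text for p in phrases)


def metadata_trigger(msg: Message, max_size: int = 10_000_000) -> bool:
    """Trigger 2: metadata, e.g. oversized, unauthenticated or non-TLS messages."""
    return msg.size_bytes > max_size or not msg.authenticated or not msg.tls


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def credit_card_trigger(msg: Message) -> bool:
    """Trigger 3: predefined content match, a card-like pattern that passes Luhn."""
    for m in CARD_PATTERN.finditer(msg.body):
        if luhn_valid(re.sub(r"\D", "", m.group())):
            return True
    return False
```

Requiring the checksum as well as the pattern is what lets a detector ignore most random 16-digit strings, such as order or tracking numbers.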
For predefined detectors, the system analyses not only the content of the data (for example, the nine digits of a Social Security Number) but also its context (words such as “ssn”, “social”, “social security” or “taxpayer”). When the system finds a message containing sensitive data, it takes one of the following actions, depending on the administrator’s setup:
- Modifies the message – for example, bypasses spam filters, removes attachments, adds recipients or requires secure transport.
- Rejects the message, so that it is not sent or received.
- Quarantines the message – quarantined messages are sent to the admin quarantine, where an administrator can preview each one and allow or deny it.
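The content-plus-context analysis and the choice of action described above can be sketched as follows. This is a deliberately simplified, hypothetical model: the SSN pattern does not exclude invalid number ranges, the 40-character context window is an arbitrary assumption, and `Action` and `decide` are illustrative names rather than Gmail settings.

```python
import re
from enum import Enum

# Nine digits, optionally formatted as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
# Context words taken from the description above.
HOTWORDS = ("ssn", "social", "social security", "taxpayer")


def ssn_findings(text: str, window: int = 40) -> list:
    """Return SSN-shaped matches preceded by a hotword within `window` characters."""
    lowered = text.lower()
    findings = []
    for m in SSN_PATTERN.finditer(text):
        context = lowered[max(0, m.start() - window):m.start()]
        if any(word in context for word in HOTWORDS):
            findings.append(m.group())
    return findings


class Action(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    REJECT = "reject"
    QUARANTINE = "quarantine"


def decide(text: str, configured_action: Action = Action.QUARANTINE) -> Action:
    """Apply the administrator's configured action only when a finding exists."""
    return configured_action if ssn_findings(text) else Action.ALLOW
```

With this approach "My SSN is 123-45-6789" is flagged, while "Order number 123-45-6789" is not, because no context word appears near the number.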
The DLP API can be pointed at any data source or storage system. It offers native support and scalability for large datasets in Google Cloud Storage, Cloud Datastore and BigQuery, Google’s enterprise cloud data warehouse. DLP runs in numerous Google products, including all of G Suite, and its application programming interface also lets administrators use it outside Google’s ecosystem. The Google Data Loss Prevention API is available as a free trial, with production pricing based on the volume of data processed by content and storage inspection.
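As a rough illustration of calling the API directly, the sketch below builds a `content:inspect` request for an inline string using only the Python standard library. It assumes the v2 REST endpoint, the `projects/{project}/locations/global` parent and the infoType names shown; the project ID is hypothetical, and actually sending the request needs network access, the API enabled on the project and an OAuth access token (for example from `gcloud auth print-access-token`). Scanning Cloud Storage or BigQuery uses long-running inspection jobs instead and is not shown.

```python
import json
import urllib.request

DLP_ENDPOINT = "https://dlp.googleapis.com/v2"


def build_inspect_request(project_id, text,
                          info_types=("CREDIT_CARD_NUMBER",
                                      "US_SOCIAL_SECURITY_NUMBER",
                                      "IBAN_CODE")):
    """Build the URL and JSON body for inspecting an inline string."""
    url = f"{DLP_ENDPOINT}/projects/{project_id}/locations/global/content:inspect"
    body = {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_types],
            "minLikelihood": "POSSIBLE",
            "includeQuote": True,
        },
    }
    return url, body


def inspect(project_id, text, access_token):
    """Send the request and return (infoType, quote) pairs for each finding."""
    url, body = build_inspect_request(project_id, text)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp).get("result", {})
    # "findings" is omitted from the response when nothing matches.
    return [(f["infoType"]["name"], f.get("quote"))
            for f in result.get("findings", [])]
```

Separating request construction from sending keeps the payload easy to inspect and test without credentials.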