Have you ever wondered how many people see, or have reason to access, your enterprise’s most sensitive data while doing their jobs? How many data scientists, software developers, customer support reps, marketers, and salespeople might see customer data that could be subject to data privacy regulations such as the GDPR or CCPA? And how many enterprise software applications need to use privacy-protected data in the course of a single business day?
In an era where the average cost of a data privacy breach runs over $4 million, and where some infamous data breaches have cost businesses far more in lost reputation and intensified regulatory scrutiny, enterprises have increasingly turned to data masking to protect one of their most valuable assets.
What is Data Masking?
Data masking, sometimes called data obfuscation, is a technique for modifying data that allows authorized people or applications to use customer data while preventing or limiting its exposure or use by unauthorized people or applications. In some cases, the unauthorized users might be malicious hackers or intruders.
But in many cases, particularly in data-driven companies, the unauthorized users could be enterprise application developers, data scientists and testers. Or, they could be contact center and customer service personnel who need to see some types of data, but aren’t authorized to see all of it. A contact center worker, for instance, might be authorized to see a user’s history of complaints and resolutions, but not the customer’s financial or health information.
Data Masking In Practice
In a typical data masking use case, a marketer, data scientist or software developer might want to use a particular customer database to develop a customer loyalty application. But evidence has shown that in a majority of cases, knowing someone’s exact zip code, birthdate, and sex is enough to uniquely identify them, so handing over the raw records would put customer privacy at risk. Data masking can be used to make a copy of the original data that still contains information about, say, the general area of each customer’s address (by using the first three digits of a zip code instead of the full zip) for use in applications for analysis and testing.
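The zip code generalization described above can be sketched in a few lines of Python. This is a minimal illustration, not a production masking routine: the function names `generalize_zip` and `mask_customer`, the record layout, and the choice of `*` as the fill character are all assumptions made for the example.

```python
def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` digits of a five-digit zip code, star out the rest."""
    digits = zip_code.strip()[:5]  # drop any ZIP+4 suffix
    return digits[:keep] + "*" * (len(digits) - keep)


def mask_customer(record: dict) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    masked["zip"] = generalize_zip(record["zip"])
    return masked


# Hypothetical customer record used only for illustration.
customer = {"id": 17, "zip": "94107", "sex": "F"}
masked_customer = mask_customer(customer)
print(masked_customer["zip"])
```

The masked copy still says roughly where the customer lives, which is often enough for analysis, while the exact zip code stays in the original dataset only.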
In another use case for data masking, people whose work involves testing applications that require credit card numbers might have access to a copy of the original master database for “sandbox testing” or analysis. But in the interest of data security, the data the testers and developers can access might have fake (synthetic) credit card numbers or be altered in some other way to obscure the real, sensitive information. If fake credit card numbers were used, the sandbox data would still meet the format requirements for credit card numbers: each number would keep a real issuer prefix (the leading six or eight digits, which identify the financial institution) and would still have a valid checksum.
At the same time, none of the credit card numbers in the modified dataset would actually be a usable credit card account number. The advantage of data masking in this use case is that customer privacy and data security are given additional protection, while the data scientists, app developers, and testers get what they need. Marketers can also still get insights into the dataset without compromising privacy.
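As a rough sketch of how such synthetic card numbers might be generated, the Python below keeps the issuer prefix, randomizes the remaining digits, and recomputes the check digit with the Luhn algorithm (the checksum used for payment card numbers). The helper names, the default six-digit prefix, and the well-known test number in the usage lines are assumptions for illustration. The sketch only guarantees that the output differs from the input; it does not check whether a generated number happens to match some other real account.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a number that is missing its last digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def synthetic_card(real_number: str, prefix_len: int = 6, rng=None) -> str:
    """Keep the issuer prefix, randomize the middle digits, fix the checksum."""
    rng = rng or random.Random()
    prefix = real_number[:prefix_len]
    body_len = len(real_number) - prefix_len - 1
    while True:
        body = "".join(str(rng.randrange(10)) for _ in range(body_len))
        candidate = prefix + body + luhn_check_digit(prefix + body)
        if candidate != real_number:  # never hand back the original number
            return candidate


# "4111111111111111" is a widely used test number, not a real account.
fake = synthetic_card("4111111111111111", rng=random.Random(0))
print(fake[:6], len(fake), luhn_valid(fake))
```

Because the prefix, length, and checksum are preserved, an application under test should accept the synthetic number wherever it would accept a real one.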
Challenges with Data Masking
A potential concern with data masking is that the real-world production data might change after the masked sandbox copy has been made. If it changes in major ways, testers, developers, and even marketers who are looking at analytics based on the masked data might miss important insights or behaviors of the underlying system. Or, production systems that were developed against masked (or synthetic) data might not handle real data as well as they did in testing, either because there is an undetected difference between the actual data and the masked data, or because the real data changes significantly over time while the test data does not.
Static Data Masking
These risks, which include missed insights or unflagged incorrect behavior during testing, can be high in the case of static data masking (SDM). In SDM, sensitive data is permanently replaced by altering the data at rest. Because the masked dataset is fixed at the moment it is masked, developers and marketers could be working on a dataset that no longer reflects the real world in an important way.
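A minimal sketch of static masking, assuming a sandbox copy of a customer table held in SQLite: the stored values themselves are overwritten, so the originals cannot be recovered from this copy. The table layout, the `mask_email` helper, the demo key, and the keyed-hash (HMAC) pseudonym scheme are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a secret masking key; a real deployment would keep this in a vault.
MASKING_KEY = b"demo-key-not-for-production"


def mask_email(email: str) -> str:
    """Replace an email with a stable pseudonym derived from a keyed hash."""
    digest = hmac.new(MASKING_KEY, email.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.invalid"


def apply_static_mask(conn: sqlite3.Connection) -> None:
    """Overwrite every stored email in place (the data at rest is changed)."""
    rows = conn.execute("SELECT id, email FROM customers").fetchall()
    conn.executemany(
        "UPDATE customers SET email = ? WHERE id = ?",
        [(mask_email(email), row_id) for row_id, email in rows],
    )
    conn.commit()


# Stand-in for a sandbox copy of the production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO customers (email) VALUES ('ada@example.com')")
apply_static_mask(conn)
print(conn.execute("SELECT email FROM customers").fetchone()[0])
```

Nothing in this copy is refreshed when production changes, which is exactly the staleness risk described above.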
Dynamic Data Masking
Dynamic data masking, on the other hand, replaces sensitive data in transit, leaving the original at-rest data unchanged, and so is less likely to suffer problems of model drift or data drift. But if data is rapidly changing, there can still be a risk of divergence or missed insights and opportunities.
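By way of contrast, here is a minimal sketch of dynamic masking in Python: the stored record is never modified, and masking is applied at read time according to the role of the requester. The in-memory `STORE`, the role names, and the `UNMASKED_FIELDS` policy table are hypothetical; real products typically enforce this in the database or a proxy layer.

```python
# Stand-in for the unmasked data at rest (values are obviously fake).
STORE = {
    17: {"name": "Ada Example", "ssn": "000-00-0000", "complaints": 2},
}

# Hypothetical policy: the fields each role may see unmasked.
UNMASKED_FIELDS = {
    "support": {"name", "complaints"},
    "compliance": {"name", "ssn", "complaints"},
}


def read_customer(customer_id: int, role: str) -> dict:
    """Return a view of the record with unauthorized fields masked."""
    record = STORE[customer_id]
    allowed = UNMASKED_FIELDS.get(role, set())  # unknown roles see nothing
    return {
        field: (value if field in allowed else "****")
        for field, value in record.items()
    }


print(read_customer(17, "support"))
print(read_customer(17, "compliance"))
```

Each read reflects whatever is currently stored, so the view does not go stale the way a static masked copy can.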
On-the-Fly Data Masking
On-the-fly data masking uses the extract-transform-load (ETL) method: it extracts sensitive data from one data source or environment, masks it, and sends it to another data source or environment so that the resulting masked data can be shared or used. The original data remains unmasked, while the masked data is used in the testing or development environment, or in other applications that require masked data. In contrast, dynamic data masking happens while programs are running and is performed on demand. In dynamic data masking, too, the original, complete data set is unaffected and stored unmasked.
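The ETL flow for on-the-fly masking might look like the following Python sketch, which streams rows from a source, masks one column, and writes the result to a separate target. The CSV format, the `phone` column, and the `mask_phone` and `etl_mask` helpers are assumptions for illustration; in-memory buffers stand in for the two environments.

```python
import csv
import io


def mask_phone(phone: str) -> str:
    """Keep the last four digits of a phone number and star out the rest."""
    digits = [c for c in phone if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def etl_mask(source, target) -> int:
    """Extract rows from `source`, mask them, load them into `target`."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:  # rows are masked one at a time, in transit
        row["phone"] = mask_phone(row["phone"])
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-ins for the production source and the test target.
source = io.StringIO("id,phone\n1,415-555-0134\n2,212-555-0199\n")
target = io.StringIO()
rows_loaded = etl_mask(source, target)
print(target.getvalue())
```

The source buffer is only read, never written, which mirrors the point that the original data stays unmasked in its own environment.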
Still, data masking is an important tool that helps enterprises get the most out of their data while protecting their customers’ sensitive information.
Good Customer Data Platforms and Other Applications Let Enterprises Mask Data
The need for data masking has evolved in recent years, and what started out as a technique used mostly internally by software developers, data scientists, and software testers has become widespread. The total value of the data masking market is projected to reach $767 million by 2022, at a Compound Annual Growth Rate (CAGR) of 14.8 percent. Many observers attribute this growth to increasing privacy protection concerns and regulations, as well as rapidly expanding volumes of customer data in internal and cloud environments that must be managed and secured.
Many companies now offer data masking capabilities, either in standalone privacy protection apps, or as part of a larger product such as a customer data platform (CDP). Here is an example of several data masking functions in a CDP that handle encryption, decryption, and hashing.