SQL Server 2016: Masking sensitive data


Data Masking

At the time of writing this article, it’s been more than 2 months since the most secure and powerful SQL Server yet released for public. It has many features which you can use to make your data and application more secure, more maintained and easily retrieved as per your requirements. It has introduced both client side security such as Always Encrypted as well as server side security such as Data masking etc. In this post we’ll examine how Dynamic Data Masking can help secure our data.

Earlier we were masking the sensitive data by applying logic at application end or by replacing characters at SQL Server end and then returning the masked data to the application. In the former case the risk was at
1. When application fails to mask data
2. Sensitive data transferred over network
3. All the users in SQL Server with SELECT permission still can access the unmasked data

In the later case, the first two risks are eliminated but still there is a chance that the sensitive data can be revealed by the SQL Server user through which the application retrieves data.
1. When application fails to mask data
2. Sensitive data transferred over network
3. All the users in SQL Server with SELECT permission still can access the unmasked data

So, let’s quickly jump in to the next section to understand what Dynamic Data masking is and how it helps to eliminate all these chances of unmasked data getting revealed.

Data Masking

If you search for definition of Data Masking over the web, you’ll end up with varied descriptions. The simplest and straight forward definition can be –

“Data masking is the process of hiding original data with random characters or data.”

If we look further, there are various data masking techniques available such as

  • Shuffling: Shuffling order of the characters in the value.
    Ex: 12345 —–> 35312
  • Nulling: Substituting characters in the value with null (hash) symbol.
    Ex: 12345 —–> ###45
  • Substitution: Substituting value with another value from a substitution table.
    Ex: Suvendu Giri —–> John Ptak
  • Masking Out: Masks the complete data or a selective part of the data.
    Ex: suvendu@mydomain.com —–> suvendu@xxxx.xxx

Considering many techniques of data masking they can be categorized into 3 types-

  • Static Data Masking: Original data in production changed to masked data.
  • On-the-fly Data masking: Copies data from one source to another source and masking done on the later.
  • Dynamic data masking: Masking done dynamically at runtime.

Dynamic Data Masking

“Dynamic data masking limits sensitive data exposure by masking it to non-privileged users. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a data protection feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.”

So, dynamic data masking alters the result for non-privileged users while streaming and not with data in the production database.

Data Masking in MySQL:

As far I know, till the recent version of MySQL there is no inbuilt mechanism to provide dynamic data masking capabilities. The possible options for MySQL users seems to be

  1. Use 3rd party tools
  2. Use custom scripts/functions


SELECT '****' AS PhoneNumber
FROM `Employees`
--Input: 1234567890        Output: ****
SELECT REPEAT('*', CHAR_LENGTH(PhoneNumber) - 6) AS PhoneNumber
FROM `Employees`
--Input: 1234567890                        Output: **********
SELECT CONCAT(SUBSTR(PhoneNumber, 1, 4), REPEAT('*', CHAR_LENGTH(PhoneNumber) - 4)) AS PhoneNumber
FROM `Employees`
--Input: 1234567890                        Output: 1234******

Data Masking in PostgreSQL:

Same as MySQL. It seems they are still lacking this feature. However, we can make use of some custom scripts or user defined functions.

Azure SQL Database too supports the dynamic data masking similar to SQL Server and I have heard from my friends that Oracle provides some kind of mechanism for data masking, although I haven’t ever used it.

Dynamic Data Masking in SQL Server 2016:

There are four masking functions available in SQL Server 2016 to provide separate ways to mask your data when they are returned by a query. These are-

[Read complete article at this link (CodeProject.com) ]