Masking Columns in Azure SQL and SQL Server: Safeguarding Sensitive Data

Information security and privacy are key in today’s data-driven world. Sensitive data needs to be protected from unauthorized access. With column masking in SQL Server and Azure SQL, you can conceal sensitive data, allowing only authorized users to see it.

Understanding Column Masking

Column masking is also known as data masking or dynamic data masking. This allows you to control data visibility at the column level. It is beneficial in scenarios where certain users or applications require data access but cannot view the actual values.

Benefits of Column Masking

  1. Enhanced Security: By masking sensitive data, organizations can prevent unauthorized users from accessing critical information. This helps to minimize data breaches and unauthorized data exposure.
  2. Compliance with Privacy Regulations: Column masking helps meet regulatory requirements, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). Limiting access to sensitive data shows organizations’ commitment to privacy.
  3. Minimal Application Changes: Database masking reduces the need to modify applications to protect data.

Implementing Column Masking in SQL Server

Let’s explore two methods: static and dynamic.

Static Data Masking

Static data masking involves modifying the actual data stored in the database. It replaces sensitive data with fictitious or masked values. This method is typically used in non-production environments. This ensures data security while maintaining realistic data distributions for testing or development purposes.

Here’s an example of static data masking:

-- Creating a table with sensitive data 
CREATE TABLE Customers 
(CustomerID INT PRIMARY KEY, 
Name VARCHAR(100), 
Email VARCHAR(100), 
CreditCardNumber VARCHAR(16)); 

-- Masking the CreditCardNumber column with 'XXXX-XXXX-XXXX-####' 
UPDATE Customers SET CreditCardNumber = 'XXXX-XXXX-XXXX-####';

In this example, the masked format replaces the actual credit card number. This way sensitive information is not exposed.

Dynamic Data Masking

Dynamic data masking is another technique to control data visibility. It allows you to define masking rules for specific columns, ensuring sensitive data is only displayed to authorized users. Unlike static masking, it masks the data at runtime, without modifying the actual values stored in the database.

SQL Server has four dynamic data masking options to mask sensitive data. These options allow you to define masking rules on columns, dynamically applying them as the data is queried based on the user’s permissions. Dynamic data masking options include:

  1. Default: The default masking option replaces sensitive data with a predefined masking function. The masking function determines how data is obfuscated. For example, you can use the “partial” masking function to show only a portion of the data while hiding the rest. The default masking option is useful when you want a consistent masking approach for all users.
  2. Random: The random masking option replaces sensitive data with randomly generated values of the same data type and length. This option is useful when you need to hide the actual values while maintaining the original data format and characteristics. For example, a credit card number could be masked as a different valid credit card number.
  3. Email: The email masking option masks email addresses by replacing a portion of the address with asterisks, while preserving the domain and top-level domain. This option is helpful when you want to hide personal information in an email address while keeping it recognizable and functional.
  4. Custom String: The custom string masking option allows you to define a custom masking function to replace sensitive data with a specific string. This option gives you greater flexibility in defining masking behavior according to your specific requirements. For example, you could replace a social security number with a fixed string such as “XXX-XX-XXXX”.

It’s important to remember that dynamic data masking is not foolproof. Use masking in conjunction with other security measures such as access controls, encryption, and proper data governance practices to ensure comprehensive data protection.

Dynamic Data Masking In Action

Here’s an example of dynamic data masking:

-- Creating a table with sensitive data
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    Email VARCHAR(100),
    CreditCardNumber VARCHAR(16) 
    MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)') NULL
);

In this example, the CreditCardNumber column is masked using the partial() function. It replaces the first 12 digits of the credit card number with ‘X’ characters, displaying only the last 4 digits for authorized users.

To implement dynamic data masking on an existing table, you can use the ALTER TABLE statement with the MASKED WITH clause. This will enable you to specify the masking function and any additional parameters required. Here’s an example of how to alter a table column to include dynamic masking:

ALTER TABLE dbo.MyTable
ALTER COLUMN SocialSecurityNumber ADD MASKED WITH (FUNCTION = 'default');

Matching Masking Across Environments

I needed to query one environment to match its masked columns to another environment. To do this for yourself, you can use system tables/views to find the info you need. I created the following query to get all the information. It creates a script to generate the masked column.

WITH maskedcols (DatabaseName, SchemaName, TableName, ColumnName, DATA_TYPE, ColumnLength, isMasked, MaskingFunction)
AS (SELECT  /* this works for string types */ 
	DB_NAME() as DatabaseName, 
	s.name 'SchemaName',
	tbl.name 'TableName', 
	c.name 'ColumnName', 
	DATA_TYPE,
	CASE
		WHEN DATA_TYPE = 'varchar' THEN col.max_length
		WHEN DATA_TYPE = 'nvarchar' THEN col.max_length/2
		ELSE 0
	END AS ColumnLength, 
	c.is_masked 'IsMasked', 
	c.masking_function 'MaskingFunction'
FROM sys.masked_columns AS c
JOIN sys.tables AS tbl 
ON c.[object_id] = tbl.[object_id]
JOIN INFORMATION_SCHEMA.COLUMNS AS i
ON i.TABLE_NAME = tbl.name and i.COLUMN_NAME = c.name
JOIN sys.columns col 
on tbl.object_id = col.object_id and i.COLUMN_NAME = col.name
JOIN sys.schemas as s 
ON tbl.schema_id = s.schema_id
WHERE c.is_masked = 1)

SELECT schemaname, tablename, columnname, maskingfunction, data_type, columnlength,
'ALTER TABLE ' + schemaname + '.' + TableName as tblAlterScript,
'ALTER COLUMN ' + ColumnName + ' ADD MASKED WITH (FUNCTION = '  + '''' + MaskingFunction COLLATE SQL_Latin1_General_CP1_CI_AS + ''');'  as colAlterScript					
FROM maskedcols
ORDER BY TableName, ColumnName;  

If you have masked columns already, the previous query will give you results like the following screenshot. You can then use the script to modify your tables to match the masking.

For more code examples, visit this Microsoft link.

Granting Unmask

Granting UNMASK permission in SQL Server allows a user or role to view the original values of masked columns. By default, masked columns are only accessible to authorized users in their hidden form. However, in certain scenarios, it may be necessary to grant the UNMASK permission to specific users or roles to enable them to see the unmasked data. Carefully grant permission to only those individuals who require access to the original values for legitimate purposes.

My first step was to create a role and grant it unmask then grant that role to any groups/users that might need it, as shown in the following example:

CREATE ROLE db_unmask;

GRANT UNMASK to db_unmask;

EXEC sp_addrolemember 'db_unmask', 'dbgroupuser'; 

Conclusion

Organizations need to protect sensitive data, and SQL Server’s column masking lets you do that while maintaining data integrity. By implementing column masking, organizations can enhance security, comply with privacy regulations, and minimize unauthorized data exposure.

This blog post explored column masking in SQL Server and its benefits. We discussed two common methods: static and dynamic data masking. Static data masking involves replacing sensitive data with fictitious or masked values. Dynamic data masking allows you to define masking rules for specific columns at runtime.

Both methods allow organizations to safeguard sensitive data and limit its visibility to authorized users. By incorporating column masking into their data security strategy, organizations can bolster their defenses against data breaches and demonstrate their commitment to privacy regulations.

Remember, use column masking in conjunction with other security measures such as authentication, encryption, and access controls to establish a comprehensive data protection framework.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.