…helping protect sensitive data
What is Data Masking
Why Data Masking
DM Transformation - Informatica
Different Masking Rules
Key Masking
Random Masking
Inbuilt Masking Rules
Substitution Masking
Procedure followed – DM requirement in DICoE
Challenges faced in AASI Data Masking
Transformation of sensitive information into de-identified, realistic-looking data
Data remains relevant and meaningful
Preserves the original characteristics of the data
Preserves referential integrity
There are requirements across the enterprise for production data in non-production environments, for needs such as development, testing, data analysis, and training.
Organizations take immense measures to secure private data in production environments; as a result, the non-production environments become an attractive target for malicious users.
Hence the need arises to use production data in test environments in such a way that the sensitive data is masked yet realistic.
The Informatica PowerCenter Data Masking option protects sensitive information by masking it while maintaining the original nature of the data and preserving referential integrity.
The prerequisite for the Data Masking transformation is Informatica 8.5.1. In AMP, the DM server components are installed on Informatica 8.6.1.
The Data Masking feature can be used by simply adding a new transformation – the Data Masking (DM) transformation – to the mapping.
The DM transformation masks the source data based on the masking rules we configure for each field identified as containing sensitive data.
Masking rules can be configured to provide:
- Non-deterministic randomization
- Deterministic and repeatable masking
- Blurring – adding a variance value to the original data
- Substitution of the original data with unreal data
Different Masking Rules
Key Masking: produces deterministic data.
- Maintains referential integrity through the use of a seed value; the DM transformation requires a seed value to return deterministic data.
- The DM transformation creates a default seed value, which is also editable. The default seed value is a random number between 1 and 1,000.
Key Masking types:
- String Masking – key masking for strings can be configured to generate repeatable output. For string key masking we can specify:
  - Mask format – the available mask-format characters are A, N, D, X, +, R
  - The characters to be masked in the source string
  - The replacement characters
- Numeric Masking – a field in the source file or table can be configured for numeric key masking to generate repeatable output.
- Date Masking – used when a date column must be masked in such a way that it maintains referential integrity.
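The essence of key masking – the same source value plus the same seed always produces the same masked value – can be sketched in a few lines. This is an illustrative hash-based stand-in, not Informatica's actual algorithm; the function name and mask-format handling are assumptions for the example.

```python
import hashlib

def key_mask_string(value: str, seed: int, mask_format: str) -> str:
    """Deterministic (repeatable) string-masking sketch.

    The same (value, seed) pair always yields the same output, so
    referential integrity is preserved across tables. 'A' positions
    become letters, 'N'/'D' positions become digits; other format
    characters pass the source character through. Illustrative only.
    """
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    out = []
    for i, fmt in enumerate(mask_format):
        b = digest[i % len(digest)]
        if fmt == "A":            # alphabetic position -> letter
            out.append(chr(ord("A") + b % 26))
        elif fmt in ("N", "D"):   # numeric/digit position -> digit
            out.append(str(b % 10))
        else:                     # keep the source character as-is
            out.append(value[i] if i < len(value) else "X")
    return "".join(out)
```

Because the output depends only on the input and the seed, masking the same customer name in two different tables yields the same masked name in both, which is exactly why key masking can preserve PK/FK joins.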
Random Masking: generates non-deterministic data.
- The Data Masking transformation returns different values when the same source value occurs in different rows.
Random Masking types:
- Numeric Masking – rules that can be applied for numeric random masking:
  - Range – define the range of the masked value
  - Blurring – generate masked values within a fixed or percentage variance of the source data
- String Masking – similar rules to string key masking; in addition, there is an option to specify a range for the string length.
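The range and blurring rules above can be sketched as follows. This is a conceptual illustration under assumed semantics (uniform draw within the range, or within ± a percentage of the source value), not the product's implementation.

```python
import random

def random_mask_numeric(value, low=None, high=None, blur_pct=None):
    """Non-deterministic numeric-masking sketch (illustrative only).

    - Percent blurring: result lies within +/- blur_pct% of the source.
    - Range: result drawn uniformly from [low, high].
    The same input can yield a different output on every call, which
    is the defining property of random masking.
    """
    if blur_pct is not None:
        delta = abs(value) * blur_pct / 100.0
        return random.uniform(value - delta, value + delta)
    return random.uniform(low, high)
```

For example, blurring a salary of 50,000 by 10% always lands between 45,000 and 55,000, so masked values stay plausible while individual rows are no longer traceable.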
Date Random Masking – masking rules that can be applied:
- Range – the upper/lower bound of the masked date value. The default datetime format is MM/DD/YYYY HH24:MI:SS.
- Blurring – mask the date based on a variance applied to one unit of the date. The blur unit can be year, month, day, or hour; the default is year. DM applies the variance to the selected blur unit and substitutes random numbers for the other units. For example, to restrict the masked date to a date within two years of the source date, select year as the unit and enter two as both the low and high bound.
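The "blur unit = year" behaviour described above can be sketched like this: shift the chosen unit by a bounded random amount and randomize the smaller units. An assumption-laden illustration, not Informatica's algorithm.

```python
import random
from datetime import datetime

def blur_date_year(source: datetime, low: int, high: int) -> datetime:
    """Date-blurring sketch (illustrative only).

    Shifts the year by a random offset in [-low, +high] and substitutes
    random values for the smaller units, mimicking the described
    behaviour when year is the selected blur unit.
    """
    year = source.year + random.randint(-low, high)
    month = random.randint(1, 12)
    day = random.randint(1, 28)   # capped at 28 so every month is valid
    return datetime(year, month, day,
                    random.randint(0, 23), random.randint(0, 59))
```

With low = high = 2, as in the example on the slide, every masked date falls within two years of the source date.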
Inbuilt Masking Rules – inbuilt masking rules that can be applied:
- SSN
- Credit card
- URL / IP address
- Phone
- Email address
Masking Social Security Numbers: a list containing the valid SSNs is stored in the Informatica server path \infa_shared\SrcFiles\highgroup.txt. The DM transformation accesses the highgroup.txt file and generates a masked SSN that is not in the list.
Substitution Masking: substituting data using a Lookup transformation.
Apart from the masking algorithms described above, we can also substitute the original data with unreal information from dictionary files. The default dictionary files are available under server\infa_shared\LkpFiles.
Example: FirstNames.dic contains an SNo column and a FirstNames column. In the mapping, we generate a random number with the DM transformation, pass it as input to a Lookup transformation to look up the SNo, and fetch the first name from the lookup file. If the dictionary file holds 100 names, the random-number range can be restricted to 1 to 100.
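The random-number-plus-lookup pattern reduces to a few lines. Here a Python list stands in for the rows of FirstNames.dic, and the function name is a hypothetical helper, not an Informatica API.

```python
import random

def substitute_first_name(dictionary: list) -> str:
    """Substitution-masking sketch (illustrative only).

    Draws a random number in 1..N (N = dictionary size) and 'looks up'
    the replacement value by SNo, mimicking the DM -> Lookup pattern
    against a file like FirstNames.dic.
    """
    sno = random.randint(1, len(dictionary))  # random number in range
    return dictionary[sno - 1]                # lookup by SNo
```

Bounding the random number by the dictionary size is the key detail: it guarantees every lookup hits a row, so no source value ever passes through unmasked.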
Identify the sensitive fields.
Document the DM requirement in a proper format – ideally it should capture the table/file name, attribute/field name, DM required (Y/N), PK/FK relation, rule type, and description.
If the requirement has common fields to be masked across files/tables, creating a mapplet with the "to be masked" fields is helpful.
Coding/Testing.
A few challenges faced in AASI Data Masking
- In the AASI DM requirement, the source and target were MF files. To ensure that our DM mappings had no impact on fields that do not require masking, we kept only the "fields to be masked" in the data map and declared all other fields as Filler with a binary data type.
- The masking format defined for an attribute should be in sync with the actual characters fed from the source, as in the case of string masking. For example, if the mask format is defined as "alphabets – A" but the source value contains special characters (@, $, etc.), the error "Invalid input mask format" is raised.
- SSN masking accepts only a valid TAX_ID as input, for example XXX-XX-XXXX. So if we plan to use the inbuilt SSN masking, we need to decide whether to transform the input source value into the proper format first, or simply use numeric/string key masking instead.
- Maintaining data quality – the DM transformation produces masked output based on its inbuilt algorithms, so even if NULL or 0 values are passed as input, DM generates a masked output value. However, downstream teams using the masked data may need to check for NULL or 0 values from the source, so we must make sure data quality is maintained. In that case, we may have to apply a transformation to retain the source value when it is 0 or NULL.
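The NULL/0 safeguard in the last point above can be sketched as a simple pass-through wrapper. The function and its `mask_fn` parameter are hypothetical helpers for illustration, not an Informatica construct; in PowerCenter this would typically be an Expression transformation ahead of (or around) the DM transformation.

```python
def mask_preserving_nulls(value, mask_fn):
    """Data-quality safeguard sketch (illustrative only).

    Passes NULL (None) and 0 through unchanged so that downstream
    NULL/0 checks against the masked data still behave as they did
    against the source, and applies the masking function otherwise.
    """
    if value is None or value == 0:
        return value          # retain the source value
    return mask_fn(value)
```

This keeps the masked dataset faithful to the source's missing-value profile, which is exactly the data-quality concern the slide raises.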
Thank You