It is a CLI proxyless administration tool to mask sensitive information in a data warehouse. Unlike commercial solutions, RedMask is proxyless. You only need to run RedMask when you define the rules and access control. After that, you can bring down RedMask.
Currently, RedMask supports Postgres, Redshift and Snowflake. It is an open source tool, released under Apache 2.0 license, and is free to use.
This tool supports two type of data masking - static and dynamic masking:
In static masking, a new table is created with the selected columns masked. This increases storage costs. This technique is suitable for data sets that do not change often.
In dynamic masking, RedMask creates a view that masks the desired columns. The view has the same name and columns as the underlying table, but is in a different schema. When a query is executed, the data warehouse picks either the table or the view depending on the search path/default schema.
Unlike commercial solutions, RedMask is proxyless. You only need to run RedMask when you define the rules and access control. After that, you can bring down RedMask.
RedMask does not intercept your queries or the data. Instead, it relies on standard database techniques to create alternative views with masked data. You can create these views manually as well, RedMask just makes it easier by providing an intuitive user interface.
RedMask requires a database to store temporary data. [TODO] RedMask will use a sqlite database to store this data, but you can configure it to use another database. If you lose this database, RedMask can rebuild it by reading the data warehouse.
Applications connecting to the data warehouse have to make 2 small changes:
- Queries accessing tables with sensitive data must refer to tables without the schema name.
Wrong:
select * from sales.customers
Correct:select * from customers
This restriction does not apply to tables that do not have sensitive data. - Optionally, you need to run an initial sql statement when you create a connection to the data warehouse. This initial sql statement sets the default schema / search path to resolve table names without a schema identifier.
- With static masking, a copy of the table is created with the desired columns masked. This increases the storage requirements.
- With dynamic masking, a view wraps the underlying table. This approach has a negligible impact on performance, unless the masked column is part of a join condition. In general, we recommend joining on ids that do not require to be masked. If you cannot join on an id field, there are some ways to reduce the performance impact of masking.
Clone the repository to your local machine, install gradle for your respective operating system, and add the following dependencies.
Linux/Ubuntu
sudo apt-get install openjdk-8-jdk gcc zlib1g-dev
gradle clean nativeImage
Windows Install the Microsoft Windows SDK for Windows 7 and .NET Framework 4 as well as the C compilers from KB2519277.
Using the command Prompt run the following to build redamsk
gradle clean NativeImage
Mac
gradle clean nativeImage
- Construct a json masking configuration file. Within a masking configuration, the administrator can choose specific columns that need to be masked. For each column that needs to be masked, it will allow the administrator to configure the masking rules like
STRING_MASKING
,Random_Integer_Within_Range
etc.
{
"host": "localhost",
"port": <port_number>,
"user": <dev_user_name>,
"database": <database_name>,
"dbType": <database-type>,
"rules": [
{
"table": <table_name>,
"columns": [
{
"name": <column_name>,
"mask_type": <masking_type>,
"mask_params": {
<Key>: <value>,
<Key>: <value>,
<Key>: <value>
}
},
{
"name": <column_name>,
"mask_type": <masking_type>,
"mask_params": {
<Key>: <value>,
<Key>: <value>,
<Key>: <value>
}
}
]
}
]
}
- Set the following parameters as system environment variables.
DB_SUPER_USER = <super_username>
DB_SUPER_USER_PASSWORD = <super_user_password>
DB_USER = <dev_user_name>
DB_USER_PASSWORD = <dev_user_password>
- Administrator will launch RedMask via CLI using below command
./redmask -f=/<path_to_josn_file>/masking_config.jso -r=false -m=dynamic
where:
-f
or--configFilePath
is complete file path of json containing masking configurations.-r
or--dryRun
when true, this will just generates sql file with required queries. It will not make any changes to DB. It indicates dry run mode.-m
or--mode
is the masking method to be used i.e. static or dynamic
After running the command, the RedMask will create appropriate schemas/views/permissions in the underlying data warehouse.
The Snowflake data warehouse supports role based access control, therefore the role name is to be provided under the user field in the configuration file as follows.
{
"host": "localhost",
"port": <port_number>,
"user": <dev_role>,
"database": <database_name>,
"dbType": <database-type>,
"rules": [
{
"table": <table_name>,
"columns": [
{
"name": <column_name>,
"mask_type": <masking_type>,
"mask_params": {
<Key>: <value>,
<Key>: <value>,
<Key>: <value>
}
}
]
}
]
}
All masking functions are required to have the following json configuration
{
"name": <column_name>,
"mask_type": <masking_type>,
"mask_params": {
<Key>: <value>,
<Key>: <value>,
<Key>: <value>
}
}
Mandatory Arguments
- name: The column name to be masked with the mentioned masking type
- mask_type: The mask to be used to mask the data
- mask_params: JSON for additional parameter. Leave Empty to use default parameters
The following types are supported on all supported data warehouses.
It masks the string column with a user-defined pattern. Additionally you can mention the number of letter to be left unmasked from the start and the end of the string.
Supported datatypes: text,varchar, character varying
Result datatype: text
Additional Paramters
- pattern: The pattern by which you would want to mask the text column. Default value is *
- show_first: The number of characters you don’t want to mask from the beginning. The default value is 0.
- show_last: The number of characters you don’t want to mask from the end. The default value is 0.
It masks a integer column by replacing the original value with a random generated value between the specified min and max values.
Supported datatypes: int,integer, short
Result datatype: int
Additional Paramters
- min: The minimum allowed value for the random integer. Default value is 0.
- max: The maximum allowed value for the random integer. Default value is 10.
It masks a integer column by replacing the original value with a random generated value having digits equal to the specified size.
Supported datatypes: int,integer, bigint,short
Result datatype: Bigint
Additional Paramters
- size: An integer type value representing the number of digits there should be in the randomly generated number. The default value is 2.
It masks an integer type column by the fixed number passed as the value parameter.
Supported datatypes: int,integer, short
Result datatype: integer
Additional Paramters
- value: The integer type value the column is to be replaced by. The default value is 0.
It masks a float type column by the fixed number passed as the value parameter.
Supported datatypes: float Result datatype: float
Additional Paramters
- value: The integer type value the column is to be replaced by. The default value is 0.00.
It masks the username part of the Email ID, while displaying the domain part.
Supported datatypes: text,varchar, character varying
Result datatype: text
It masks the all alphanumeric characters in the Email ID.
Supported datatypes: text,varchar, character varying
Result datatype: text
It masks the username part of the Email ID, while displaying the first character and the domain.
Supported datatypes: text,varchar, character varying
Result datatype: text
It masks entire email except the starting N characters specified in the show_first argument.
Supported datatypes: text,varchar, character varying
Result datatype: text
Additional Paramters
- show_first: The number of characters you want to display. default value is 0.
It mask column containing credit/debit card data, while showing the first few characters as passed in the show_first argument.
Supported datatypes: text,varchar, character varying
Result datatype: text
Additional Paramters
- separator: String/character used to separate the card number into group of digits.
- show_first: The number of digits to be left unmasked from the start.
It mask column containing credit/debit card data, while showing the last few characters as passed in the show_last argument.
Supported datatypes: text,varchar, character varying
Result datatype: text
Additional Paramters
- separator: String/character used to separate the card number into group of digits.
- show_last: The number of digits to be left unmasked from the last.
It mask column containing credit/debit card data, while showing the first and last few characters as passed in the show_first and show_last argument.
Supported datatypes: text,varchar, character varying
Result datatype: text
Additional Paramters
- separator: String/character used to separate the card number into group of digits.
- show_first: The number of digits to be left unmasked from the start.
- show_last: The number of digits to be left unmasked from the last.
The following types are supported only on the Postgres
It mask the integer column and convert into a int4range type with a range equal to the step parameter.
Supported datatypes: int, integer
Result datatype: int4range
Additional Paramters
- step: The integer type value denoting the range width. the default value is 10.
It mask the integer column and convert into a int8range type with a range equal to the step parameter.
Supported datatypes: int, integer
Result datatype: int8range
Additional Paramters
- step: The integer type value denoting the range width. the default value is 10.
It mask the integer column and convert into a numrange type with a range equal to the step parameter.
Supported datatypes: int, integer
Result datatype: numrange
Additional Paramters
- step: The integer type value denoting the range width. the default value is 10.
- Add document for masking various functions.
- Troubleshooting
- Provide some exmaple for masked data.