- Install clickhouse (check guide)
- Install postgres (pg_dump and psql required)
- Clone this repository
- Install ruby gems
cd pg_obfuscator/ bundle install
- use
- Build images: run following command in current directory:
docker build -t pg_obfuscator .
- Or just use pre-built: https://hub.docker.com/r/foobarengineering/pg_obfuscator
Warning! All commands must be executed from the root directory of the application. Application will create directories and files:
config/
configuration for columnstables/
with CSV tables data./pre-data.sql
and./post-data.sql
- sections of schema
> bundle exec ruby pg_obfuscator.rb --help
Usage: ruby pg_obfuscator.rb [--configure --export-schema --export-tables --obfuscate --import] [options]
--configure Generate config based on Data from CSV
--export-tables Download data from each table defined in config file
--export-schema Download schema from database
--obfuscate Obfuscate data from input directory files and save to output directory
--import Import data from CSV files to database
--source-db-host <ip> Database host for config generation
--source-db-port <port> Database port for config generation
--source-db-name <name> Database name for config generation
--source-db-user <username> Database user for config generation
--source-db-password <password> Database password for config generation
--target-db-host <ip> Database host for import data
--target-db-port <port> Database port for import data
--target-db-name <name> Database name for import data
--target-db-user <username> Database user for import data
--target-db-password <password> Database password for import data
The configuration file contains the data types of all columns of each table, as well as a list of columns excluded from obfuscation.
File will be saved in config/config.yml
Never touch this file. If you want to override configuration create config/custom_config.yml
> bundle exec ruby pg_obfuscator.rb --configure \
--source-db-host 0.0.0.0 \
--source-db-port 5432 \
--source-db-name postgres \
--source-db-user postgres \
--source-db-password password
Config structure (dummy autogenerated example):
tables: # Contains instructions for obfuscator (how to process fields)
users: # table name
columns: # columns section
id: # column name
db_data_type: int8 # data type in database
obfuscator_data_type: Int8 # data type for obfuscation (clickhouse-obfuscator)
not_null: true
name:
db_data_type: varchar
obfuscator_data_type: Nullable(String)
not_null: false # If field can not have null value wrap obfuscator_data_type by Nullable(<type>)
excluded_columns: # List of columns excluded from obfuscation
- id
main: # global settings section
tables:
# list of excluded tables from obfuscation.
# If you want to load raw data, just add table in list
skip_obfuscation:
- schema_migrations
# Also you can skip loading tables.
# Tables from list will not loaded
skip_loading: []
After starting the configuration generation, you will see a list of columns that need to be fixed:
1. Fields are automatically excluded from obfuscation:
- primary key
- foreign keys, but the realated field is the primary key
2. Fields requiring manual processing configuration:
- has a unique index
- has an unknown data type
- has an index on multiple fields
3. Possible configuration solutions:
Use config/custom_config.yml
- Add field to obfuscation exclusion list (If field does not contains sensitive data)
- Add fake_data section for customize processing and add to exclusion list
Files will be saved in main directory (pre-data.sql
and post-data.sql
)
> ruby pg_obfuscator.rb --export-schema \
--source-db-host 0.0.0.0 \
--source-db-port 5432 \
--source-db-name postgres \
--source-db-user postgres \
--source-db-password password
Warning! May take a long time
If you have unconfigured fields, you will not be able to complete this step!
Running:
> ruby pg_obfuscator.rb --export-tables
--source-db-host 0.0.0.0
--source-db-port 5432
--source-db-name postgres
--source-db-user postgres
--source-db-password password
Files will be saved to tables/
tables/excluded/
- excluded from obfuscation data
tables/source/
- data prepared for obfuscation
Warning! May take a long time
It will run clickhouse-obfuscator based on configuration from config/config.yml
and data tables/source/
.
Excluded from obfuscation data will be combined with obfuscated.
Each excluded row will be processed by faker (if it configured)
> bundle exec ruby pg_obfuscator.rb --obfuscate
Warning! Operation can not be undone!
> bundle exec ruby pg_obfuscator.rb --import \
--target-db-host 0.0.0.0 \
--target-db-port 25432 \
--target-db-name postgres \
--target-db-user postgres \
--target-db-password password
> bundle exec ruby pg_obfuscator.rb --configure --export-schema --export-tables --obfuscate \
--source-db-host 0.0.0.0 \
--source-db-port 5432 \
--source-db-name postgres \
--source-db-user postgres \
--source-db-password password \
--import-tables \
--target-db-host 0.0.0.0 \
--target-db-port 25432 \
--target-db-name postgres \
--target-db-user postgres \
--target-db-password password
- When you need to customize how fields are handled, you can use faker to do this. Add the following fields to the column section. During obfuscation, this rule will call the faker method and write returned value.
tables:
users:
columns:
name:
# Added settings
fake_data:
type: method
value: "Faker::Internet.email"
- Also you can specify a value from adjacent fields to generate a value
tables:
users:
columns:
id:
db_data_type: int8
obfuscator_data_type: Int8
not_null: true
name:
# Added settings
fake_data:
type: pattern
value: "%{id}@fakeemail.com"
- Precise value
tables:
users:
columns:
enabled:
# Added settings
fake_data:
type: precise
value: true