Install

Manual

  1. Install clickhouse (check guide)
  2. Install postgres (pg_dump and psql required)
  3. Clone this repository
  4. Install ruby gems
    cd pg_obfuscator/
    bundle install
    
  5. use

Docker

  1. Build images: run following command in current directory:
    docker build -t pg_obfuscator .
    
  2. Or just use pre-built: https://hub.docker.com/r/foobarengineering/pg_obfuscator

Usage

Warning! All commands must be executed from the root directory of the application. Application will create directories and files:

  1. config/ configuration for columns
  2. tables/ with CSV tables data
  3. ./pre-data.sql and ./post-data.sql - sections of schema
> bundle exec ruby pg_obfuscator.rb --help

Usage: ruby pg_obfuscator.rb [--configure --export-schema --export-tables --obfuscate --import] [options]
        --configure                       Generate config based on Data from CSV
        --export-tables                   Download data from each table defined in config file
        --export-schema                   Download schema from database
        --obfuscate                       Obfuscate data from input directory files and save to output directory
        --import                          Import data from CSV files to database
        --source-db-host <ip>             Database host for config generation
        --source-db-port <port>           Database port for config generation
        --source-db-name <name>           Database name for config generation
        --source-db-user <username>       Database user for config generation
        --source-db-password <password>   Database password for config generation
        --target-db-host <ip>             Database host for import data
        --target-db-port <port>           Database port for import data
        --target-db-name <name>           Database name for import data
        --target-db-user <username>       Database user for import data
        --target-db-password <password>   Database password for import data

Steps

1. Generate config

The configuration file contains the data types of all columns of each table, as well as a list of columns excluded from obfuscation.

File will be saved in config/config.yml Never touch this file. If you want to override configuration create config/custom_config.yml

> bundle exec ruby pg_obfuscator.rb --configure \
  --source-db-host 0.0.0.0 \
  --source-db-port 5432 \
  --source-db-name postgres \
  --source-db-user postgres \
  --source-db-password password

Config structure (dummy autogenerated example):

tables:                             # Contains instructions for obfuscator (how to process fields)
  users:                            # table name
    columns:                        # columns section
      id:                           # column name
        db_data_type: int8          # data type in database
        obfuscator_data_type: Int8  # data type for obfuscation (clickhouse-obfuscator)
        not_null: true
      name:
        db_data_type: varchar
        obfuscator_data_type: Nullable(String)
        not_null: false             # If field can not have null value wrap obfuscator_data_type by Nullable(<type>)
    excluded_columns:               # List of columns excluded from obfuscation
    - id
main:                               # global settings section
  tables:
    # list of excluded tables from obfuscation. 
    # If you want to load raw data, just add table in list
    skip_obfuscation:
    - schema_migrations
    # Also you can skip loading tables. 
    # Tables from list will not loaded
    skip_loading: []

After starting the configuration generation, you will see a list of columns that need to be fixed:

1. Fields are automatically excluded from obfuscation:

  • primary key
  • foreign keys, but the realated field is the primary key

2. Fields requiring manual processing configuration:

  • has a unique index
  • has an unknown data type
  • has an index on multiple fields

3. Possible configuration solutions: Use config/custom_config.yml

  • Add field to obfuscation exclusion list (If field does not contains sensitive data)
  • Add fake_data section for customize processing and add to exclusion list

2. Export database schema

Files will be saved in main directory (pre-data.sql and post-data.sql)

> ruby pg_obfuscator.rb --export-schema \
--source-db-host 0.0.0.0 \
--source-db-port 5432 \
--source-db-name postgres \
--source-db-user postgres \
--source-db-password password

3. Export database tables

Warning! May take a long time

If you have unconfigured fields, you will not be able to complete this step!

Running:

> ruby pg_obfuscator.rb --export-tables
--source-db-host 0.0.0.0
--source-db-port 5432
--source-db-name postgres
--source-db-user postgres
--source-db-password password

Files will be saved to tables/ tables/excluded/ - excluded from obfuscation data tables/source/ - data prepared for obfuscation

4. Obfuscation

Warning! May take a long time It will run clickhouse-obfuscator based on configuration from config/config.yml and data tables/source/. Excluded from obfuscation data will be combined with obfuscated. Each excluded row will be processed by faker (if it configured)

> bundle exec ruby pg_obfuscator.rb --obfuscate

5. Import obfuscated data to target database

Warning! Operation can not be undone!

> bundle exec ruby pg_obfuscator.rb --import \
--target-db-host 0.0.0.0 \
--target-db-port 25432 \
--target-db-name postgres \
--target-db-user postgres \
--target-db-password password

6. One command execution (All in one)

> bundle exec ruby pg_obfuscator.rb --configure --export-schema --export-tables --obfuscate \
--source-db-host 0.0.0.0 \
--source-db-port 5432 \
--source-db-name postgres \
--source-db-user postgres \
--source-db-password password \
--import-tables \
--target-db-host 0.0.0.0 \
--target-db-port 25432 \
--target-db-name postgres \
--target-db-user postgres \
--target-db-password password

Custom rules to process columns

  1. When you need to customize how fields are handled, you can use faker to do this. Add the following fields to the column section. During obfuscation, this rule will call the faker method and write returned value.
tables:
  users:
    columns:
      name:
        # Added settings
        fake_data:
          type: method
          value: "Faker::Internet.email"
  1. Also you can specify a value from adjacent fields to generate a value
tables:
  users:
    columns:
      id:
        db_data_type: int8
        obfuscator_data_type: Int8
        not_null: true
      name:
        # Added settings
        fake_data:
          type: pattern
          value: "%{id}@fakeemail.com"
  1. Precise value
tables:
  users:
    columns:
      enabled:
        # Added settings
        fake_data:
          type: precise
          value: true