mongo-copy-pseudonymize
Copy data from one MongoDB database to another with pseudonymization
The usual setup is to have a local MongoDB database (see official documentation for installation instructions, and Compass for visual inspection).
The script reads the connection parameters (source and destination database connections) from a .json file, and then copies the entries from all collections. Additionally, it pseudonymizes selected fields to hide sensitive information; which fields are affected is configured in the same .json file. In the current version, pseudonymization simply replaces the values of the selected fields with strings, using a 1-to-1 mapping.
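The 1-to-1 replacement described above can be illustrated with a minimal sketch. This is not the package's actual code: the `Pseudonymizer` class, the `pseudonymize_document` helper, and the use of random hex tokens as replacement strings are all assumptions made for illustration. The one property it demonstrates is that equal input values always receive the same replacement string, and different values receive different ones.

```python
import secrets


class Pseudonymizer:
    """Hypothetical helper: replace values with random tokens, 1-to-1."""

    def __init__(self):
        self._forward = {}  # original value (as string) -> pseudonym
        self._used = set()  # pseudonyms already handed out

    def pseudonym(self, value):
        # Values are compared by their string form, so 1 and "1" share a token.
        key = str(value)
        if key not in self._forward:
            token = secrets.token_hex(8)
            # A collision is extremely unlikely; retrying keeps the mapping 1-to-1.
            while token in self._used:
                token = secrets.token_hex(8)
            self._used.add(token)
            self._forward[key] = token
        return self._forward[key]


def pseudonymize_document(doc, fields, pseudonymizer):
    """Return a copy of doc with the listed top-level fields replaced."""
    result = dict(doc)
    for field in fields:
        # Missing or empty fields are left untouched.
        if field in result and result[field] is not None:
            result[field] = pseudonymizer.pseudonym(result[field])
    return result


if __name__ == "__main__":
    p = Pseudonymizer()
    a = pseudonymize_document({"name": "Alice", "age": 30}, ["name"], p)
    b = pseudonymize_document({"name": "Alice", "age": 41}, ["name"], p)
    print(a["name"] == b["name"])  # the same input value gets the same pseudonym
```

Sharing one `Pseudonymizer` instance across all collections would keep references between collections consistent after the copy. Whether the script itself does this is not stated here.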
- Install Miniconda
- Create the environment:
conda env create -f environment.yml
- Activate the new environment:
conda activate mongo-copy-pseudonymize
- Edit your config file config.json (see config.json.template). source_server is the MongoDB connection string for the original database, and source_database is the database name. destination_server and destination_database have the same format and define the target database. pseudonymize is a dictionary mapping collections to lists of fields to pseudonymize. Please note that the --drop flag will erase the destination database!
- Run the script:
python mongo_pseudonymize/__main__.py --config config.json
The --drop argument removes the target database.
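As a rough illustration of the config file described in the steps above, the following sketch builds a configuration and prints it as JSON. Only the five key names come from this README. The connection strings, database names, collection names, and field names are placeholders, and placing all keys at the top level is an assumption; config.json.template remains the authoritative reference. The `check_config` helper is hypothetical and not part of the package. It also refuses a destination identical to the source, since --drop erases the destination database.

```python
import json

# Placeholder values throughout; only the key names are taken from the README.
config = {
    "source_server": "mongodb://localhost:27017",
    "source_database": "original_db",
    "destination_server": "mongodb://localhost:27017",
    "destination_database": "pseudonymized_db",
    # Collection name -> list of fields to pseudonymize.
    "pseudonymize": {
        "users": ["name", "email"],
        "orders": ["customer_name"],
    },
}

REQUIRED_KEYS = (
    "source_server",
    "source_database",
    "destination_server",
    "destination_database",
    "pseudonymize",
)


def check_config(cfg):
    """Hypothetical sanity check run before any data is copied."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    same_server = cfg["source_server"] == cfg["destination_server"]
    same_database = cfg["source_database"] == cfg["destination_database"]
    if same_server and same_database:
        # With --drop, an identical destination would erase the source data.
        raise ValueError("source and destination must differ")
    for collection, fields in cfg["pseudonymize"].items():
        if not isinstance(fields, list):
            raise TypeError(f"fields for '{collection}' must be a list")
    return cfg


if __name__ == "__main__":
    check_config(config)
    # Print instead of writing, so an existing config.json is never overwritten.
    print(json.dumps(config, indent=2))
```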
Installation
pip install -e .
python -m mongo_pseudonymize