This is the repository for the project Keep it Private: Unsupervised Privatization of Online Text.
- The requirements will need to be installed.
$ scripts/setup.sh
This will create a conda environment named kip. You will need to download the checkpoints into the models directory. The script will prompt you to run the correct commands.
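Before running generation, it can help to confirm that a checkpoint actually landed in the models directory. The snippet below is a minimal sketch and is not part of the repository: the helper name `find_checkpoints` is hypothetical, and it assumes checkpoints are stored as `.bin` files directly under `models`.

```python
from pathlib import Path


def find_checkpoints(model_dir="models", suffix=".bin"):
    """Return the sorted checkpoint files found directly under model_dir.

    Assumption: checkpoints are stored as *.bin files; adjust suffix if not.
    """
    directory = Path(model_dir)
    if not directory.is_dir():
        # The directory does not exist yet, so there is nothing to list.
        return []
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == suffix
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(f"found checkpoint: {path}")
    else:
        print("no checkpoints found in models/; rerun scripts/setup.sh and follow its prompts")
```

If the list comes back empty, the download step most likely has not been run yet.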
Keep it Private performs authorship transfer using a seq2seq model that was adversarially fine-tuned via reinforcement learning with a set of rewards (Privacy, Sense, and Soundness metrics).
$ conda activate kip
$ python src/generate.py --input_data_path ${INPUT_DATA_PATH} \
--output_path ${OUTPUT_PATH} \
--model_path models \
--model_name_to_use dipper-large \
--model_start_file ${BIN_FILE} \
--token_max_length 256
$ python src/generate.py --input_data_path {JSONFILE} \
--output_path {OUTPUT_FILE} \
--model_path models \
--model_name_to_use dipper-large \
--model_start_file models/dipper_v130.bin \
--token_max_length 256
- input_data_path: path to the query documents to be privatized
- output_path: file to save the privatized documents
- model_start_file: path to trained KiP model
- model_name_to_use: path to pre-trained base model
- token_max_length: max cutoff length of output
- random_seed: initialize all random seeds to this value
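When generation is launched from another script, the parameters above can be assembled programmatically instead of typed by hand. The following is a rough sketch under stated assumptions: `build_generate_command` is a hypothetical helper, not something the repository provides, and the input and output file names are placeholders. It only mirrors the `--name value` pattern used in the commands above.

```python
import shlex


def build_generate_command(params, script="src/generate.py"):
    """Turn a {name: value} mapping into an argument list for the script.

    Each entry becomes `--name value`, matching the flag style shown above.
    """
    command = ["python", script]
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    params = {
        "input_data_path": "queries.json",   # placeholder file name
        "output_path": "privatized.json",    # placeholder file name
        "model_path": "models",
        "model_name_to_use": "dipper-large",
        "model_start_file": "models/dipper_v130.bin",
        "token_max_length": 256,
    }
    command = build_generate_command(params)
    # Print a copy-pasteable, shell-quoted version of the command.
    print(shlex.join(command))
    # To run it inside the activated kip environment:
    # import subprocess; subprocess.run(command, check=True)
```

Building the command as a list avoids shell-quoting problems with paths that contain spaces. A `random_seed` entry could presumably be added to the mapping in the same way, assuming the script exposes it as `--random_seed` like the other parameters.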