[link to paper]
This repo presents a method for solving word-based substitution ciphers with a neural language model.
bookcipher
- contains source codedata
- transcribed real wilkinson data + synthetic data fileslanguage-models
- instructions for fine-tuning language modelslattices
- test filesscripts
- batch run scriptstools
- tools, for now, just the scraper for wilkinson textwordbanks
- contains wordbanked letters, i.e. the training set
pip install -r requirements.txt
# One of the following to install pattern
# brew install mysql # OSX
# sudo apt-get install libmysqlclient-dev # ubuntu
pip3 install pattern
# no self-learning, fastest
python bookcipher/runtime.py --lm gpt2 -b 4 --source_file data/eval/ciphertext.txt --gold_file data/eval/plaintext.txt
# with self-learning, more accurate
python bookcipher/runtime.py --lm gpt2 -b 4 --source_file data/eval/ciphertext.txt --gold_file data/eval/plaintext.txt --self-learn --confidence_model left
python bookcipher/runtime.py --help # for more options
# generate and train on the first 800 words of the synthetic corpus
python bookcipher/encipher.py 800
# edit wordbank.py MIN_WORD and MAX_WORD to 1, (size of dictionary), respectively
# yes, this is terrible, sorry
python bookcipher/runtime.py -b 4 --gold_file data/synthetic/test-plaintext.txt --source_file data/synthetic/test-ciphertext-800.txt --lm gpt2 --confidence_model left --synthetic --wordbank data/synthetic/wordbank-800 | tee synthetic-800.out