Frequency analysis tool for interactive decoding of a monosubstitution ciphertext
You need the colorama package for pretty printing with ANSI escape codes. Alternatively, enable ansi.sys.
pip install colorama
Unix based terminals should work right out of the box.
Then clone this repo and run!
$ git clone git@github.com:YetAnotherMinion/drunken-hipster.git
$ cd drunken-hipster
Run from the command line with Python 2 to launch an interactive session giving you a prompt.
[shivaebola@localhost drunken-hipster]$ python freq_analysis.py
$ --help
usage: Mono Subsitution Helper [-h] [-decode <char> <char>] [-q]
[--load-file <filename>]
[--load-string <string>] [--sub] [--to-upper]
[--show-mapping] [-c] [-f [1,2,3]]
assists in decoding a mono substitution cipher
optional arguments:
-h, --help show this help message and exit
-decode <char> <char>
<A> <B> where A in ciphertext maps to B in plaintext
-q, --quit Exit the program, discarding all data
--load-file <filename>
Loads a file and treats it as cipher text
--load-string <string>
Loads a string of cipher text from command line
--sub Substitute the characters in the ciphetext with their
assigned mappings
--to-upper Changes lowercase ASCII characters in input format to
upper case
--show-mapping show the individual character level mapping of the
cipher
-c, --ciphertext print the cipher text to the screen
-f [1,2,3], --frequency [1,2,3]
compute a frequency analysis on ciphertext, default
behaviour is to print counts for the 1-grams 2-grams
and 3-grams in the ciphertext sorted from most
frequent to least frequent shown alongside sorted
frequency of n-grams for english language Use optional
argument of [1,2,3] to only show a specific n-gram
frequency analysis, any other values will be ignored
$
================
When you launch the program there is an empty mapping between characters in the ciphertext and the plaintext alphabets.
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
You add mappings with -decode, which takes a string of ciphertext characters followed by a string of plaintext characters.
The ciphertext string maps injectively onto the plaintext string, character by character, so the two strings must be the same length.
$ -decode JDS THE
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ H _ _ _ _ _ T _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
$
If the mapping strings are not the same length, nothing happens, although no error message is currently emitted.
$ -decode FAT BB
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ H _ _ _ _ _ T _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
$
You can mix lower and upper case letters in the mapping strings; case has no effect on the resulting mapping. Some might consider this a bug.
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
$ -decode jd th
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ H _ _ _ _ _ T _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
$
You can remove a mapping by using an underscore character in the plaintext mapping string.
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ H _ _ _ _ _ T _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
$ -decode JD __
$ --show-mapping
Mappings:
C: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
P: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ E _ _ _ _ _ _ _
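The -decode rules shown in the sessions above (equal lengths required, case ignored, underscore clears a mapping) can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the code in freq_analysis.py; the names new_mapping, decode and show are hypothetical.

```python
import string


def new_mapping():
    """Empty mapping: every ciphertext letter maps to '_' (unassigned)."""
    return {c: "_" for c in string.ascii_uppercase}


def decode(mapping, cipher, plain):
    """Apply a -decode style update in place and return the mapping.

    Assumed rules, taken from the README text:
    - strings of different length are silently ignored
    - case is ignored
    - '_' in the plaintext string clears the mapping for that letter
    """
    if len(cipher) != len(plain):
        return mapping  # nothing happens, no error message
    for c, p in zip(cipher.upper(), plain.upper()):
        if c in mapping:
            mapping[c] = p
    return mapping


def show(mapping):
    """Render the mapping in the same two-row layout as --show-mapping."""
    letters = string.ascii_uppercase
    row_c = "C: " + " ".join(letters)
    row_p = "P: " + " ".join(mapping[c] for c in letters)
    return row_c + "\n" + row_p


if __name__ == "__main__":
    m = new_mapping()
    decode(m, "JDS", "THE")
    decode(m, "FAT", "BB")   # ignored: lengths differ
    decode(m, "jd", "__")    # clears J and D, leaves S mapped to E
    print(show(m))
```

Note that this sketch does not check that two ciphertext letters are never assigned the same plaintext letter; whether the tool enforces that is not stated here.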
You can load text files containing ASCII text. Unicode is unlikely to work, and the same goes for binary files.
This program treats only capital ASCII letters as ciphertext; all other characters are ignored.
Filenames and paths with spaces are not supported.
All filenames are relative to the directory the script is executed from, so relative paths to files in other directories work as long as they contain no spaces.
If you want to convert the lower case letters in the input file to upper case, specify the --to-upper flag.
Currently there is no way to write decoded files. If you need that feature, let me know using the Issue button.
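To make the input rules concrete, here is a minimal sketch of how "only capital ASCII counts as ciphertext" and --to-upper might interact. The function names normalize and cipher_letters are hypothetical and the real script may organise this differently.

```python
def normalize(text, to_upper=False):
    """Optionally upper-case the input, as --to-upper is described to do."""
    return text.upper() if to_upper else text


def cipher_letters(text):
    """Keep only capital ASCII letters; everything else is ignored."""
    return "".join(ch for ch in text if "A" <= ch <= "Z")


if __name__ == "__main__":
    raw = "Jds qj!"
    print(cipher_letters(normalize(raw)))                 # only 'J' is capital
    print(cipher_letters(normalize(raw, to_upper=True)))  # all five letters
```

Without --to-upper, lower case letters in the input are simply never treated as ciphertext, which is why the flag matters for mixed-case files.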
=================================
We load a source of ciphertext, either a string or a file. Then we can analyze the frequency with which N-grams appear.
We display the frequency of all N-grams using -f or --frequency. An optional argument N in [1,2,3] displays the frequency for only that N-gram. The displayed N-grams are truncated to the top 50. Below, I have truncated the output further for brevity. The format is | Mth most common ciphertext N-gram | appearance count | Mth most common English N-gram |
$ --load-file krypton3.txt
$ --frequency
Monographs
S 456 E
Q 340 T
J 301 A
U 257 O
B 246 I
N 240 N
C 227 S
G 227 R
D 210 H
Z 132 L
...
Digraphs
JD 96 TH
DS 83 HE
SN 68 IN
SU 63 ER
QN 54 AN
NS 54 RE
CG 53 ON
SW 52 AT
...
Trigraphs
JDS 61 THE
QGW 27 AND
SQN 23 ING
DSN 22 ION
SNS 19 TIO
DCU 19 ENT
JSN 16 ATI
CGE 16 FOR
...
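The counts in the left two columns can be reproduced with a short N-gram counter. The sketch below uses collections.Counter and assumes that non-capital characters are dropped before counting, so N-grams may span word boundaries; the tool itself may count differently. The function name ngram_counts is hypothetical.

```python
from collections import Counter


def ngram_counts(text, n, top=50):
    """Return up to `top` (ngram, count) pairs, most frequent first."""
    # Assumption: only capital ASCII letters are ciphertext.
    letters = "".join(ch for ch in text if "A" <= ch <= "Z")
    grams = (letters[i:i + n] for i in range(len(letters) - n + 1))
    return Counter(grams).most_common(top)


if __name__ == "__main__":
    sample = "JDS QJ JDS"
    for n, title in ((1, "Monographs"), (2, "Digraphs"), (3, "Trigraphs")):
        print(title)
        for gram, count in ngram_counts(sample, n):
            print(gram, count)
```

Pairing each row with the Mth most common English N-gram, as the tool does, would additionally need a reference frequency table, which is left out of this sketch.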
With the --sub option, everywhere we would usually see ciphertext, the characters are replaced with plaintext characters according to our defined mapping. In theory this helps decide which characters are left to map and what reasonable choices for the next guess are. Remember that you can always unmap, so guess away!
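As a rough illustration of what substitution with a partial mapping looks like, the sketch below replaces mapped letters and leaves everything else untouched. It assumes unmapped letters are stored as '_' and are shown as their original ciphertext character; how the tool actually displays them is not specified here. The function name substitute is hypothetical.

```python
def substitute(ciphertext, mapping):
    """Replace each mapped ciphertext letter with its plaintext letter.

    Letters mapped to '_' (unassigned) and characters that are not in
    the mapping at all are passed through unchanged.
    """
    out = []
    for ch in ciphertext:
        plain = mapping.get(ch, "_")
        out.append(ch if plain == "_" else plain)
    return "".join(out)


if __name__ == "__main__":
    mapping = {"J": "T", "D": "H", "S": "E", "Q": "_"}
    print(substitute("JDS QJ", mapping))
```

With J, D and S mapped to T, H and E, the ciphertext "JDS QJ" comes out as "THE QT": the Q stays because it has no assignment yet.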
--load-string only works with a single word of text. Whitespace gets eaten, and attempts to escape it with quotes or backticks are ignored by argparse.
$ --load-string QVJDB MEDGB QJJSG WQGZS NSZBN
Mono Subsitution Helper: error: unrecognized arguments: MEDGB QJJSG WQGZS NSZBN
Instead, use underscores, which are ignored by the actual parsing and decoding.
$ --load-string QVJDB_MEDGB_QJJSG_WQGZS_NSZBN
This is not great behavior, so please fix it and make a pull request.
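For anyone picking this up, one possible direction is sketched below. It assumes the prompt line is currently split on whitespace before being handed to argparse, which has not been verified against freq_analysis.py. Splitting with shlex.split keeps quoted text together, and nargs="+" lets --load-string accept several unquoted words. The parser and the parse_line function here are hypothetical stand-ins, not the tool's real parser.

```python
import argparse
import shlex

# Hypothetical stand-in parser with only the option under discussion.
parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("--load-string", nargs="+", metavar="<string>")


def parse_line(line):
    """Parse one prompt line and return the loaded ciphertext string."""
    # shlex.split honours quotes, unlike a plain str.split().
    args = parser.parse_args(shlex.split(line))
    # nargs="+" yields a list of words; rejoin them with single spaces.
    return " ".join(args.load_string) if args.load_string else ""


if __name__ == "__main__":
    print(parse_line("--load-string QVJDB MEDGB QJJSG WQGZS NSZBN"))
    print(parse_line('--load-string "QVJDB MEDGB"'))
```

With nargs="+" the option consumes words until the next flag, so other options can still follow on the same line.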