Han Oostdijk, 12May2021
HOQCenc is an R package containing routines for encryption and
decryption of character strings and files. The main function is xcode,
which encrypts an input string by executing a number of operations
(specified by argument trans) when argument ed='e'. When ed='d' is
specified, decryption is done by executing the operations in reverse
order.
You can install this version from GitHub with:
# install.packages("devtools")
devtools::install_github("HanOostdijk/HOQCenc",build_vignettes=TRUE)
The Details section gives more examples and some background. This first example shows:
- how to encrypt a message
- that the result of encryption is always a character string with characters from the alphabet
- that by choosing a suitable set of operations, small changes in the message can lead to big differences in the encrypted text
- that (happily) encryption followed by decryption returns the original string
key <- "1VerySecretKey"
message1 <- "Line 1 of a two-line message!\nAs expected a second line."
message2 <- "Line 1 of a two-line message.\nAs expected a second line."
(s <- xcode(message1,key=key,ed='e',trans='cscvp')) # encrypt message1
#> [1] "tSCLY5l2X4eM81f8Q7D3 gbmK9#hRSCtD5VC4M9CpH2LxQqGKKl4oHoKqc3dqOOT"
xcode(s,key=key,ed='d',trans='cscvp') # show result of decryption
#> [1] "Line 1 of a two-line message!\nAs expected a second line."
xcode(message2,key=key,ed='e',trans='cscvp') # encrypt message2
#> [1] "iixEWSv881hYenne#d9VnDEMEowLfD1HrwHyAiHK4qzwbwtneecPdwg1k1OioMOT"
When we specify key='' an ordered set is constructed consisting of the
lower and upper case letters, the digits 0 to 9, the space and the #
character. This ordered set is our ‘alphabet’: all operations work only
with this set, with the exception of the e operation, which converts our
standard symbols to this new alphabet.
When we use the key 1VerySecretKey the (unique) characters in this key
are inserted in this set before the others, so that the following
ordered set is constructed:
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1"  "V"  "e"  "r"  "y"  "S"  "c"  "t"
[2,] "K"  "a"  "b"  "d"  "f"  "g"  "h"  "i"
[3,] "j"  "k"  "l"  "m"  "n"  "o"  "p"  "q"
[4,] "s"  "u"  "v"  "w"  "x"  "z"  "A"  "B"
[5,] "C"  "D"  "E"  "F"  "G"  "H"  "I"  "J"
[6,] "L"  "M"  "N"  "O"  "P"  "Q"  "R"  "T"
[7,] "U"  "W"  "X"  "Y"  "Z"  "0"  "2"  "3"
[8,] "4"  "5"  "6"  "7"  "8"  "9"  " "  "#"
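The construction of the ordered set can be sketched in a few lines of base R. This is an illustration derived from the description and the matrix above, not the package's own code; the function name ordered_set is hypothetical, and ignoring key characters that fall outside the 64-character set is an assumption.

```r
# Sketch: unique key characters first, then the remaining characters
# of the default set in their usual order.
# `ordered_set` is a hypothetical name, not a function of HOQCenc.
ordered_set <- function(key = "") {
  base <- c(letters, LETTERS, as.character(0:9), " ", "#")
  key_chars <- unique(strsplit(key, "", fixed = TRUE)[[1]])
  # assumption: key characters outside the default set are ignored
  c(key_chars[key_chars %in% base], setdiff(base, key_chars))
}

alphabet <- ordered_set("1VerySecretKey")
# Fill row by row to get the 8x8 square shown above
matrix(alphabet, nrow = 8, byrow = TRUE)
```

With key='' the function simply returns the default set (letters, capitals, digits, space and #).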
In almost all operations (but not the e, s and f operations) this
‘alphabet’ will be used.
The following operations are defined (we describe only what happens in case of encryption because for decryption the reversed action is performed):
- e : encode all characters that are not a blank, a letter or a digit and ensure that the length of the resulting string is a multiple of 16
- s : shuffle the characters in such a way that 0123456789abcdef is translated to 0f2d4b6987a5c3e1
- f : flip the characters in such a way that 0123456789abcdef is translated to fedcba9876543210
- v : a Vigenère translation based on the key: the i-th character of the text is shifted in the ordered set by a number of places given by the position in the ordered set of the i-th letter (‘a’, ‘b’, etc.).
- c : a Vigenère translation based on the first character of the ordered set for the first character and from then on based on the result for the preceding character.
- p : a Playfair translation based on an 8x8 square filled with the ordered set (as shown above)
- a : an AES translation based on the key. This is in fact the [digest::AES()] function with mode='ECB'
- h : a Hill linear transformation based on the key.
The encryption characteristics of these operations are the following:
- e : no real encryption because with a hex table the source text is easily derived
- s : no encryption but useful in combination
- f : no encryption but useful in combination
- v : encryption with strength ‘proportional’ to the number of unique characters in the key
- c : encryption based on a key of length one, but has the advantage that a difference in one character leads to differences in later results. Always combine with v, p or a operations
- p : pairwise encryption whose strength increases with the number of unique characters in the key. When a pair occurs more than once, it is encrypted the same each time
- a : When a 16-character block occurs more than once, it is encrypted the same each time.
- h : This handles 4 consecutive characters
It is therefore advisable to combine several operations, as is done in the example below:
key <- "1VerySecretKey"
# small differences in text lead to similar outcomes with simple trans
xcode("abcdefghijklmnopqrstuwvxyz",key=key,ed='e',trans='p')
#> [1] "bdrhybhiKqlmnopqmtB1vxwzSx9339 "
xcode("1bcdefghijklmnopqrstuwvxyz",key=key,ed='e',trans='p')
#> [1] "eKrhybhiKqlmnopqmtB1vxwzSx9339 "
xcode("abcdefghijklmnopqrstuwvxy1",key=key,ed='e',trans='p')
#> [1] "bdrhybhiKqlmnopqmtB1vxwzSV9339 "
xcode("ABabcdefghijklmnopqrstuwvxy1",key=key,ed='e',trans='p')
#> [1] "BsbdrhybhiKqlmnopqmtB1vxwzSV9329"
# small differences in text lead to unrelated outcomes with complex trans
xcode("abcdefghijklmnopqrstuwvxyz",key=key,ed='e',trans='cscvp')
#> [1] "DiQFluQYGgAia1Rck6#8ztKqOFCQupSH"
xcode("1bcdefghijklmnopqrstuwvxyz",key=key,ed='e',trans='cscvp')
#> [1] "B2k1QQQRlZ6GwdAT#kqrJ6K1VXZNwVH "
xcode("abcdefghijklmnopqrstuwvxy1",key=key,ed='e',trans='cscvp')
#> [1] "GTdRmtkyUuNwof6lzcieMmxz3TojYC6v"
xcode("ABabcdefghijklmnopqrstuwvxy1",key=key,ed='e',trans='cscvp')
#> [1] "7nCbDxf2eZMDNr Ywl9bqqJ4Mi2GkXdk"
Most of the operations used in the xcode function work only on the 64
characters of our ordered set: the lower and upper case letters, the
digits 0 to 9, the space and the # character. The e operation:
- encodes the # character and the characters outside the ordered set by replacing each of them by # and its hex representation. E.g. the exclamation mark is replaced by #21 and the # character itself is replaced by #23.
- ensures that the number of characters is a multiple of 16 by adding one or more fillers (consisting of #00) and possibly one or two spaces. The multiple of 16 is necessary when the a operation (AES encryption) is used, but for the other operations an even number of characters is sufficient.
Because the e operation is necessary in most cases, it is always
prefixed to the list of operations that is specified in the trans
argument.
xcode('Deze #!',key='',ed='e',trans='')
#> [1] "Deze #23#21#00  "
xcode('example!',key='',ed='e',trans='')
#> [1] "example#21#00#00"
xcode('example!',key='',ed='e',trans='p')
#> [1] "hueiimh832833899"
xcode('example!',key='',ed='e',trans='a')
#> [1] "f8028741438cf02b46c7c993804538ef"
Only when the noe argument is explicitly set to TRUE is this not done.
xcode('example!',key='',ed='e',trans='',noe=T)
#> [1] "example!"
xcode('example!',key='',ed='e',trans='p',noe=T)
#> Error in playfair(text, ed): invalid characters in text of `p` operation: `!`
xcode('example ',key='',ed='e',trans='p',noe=T) # but with space instead of !
#> [1] "hueiimg8"
xcode('example ',key='',ed='e',trans='a',noe=T)
#> Error in aes$encrypt(text): Text length must be a multiple of 16 bytes
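The two steps of the e operation (escaping and padding) can be sketched as follows. This is a minimal illustration that reproduces the two outputs shown above for trans='', not the package's implementation. The name e_encode is hypothetical, and several details are assumptions: input is plain ASCII, hex digits are written in lower case, and no filler is added when the length is already a multiple of 16.

```r
# Sketch of the `e` operation; `e_encode` is a hypothetical name.
e_encode <- function(text) {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  keep <- chars %in% c(letters, LETTERS, as.character(0:9), " ")
  # Escape everything else (including '#') as '#' plus two hex digits.
  # Assumption: single-byte (ASCII) characters, lower-case hex.
  chars[!keep] <- vapply(chars[!keep],
                         function(ch) sprintf("#%02x", utf8ToInt(ch)),
                         character(1), USE.NAMES = FALSE)
  out <- paste(chars, collapse = "")
  # Pad to a multiple of 16 with '#00' fillers, then one or two spaces.
  # Assumption: nothing is added when the length already fits.
  need <- (-nchar(out)) %% 16
  paste0(out, strrep("#00", need %/% 3), strrep(" ", need %% 3))
}

e_encode("example!")          # "example#21#00#00"
nchar(e_encode("Deze #!"))    # 16
```

The padding is what makes the result usable by the a operation, which needs blocks of 16 characters.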
With the p operation a Playfair transformation is done that acts on
pairs of characters. With key='' the following ‘alphabet’ is used
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"
[2,] "i"  "j"  "k"  "l"  "m"  "n"  "o"  "p"
[3,] "q"  "r"  "s"  "t"  "u"  "v"  "w"  "x"
[4,] "y"  "z"  "A"  "B"  "C"  "D"  "E"  "F"
[5,] "G"  "H"  "I"  "J"  "K"  "L"  "M"  "N"
[6,] "O"  "P"  "Q"  "R"  "S"  "T"  "U"  "V"
[7,] "W"  "X"  "Y"  "Z"  "0"  "1"  "2"  "3"
[8,] "4"  "5"  "6"  "7"  "8"  "9"  " "  "#"
and therefore ‘ex’ is converted to ‘hu’ because ‘h’ and ‘u’ lie on the opposite corners of the rectangle through ‘e’ and ‘x’. Special cases:
- equal row : ‘km’ -> ‘ln’ and ‘ip’ -> ‘ji’ (each character moves one place to the right, wrapping around)
- equal column : ‘zP’ -> ‘HX’ and ‘J7’ -> ‘Rd’ (each character moves one place down, wrapping around)
- equal cell : ‘CC’ -> ‘LL’ , ‘FF’ -> ‘GG’ and ‘##’ -> ‘aa’ (one place down and one place to the right, wrapping around)
An advantage of Playfair is that a frequency analysis at the character level makes little sense because the method works on pairs of characters.
xcode('example ',key='',ed='e',trans='p',noe=T)
#> [1] "hueiimg8"
xcode("hueiimg8",key='',ed='d',trans='p',noe=T)
#> [1] "example "
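The rectangle rule and the three special cases can be sketched in base R as below. This is an illustration of the rules as described, checked against the pairs listed above, and not the package's playfair function; the name playfair_sketch is hypothetical and only encryption with the key='' alphabet is shown.

```r
# Sketch of Playfair encryption on an 8x8 square.
# `playfair_sketch` is a hypothetical name, not part of HOQCenc.
playfair_sketch <- function(text, alphabet) {
  sq <- matrix(alphabet, nrow = 8, byrow = TRUE)
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  stopifnot(length(chars) > 0, length(chars) %% 2 == 0,
            all(chars %in% alphabet))
  pos <- match(chars, alphabet) - 1          # 0-based rank
  row <- pos %/% 8
  col <- pos %% 8
  out <- character(length(chars))
  for (i in seq(1, length(chars), by = 2)) {
    r1 <- row[i]; c1 <- col[i]; r2 <- row[i + 1]; c2 <- col[i + 1]
    if (r1 == r2 && c1 == c2) {              # equal cell: down and right
      r1 <- r2 <- (r1 + 1) %% 8
      c1 <- c2 <- (c1 + 1) %% 8
    } else if (r1 == r2) {                   # equal row: one place right
      c1 <- (c1 + 1) %% 8
      c2 <- (c2 + 1) %% 8
    } else if (c1 == c2) {                   # equal column: one place down
      r1 <- (r1 + 1) %% 8
      r2 <- (r2 + 1) %% 8
    } else {                                 # rectangle: swap the columns
      tmp <- c1; c1 <- c2; c2 <- tmp
    }
    out[i] <- sq[r1 + 1, c1 + 1]
    out[i + 1] <- sq[r2 + 1, c2 + 1]
  }
  paste(out, collapse = "")
}

# The ordered set for key=''
base <- c(letters, LETTERS, as.character(0:9), " ", "#")
playfair_sketch("example ", base)   # "hueiimg8"
```

Decryption would apply the same rules with the moves reversed (left instead of right, up instead of down).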
The s (shuffle) operation shuffles the characters in such a way that
0123456789abcdef is translated to 0f2d4b6987a5c3e1 and the f (flip)
operation flips the characters in such a way that 0123456789abcdef is
translated to fedcba9876543210.
xcode('0123456789abcdef',key='',ed='e',trans='s',noe=T)
#> [1] "0f2d4b6987a5c3e1"
xcode('0123456789abcdef',key='',ed='e',trans='f',noe=T)
#> [1] "fedcba9876543210"
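For a single block of 16 characters the two permutations can be written down directly. The sketch below only covers that 16-character case, because the text does not say how longer strings are treated; the names shuffle16 and flip16 are hypothetical.

```r
# Sketch of the `s` and `f` permutations for one 16-character block.
# `shuffle16` and `flip16` are hypothetical names, not part of HOQCenc.
shuffle16 <- function(block) {
  chars <- strsplit(block, "", fixed = TRUE)[[1]]
  stopifnot(length(chars) == 16)
  idx <- 0:15
  # even positions stay, odd position i takes the character at 16 - i
  src <- ifelse(idx %% 2 == 0, idx, 16 - idx)
  paste(chars[src + 1], collapse = "")
}

flip16 <- function(block) {
  chars <- strsplit(block, "", fixed = TRUE)[[1]]
  stopifnot(length(chars) == 16)
  paste(rev(chars), collapse = "")
}

shuffle16("0123456789abcdef")   # "0f2d4b6987a5c3e1"
flip16("0123456789abcdef")      # "fedcba9876543210"
```

Both permutations are their own inverse on a block: applying the same function twice returns the original block, so in this sketch decryption is the same operation as encryption.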
The v (Vigenère) operation shifts the i-th character of the text a
number of places in the ‘alphabet’. That number is determined by the
position in the ‘alphabet’ of the i-th letter (‘a’ for the first
character, ‘b’ for the second, etc.). In the first example the
‘alphabet’ is ABCabcdef ... and therefore the shifts for the letters
‘a’, ‘b’ etc. are 4, 5 etc., so that ‘0’, ‘1’ etc. are encrypted as ‘4’,
‘6’ etc. In the second example the ‘alphabet’ is abcdef ... and
therefore the shifts for the letters ‘a’, ‘b’ etc. are 1, 2 etc., so
that ‘0’, ‘1’ etc. are encrypted as ‘1’, ‘3’ etc.
xcode('0123456789abcdef',key='ABC',ed='e',trans='v',noe=T)
#> [1] "468 ACbdfhoqsuwy"
xcode('0123456789abcdef',key='abc',ed='e',trans='v',noe=T)
#> [1] "13579#bdfhlnprtv"
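The shift rule can be checked with a small sketch that reproduces both outputs above. It is an illustration under stated assumptions and not the package's code: the names ordered_set and v_sketch are hypothetical, the shift for the i-th character is assumed to be the position of the i-th lower case letter in the ordered set, and because nothing is said about what happens after the 26th character the sketch refuses longer input.

```r
# Hypothetical helper: unique key characters first, then the rest.
ordered_set <- function(key = "") {
  base <- c(letters, LETTERS, as.character(0:9), " ", "#")
  key_chars <- unique(strsplit(key, "", fixed = TRUE)[[1]])
  c(key_chars[key_chars %in% base], setdiff(base, key_chars))
}

# Sketch of the `v` shift; `v_sketch` is a hypothetical name.
v_sketch <- function(text, key = "", decrypt = FALSE) {
  alphabet <- ordered_set(key)
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  # assumption: only the first 26 positions are covered by the description
  stopifnot(length(chars) <= 26, all(chars %in% alphabet))
  # shift for position i = rank of the i-th lower case letter
  shift <- match(letters[seq_along(chars)], alphabet)
  if (decrypt) shift <- -shift
  pos <- match(chars, alphabet)
  # wrap around the 64 characters (ranks are 1-based)
  paste(alphabet[(pos + shift - 1) %% 64 + 1], collapse = "")
}

v_sketch("0123456789abcdef", key = "ABC")   # "468 ACbdfhoqsuwy"
v_sketch("0123456789abcdef", key = "abc")   # "13579#bdfhlnprtv"
```

Decryption subtracts the same shifts, which is why the decrypt flag only changes the sign.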
The c operation shifts the first character a number of places in the
‘alphabet’. That number is again determined by the position of the first
letter in the ‘alphabet’. In the example below the ‘alphabet’ is
abcdef ... and therefore the shift for the letter ‘a’ is 1, so that the
first character ‘0’ becomes ‘1’. The shift for the next position is
determined by the position of this result ‘1’ in the ‘alphabet’, and
that is 54. So the next character (the ‘1’) is shifted by 54 positions
in the ‘alphabet’, wrapping around to position 44 and resulting in ‘R’.
And so on. The main advantage of this method is that the encryption of a
character is determined by all of its predecessors. The weak point is
that only the position of the ‘a’ in the ‘alphabet’ has to be guessed or
tried by brute force, and this is very easy. So this method should never
be used on its own.
xcode('0123456789abcdef',key='abc',ed='e',trans='c',noe=T)
#> [1] "1RIAtnieb#acfjou"
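The chaining can be sketched as a simple loop in which each result supplies the shift for the next character. This is an illustration that reproduces the output above, not the package's code: the names ordered_set and c_sketch are hypothetical, and taking 1 as the shift for the first character is an assumption that matches this example.

```r
# Hypothetical helper: unique key characters first, then the rest.
ordered_set <- function(key = "") {
  base <- c(letters, LETTERS, as.character(0:9), " ", "#")
  key_chars <- unique(strsplit(key, "", fixed = TRUE)[[1]])
  c(key_chars[key_chars %in% base], setdiff(base, key_chars))
}

# Sketch of the chained `c` shift; `c_sketch` is a hypothetical name.
c_sketch <- function(text, key = "", decrypt = FALSE, first_shift = 1) {
  alphabet <- ordered_set(key)
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  stopifnot(all(chars %in% alphabet))
  pos <- match(chars, alphabet)
  out <- integer(length(pos))
  shift <- first_shift                 # assumption: first shift is 1
  for (i in seq_along(pos)) {
    if (decrypt) {
      out[i] <- (pos[i] - shift - 1) %% 64 + 1
      shift <- pos[i]                  # next shift: rank of the encrypted character
    } else {
      out[i] <- (pos[i] + shift - 1) %% 64 + 1
      shift <- out[i]                  # next shift: rank of the result
    }
  }
  paste(alphabet[out], collapse = "")
}

c_sketch("0123456789abcdef", key = "abc")   # "1RIAtnieb#acfjou"
```

Because every shift after the first is visible in the encrypted text itself, decryption only needs the first shift, which illustrates the weak point mentioned above.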
The a operation uses the digest::AES function with mode='ECB'.
The method works in blocks of 16 characters: blocks that are identical
with the exception of one character will be encrypted to unrelated
results. However, if a block occurs more than once, each occurrence is
encrypted identically. So this method, too, should never be used on its
own.
xcode('0123456789abcdef',key='abc',ed='e',trans='a',noe=T)
#> [1] "747f22502381a3fb7eb0cb42cb5f6612"
xcode('1123456789abcdef',key='abc',ed='e',trans='a',noe=T)
#> [1] "85762ea7ba16b3a881fc3144898bcdec"
xcode('0123456789abcdee',key='abc',ed='e',trans='a',noe=T)
#> [1] "4ada7b6fcbeb0f93d9fdb0d731074152"
xcode('0123456789abcdef0123456789abcdef',key='abc',ed='e',trans='a',noe=T)
#> [1] "747f22502381a3fb7eb0cb42cb5f6612747f22502381a3fb7eb0cb42cb5f6612"
The h operation uses the Hill transformation. The method works in blocks
of 4 characters. Each character is transformed to its rank in the
ordered set. A linear transformation is then done on these four numbers
by applying a matrix multiplication, and the resulting numbers (modulo
64) are used as rank numbers in the ordered set to determine the four
output characters. The matrix used is a fixed matrix. A change in one
character affects only the block of 4 that contains it, and the examples
show that the changed block can stay close to the original one. They
also show that a block that occurs more than once is encrypted the same
each time. So (again) it is necessary to combine this operation with
other ones to make it safer.
xcode('0123456789abcdef',key='abc',ed='e',trans='h',noe=T)
#> [1] "3y 8tOwy3 YazqsE"
xcode('1123456789abcdef',key='abc',ed='e',trans='h',noe=T)
#> [1] "4yb8tOwy3 YazqsE"
xcode('0123456789abcdee',key='abc',ed='e',trans='h',noe=T)
#> [1] "3y 8tOwy3 YayprB"
xcode('0123456789abcdef0123456789abcdef',key='abc',ed='e',trans='h',noe=T)
#> [1] "3y 8tOwy3 YazqsE3y 8tOwy3 YazqsE"
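The mechanics of such a block transformation can be sketched as below. The matrix M is purely hypothetical (the fixed matrix of the package is not given here), so the sketch does not reproduce the outputs above; the name hill_sketch and the use of 0-based ranks are assumptions as well. It only illustrates the rank, multiply, modulo 64 steps and the fact that a change stays within its own block.

```r
# Sketch of a Hill-style transformation on blocks of 4 characters.
# `hill_sketch` and the matrix `M` are hypothetical, not from HOQCenc.
hill_sketch <- function(text, alphabet, M) {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  stopifnot(length(chars) %% 4 == 0, all(chars %in% alphabet))
  # one block of 4 ranks per column (assumption: 0-based ranks)
  ranks <- matrix(match(chars, alphabet) - 1, nrow = 4)
  out <- (M %*% ranks) %% 64
  paste(alphabet[out + 1], collapse = "")
}

base <- c(letters, LETTERS, as.character(0:9), " ", "#")
# Hypothetical matrix: unit upper triangular, so invertible modulo 64
M <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 0),
           c(0, 0, 1, 1),
           c(0, 0, 0, 1))

x <- hill_sketch("0123456789abcdef", base, M)
y <- hill_sketch("1123456789abcdef", base, M)
substr(x, 5, 16) == substr(y, 5, 16)   # TRUE: only the first block differs
```

Decryption would multiply by the inverse of the matrix modulo 64, which exists only when the determinant is odd.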
The xcode function can also handle raw vectors:
text <- "this will be raw text"
(textraw <- charToRaw(text))
#> [1] 74 68 69 73 20 77 69 6c 6c 20 62 65 20 72 61 77 20 74 65 78 74
(s<-xcode(textraw,key='abc',ed='e',trans='cscp'))
#> [1] "wZMbUPjpN2jYeY X6cUZEVuQpUm3ijNKUV7l4FoHp BuNBSYYnbJz5t2pTfS5SYHJUTWDSrP1kqWBaJ#"
(s<-xcode(s,key='abc',ed='d',trans='cscp'))
#> [1] 74 68 69 73 20 77 69 6c 6c 20 62 65 20 72 61 77 20 74 65 78 74
rawToChar(s)
#> [1] "this will be raw text"
To ease reading and writing of files with and without encryption four functions are defined:
- write_enc_ascfile and read_enc_ascfile when handling ASCII (standard text) data
- write_enc_binfile and read_enc_binfile when handling binary data
To demonstrate the various possibilities we first define text to work on and generate the name of a test file:
text <- "This is a 21st century example of an English text."
textraw <- charToRaw(text)
tfile <- tempfile(pattern = "file", tmpdir = tempdir(), fileext = ".txt")
Example of working with ASCII text (remember that the encrypted text is always ASCII):
# write ASCII with encoding
write_enc_ascfile(text,tfile,encode=T)
# read without decoding
(read_enc_ascfile(tfile,decode=F))
#> [1] "iFLbqk9MTET mpNazXQsSBBJLgPEU3LF1J5iuesj9#dMJsf7aTtkp32Slzk6Tfno"
# read with decoding
(read_enc_ascfile(tfile,decode=T))
#> [1] "This is a 21st century example of an English text."
Again ASCII text, but now using the binary mode functions:
# write ASCII in binary mode with encoding
write_enc_binfile(text,tfile,encode=T)
# read without decoding and no raw -> character conversion
(read_enc_binfile(tfile,decode=F,raw2char=F))
#> [1] 69 46 4c 62 71 6b 39 4d 54 45 54 20 6d 70 4e 61 7a 58 51 73 53 42 42 4a 4c
#> [26] 67 50 45 55 33 4c 46 31 4a 35 69 75 65 73 6a 39 23 64 4d 4a 73 66 37 61 54
#> [51] 74 6b 70 33 32 53 6c 7a 6b 36 54 66 6e 6f 00
# read without decoding and with raw -> character conversion
(read_enc_binfile(tfile,decode=F,raw2char=T))
#> [1] "iFLbqk9MTET mpNazXQsSBBJLgPEU3LF1J5iuesj9#dMJsf7aTtkp32Slzk6Tfno"
# read with decoding; raw2char is not specified and the result is a character string
(read_enc_binfile(tfile,decode=T))
#> [1] "This is a 21st century example of an English text."
Now with binary data, and therefore the write_enc_binfile and
read_enc_binfile functions must be used:
# write raw text binary mode with encoding
write_enc_binfile(textraw,tfile,encode=T)
# read without decoding and no raw -> character conversion
(read_enc_binfile(tfile,decode=F,raw2char=F))
#> [1] 4c 77 43 78 37 23 56 72 61 68 67 44 32 72 6c 55 50 42 79 46 4d 75 59 35 43
#> [26] 53 56 6d 42 4f 53 41 5a 6d 57 73 44 53 49 56 4f 4c 58 76 75 39 4f 20 57 36
#> [51] 20 53 61 4a 43 59 66 54 79 54 67 63 71 36 49 61 23 33 78 46 4a 68 6c 73 6a
#> [76] 75 4a 78 6d 76 75 64 6c 67 33 38 39 4a 4d 45 79 4e 6d 74 39 30 51 44 48 54
#> [101] 55 72 69 44 32 70 6d 51 44 4d 50 77 44 78 67 6d 7a 4d 4f 37 6d 44 20 69 33
#> [126] 69 56 5a 58 32 4c 5a 57 56 63 42 5a 6c 58 42 35 54 31 74 56 4e 57 38 6e 78
#> [151] 49 63 31 55 74 70 56 4f 56 69 00
# read without decoding and with raw -> character conversion
(read_enc_binfile(tfile,decode=F,raw2char=T))
#> [1] "LwCx7#VrahgD2rlUPByFMuY5CSVmBOSAZmWsDSIVOLXvu9O W6 SaJCYfTyTgcq6Ia#3xFJhlsjuJxmvudlg389JMEyNmt90QDHTUriD2pmQDMPwDxgmzMO7mD i3iVZX2LZWVcBZlXB5T1tVNW8nxIc1UtpVOVi"
# read with decoding and no raw -> character conversion
(read_enc_binfile(tfile,decode=T,raw2char=F))
#> [1] 54 68 69 73 20 69 73 20 61 20 32 31 73 74 20 63 65 6e 74 75 72 79 20 65 78
#> [26] 61 6d 70 6c 65 20 6f 66 20 61 6e 20 45 6e 67 6c 69 73 68 20 74 65 78 74 2e
# read with decoding and with raw -> character conversion
(read_enc_binfile(tfile,decode=T,raw2char=T))
#> [1] "This is a 21st century example of an English text."