/sensidice

A module to aid in creating sensible passphrases using dice.

Primary LanguagePython

sensidice: a module to aid in creating sensible passphrases using dice.

Copyright 2009 Amir Yalon
See below for license information.

The basic usage of this module is hopefully demonstrated well enough by the
doctests (small examples in the code). What it does, roughly, is create digital
numeric representations from Python sequences (e.g. lists). One trivial example
would be creating a hexadecimal representation:
--8<----
import sensidice
d = sensidice.Digital([0,1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'])
d(31)

-->8----
This will simply yield [1, 'F'], which is similar to 1F, the usual hexadecimal
representation of the number 31.

The module is called "sensidice" because it is supposed to aid in creating a
"sensible passphrase using dice". While one established method
<http://diceware.com/> yields just a series of short words chosen out of a list
only less than 8k words long, using sensidice with a much larger part-of-speech
database and the same dice throws can yield a sensible sentence.

IMPORTANT: This text does not qualify as advice on security. YOUR PASSPHRASES
ARE YOUR RESPONSIBILITY. Granted, any constructive feedback is great; thanks!

To demonstrate, we will use the Part of Speech Database from Kevin's Word List
Page <http://wordlist.sourceforge.net/>. We assume that pos/part-of-speech.txt
is in our working directory.

--8<----
import re
import sensidice

POS_FILE='pos/part-of-speech.txt'

pattern = re.compile(r'^(?P<word>.*)\t(?P<flags>.*)$')
posdata = [
    { 'pos': 'article',   'flags': set('DI'),   'items': [] },
    { 'pos': 'adverb',    'flags': set('v'),    'items': [] },
    { 'pos': 'adjective', 'flags': set('A'),    'items': [] },
    { 'pos': 'verb',      'flags': set('Vti'),  'items': [] },
    { 'pos': 'noun',      'flags': set('NphP'), 'items': [] },
    { 'pos': 'other',     'flags': set(),       'items': [] },
]

with open(POS_FILE) as f:
    for w in [pattern.match(line).groupdict() for line in f]:
        (word, flags) = (w['word'], w['flags'])
        for pos in posdata:
            if pos['flags'] and pos['flags'].intersection(flags):
                pos['items'].append(word)
                break

digitals = {}
for pos in posdata:
    key = pos['pos']
    digitals[key] = sensidice.Digital(pos['items'])

"\n".join(['%(n)s %(digital)ss' % {'n': digitals[key].base(), 'digital': key}
           for key in digitals.keys()])

-->8----
Running this in an interactive Python session, we get a summary of how many
words we collected for each part of speech. If we continue with the same
session, we can start playing with the items in the digitals dictionary:
--8<----
verbal_phrase = digitals['adverb'] * digitals['verb'] + \
                digitals['verb'] * digitals['adverb']

import math
math.log(verbal_phrase.base(), 2)
math.log(verbal_phrase.base(), 6**5)

import random
verbal_phrase(random.randrange(verbal_phrase.base()))

-->8----
As shown above, using only verbal phrases, almost 30 bits of entropy
(equivalent to more than 2 throws of 5 dice) can be squeezed into just two
words. Actually, some of the entries in the Part of Speech database consist of
more than one word, but you get the general idea.

Adding nouns:
--8<----
noun_phrase = digitals['adjective'] * digitals['noun']
phrase = verbal_phrase * noun_phrase + noun_phrase * verbal_phrase

math.log(phrase.base(), 2)
math.log(phrase.base(), 6**5)

phrase(random.randrange(phrase.base()))

-->8----
We show that four words are enough for the entropy of almost 5 throws of 5
dice, and we are still far from exhausting the mudule's possibilities.

That is all for now; I hope you find it useful. You are invited to follow the
project on GitHub <http://github.com/amiryal/sensidice>.

Yours,
    Amir Yalon <sensidice@amir.eml.cc>

-- 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.