Grab
Closed this issue · 1 comments
12/09/2019
Given a dictionary of words, every letter has a distribution by a probability. E.g.
ape
apple
bear
In this example dictionary, letter a
has a 2/3 probability to be followed by letter p
and 1/3 probability to be followed by letter r
.
We also need to use the probability to start and end, e.g. There is a 2/3 probability to start with letter a
and 1/3 probability to start with letter b
.
There is a 2/3 probability to end after letter e
and 100% probability to end after letter r
.
Implement a method generateRandomWord()
using the dictionary distribution above.
Note: no word length is restricted.
Intuition:
- Generate a dict of next letter and its occurrence probability for each letter including "" as the start letter placeholder. In this example, it should be:
{
"": {"a": 0.667, "b": 0.333}, // start placeholder
"a": {"p": 0.667, "r": 0.333},
"b": {"e": 1},
"e": {"": 0.667, "a": 0.333},
"l": {"e": 1},
"p": {"e": 0.333, "l": 0.333, "p": 0.333},
"r": {"": 1} // ending placeholder
}
-
Use that probability dictionary to calculate the next letter, we need to implement a method:
randomNextLetter()
. The trickiest part is how to leverage the distribution probability dictionary to generate the next letter. We can use Reservoir Sampling to decide whether a randomed floatf
generated byrandom.random()
, in the semi-open range [0.0, 1.0). Basic idea is to accumulatively sum up the probability to the previous probability sum. E.g. from"p": {"e": 0.333, "l": 0.333, "p": 0.333}
, we can get"p": {"e": 0.333, "l": 0.667, "p": 1}
, and then test whether the randomed floatf < current_sum_of_probability
. -
Overall logic will be like:
def generateRandomWord(wordList):
distDict = generateDistDict(wordList)
curLetter = ''
letters = [curLetter]
while True:
curLetter = randomNextLetter(curLetter, distDict)
if curLetter == '':
return ''.join('', letters)
letters += curLetter,