To install the NgramModel packages, open the Playground (Ctrl+O+W) in your Pharo image and execute the following Metacello script (select it and press the Do it button or Ctrl+D):
Metacello new
  baseline: 'NgramModel';
  repository: 'github://olekscode/NgramModel/src';
  load.
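Next, read the Brown corpus from the local clone of the repository and train a 2-gram model on it:
file := 'pharo-local/iceberg/olekscode/NgramModel/Corpora/brown.txt' asFileReference.
brown := file contents.  "the whole corpus as one String"
model := NgramModel order: 2.  "2-gram: each word is predicted from the word before it"
model trainOn: brown.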
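Training a 2-gram model comes down to counting how often each word follows each other word. The Python sketch below shows that idea only. It is not NgramModel's implementation. The helper names `train_bigrams` and `probability` and the naive whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each token follows each preceding token (a 2-gram model)."""
    tokens = text.split()  # assumption: naive whitespace tokenization
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def probability(counts, prev, nxt):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, anything)."""
    followers = counts.get(prev)
    if not followers:  # prev was never followed by any token
        return 0.0
    return followers[nxt] / sum(followers.values())

counts = train_bigrams("the cat sat on the mat . the cat ran .")
print(counts["the"].most_common())        # [('cat', 2), ('mat', 1)]
print(probability(counts, "the", "cat"))  # 0.6666666666666666
```

Higher orders work the same way. The key becomes a tuple of the previous n-1 words instead of a single word.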
At each step, the model selects the 5 words most likely to follow the previous words and returns a random one of those five. This randomness keeps the generator from getting stuck in a cycle.
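The selection rule can be sketched in Python as follows. This is an illustration under assumptions:

- The bigram counts are a small hand-made dictionary.
- `next_word` and `generate` are hypothetical helpers.
- The sketch picks uniformly among the top 5 candidates. The text does not say whether NgramTextGenerator weights them by probability.

```python
import random
from collections import Counter

def next_word(followers, k=5, rng=random):
    """Return a random word among the k most frequent followers."""
    top_k = [word for word, _ in followers.most_common(k)]
    return rng.choice(top_k)  # uniform choice among the top k (assumption)

def generate(counts, start, size, k=5, seed=None):
    """Generate up to `size` words, starting from `start`."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < size:
        followers = counts.get(words[-1])
        if not followers:  # the last word was never followed by anything
            break
        words.append(next_word(followers, k, rng))
    return words

# Hypothetical bigram counts: word -> Counter of the words that followed it.
counts = {
    "a": Counter({"b": 3, "c": 1}),
    "b": Counter({"c": 3, "a": 1}),
    "c": Counter({"a": 1}),
}
print(generate(counts, "a", 10, seed=42))
print(generate(counts, "a", 4, k=1))  # ['a', 'b', 'c', 'a']
```

With `k=1` the generator becomes deterministic and loops a → b → c → a. That is exactly the cycle the random choice among the top candidates is meant to break.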
generator := NgramTextGenerator new model: model.
generator generateTextOfSize: 100.  "generate a text of 100 words"
educator cannot describe and edited a highway at private time ``
Fallen Figure Technique tells him life pattern more flesh tremble
with neither my God `` Hit ) landowners began this narrative and
planted , post-war years Josephus Daniels was Virginia years
Congress with confluent , jurisdiction involved some used which
he''s something the Lyle Elliott Carter officiated and edited and
portents like Paradise Road in boatloads . Shipments of Student
Movement itself officially shifted religions of fluttering soutane .
Coolest shade which reasonably . Coolest shade less shaky . Doubts
thus preventing them proper bevels easily take comfort was
The Fulton County purchasing departments do to escape Nicolas Manas .
But plain old bean soup , broth , hash , and cultivated in himself ,
back straight , black sheepskin hat from Texas A & I College and
operates the institution , the antipathy to outward ceremonies hailed
by modern plastic materials -- a judgment based on displacement of his
arrival spread through several stitches along edge to her paper for
further meditation . `` Hit the bum '' ! ! Fort up ! ! Fort up ! !
Kizzie turned to similar approaches . When Mrs. Coolidge for
The next sample comes from a model trained on a corpus built from the source code of 85,000 Pharo methods, tokenized at the subtoken level. Composite names like OrderedCollection were split into the subtokens ordered and collection.
super initialize value holders . ( aggregated series := ( margins if nil
if false ) text styler blue style table detect : [ uniform drop list input .
export csv label : suggested file name < a parametric function . | phase
<num> := bit thing basic size >= desired length ) ascii . space width +
bounds top - an event character : d bytes : stream if absent put : answers )
| width of text . status value := dual value at last : category string :=
value cos ) abs raised to n number of
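The subtoken split used for the Pharo corpus can be approximated with a regular expression over camel-case boundaries. The sketch below is an assumption about how such a split might work, not the tokenizer actually used to build the corpus. `subtokens` is a hypothetical helper.

```python
import re

# A run of capitals not followed by a lowercase letter (e.g. "HTTP"),
# or an optional capital followed by lowercase letters (e.g. "Ordered"),
# or a run of digits.
SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def subtokens(identifier):
    """Split a camelCase or PascalCase identifier into lowercase subtokens."""
    return [part.lower() for part in SUBTOKEN.findall(identifier)]

print(subtokens("OrderedCollection"))  # ['ordered', 'collection']
print(subtokens("HTTPServer"))         # ['http', 'server']
```

Splitting names this way shrinks the vocabulary. Rare composite identifiers become sequences of common words that the n-gram counts can share.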
Training the model on the entire Pharo corpus and generating 100 words can take over 10 minutes. Start with a smaller exercise instead: train a 2-gram model on the Brown corpus (the smallest one) and generate 10 words.