Wordlegames - play and analyze Wordle and related games

This Julia package allows for playing and analyzing Wordle and related games, such as Primel.

A game is represented by a GamePool object containing potential guesses, a subset of which are valid targets, and some game play status information. By default the game is played as in the "Hard Mode" setting on the Wordle app and web site, which means that the only guesses allowed at each turn are those in the current target pool. As a consequence, the initial pool of potential guesses is the same as the initial target pool.

julia> using Chain, DataFrames, Primes, Random, StatsBase, UnicodePlots, Wordlegames

julia> datadir = joinpath(dirname(dirname(pathof(Wordlegames))), "data");

julia> wordle = GamePool(collect(readlines(joinpath(datadir, "Wordletargets.txt"))));

This creates a GamePool from the Wordle targets, a list of 2315 5-letter English words. The playgame! and showgame! methods can play a Wordle game, selecting each guess according to a criterion. By default the guess is chosen to maximize the entropy of the distribution of scores from the current target pool, as explained below.

For example, suppose the target is "super". It takes 4 guesses to isolate this target using this strategy.

julia> showgame!(wordle, "super")
4×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy  score       sc    
     │ Int64   Int64  String  Float64   Float64  String      Int64 
─────┼─────────────────────────────────────────────────────────────
   1 │   2315   1535  raise   61.0009   5.87791  🟨🟫🟫🟨🟨     85
   2 │     18   1720  sheer    2.11111  3.28104  🟩🟫🟫🟩🟩    170
   3 │      4   1835  sober    1.5      1.5      🟩🟫🟫🟩🟩    170
   4 │      2   1969  super    1.0      1.0      🟩🟩🟩🟩🟩    242

The size of the initial target pool is 2315. The first guess, "raise", will reduce the size of the target pool after it has been scored. It is not known in advance what the score will be, but the set of scores from all possible targets can be calculated. Assuming the possible targets are equally likely, this gives a distribution of scores, and thus a distribution of pool sizes after the guess is scored. Informally, the entropy of the distribution of scores is a measure of how uniformly they are spread over the set of possible scores. Choosing the guess with the greatest entropy is likely to result in a large reduction in the size of the target pool after the guess is scored.

The expected size of the target pool after this guess is scored is a little over 61. The actual score in this game, represented as 🟨🟫🟫🟨🟨 in colored tiles or [1,0,0,1,1] as digits, indicates that r, s and e are in the target but not in the guessed positions and that a and i do not occur in the target. (The sc value in that row, 85, is the decimal value of the base-3 number 10011.)
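
The base-3 conversion can be checked directly in the REPL; this is ordinary arithmetic, not part of the package API.

julia> score = [1, 0, 0, 1, 1];   # 🟨🟫🟫🟨🟨 as digits

julia> foldl((acc, d) -> 3acc + d, score; init = 0)   # interpret the digits as a base-3 number
85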

(This package uses the Unicode character U+1F7EB, the :large_brown_square: emoji, 🟫, instead of a gray square for the "didn't match" tile, giving a kind of "traffic lights" motif. The real reason for this choice, however, is that it is surprisingly difficult to get a consistent-width black or gray square symbol in many fonts.)

There are only 18 of the 2315 possible targets that would have given this score. Of these 18 targets the guess that will do the best job of spreading out the distribution of scores is "sheer". The actual score for this guess is 🟩🟫🟫🟩🟩, meaning that the s, the second e and the r are in the correct positions, the h is not in the target and there isn't a second e.

(When a character is repeated in a guess but occurs only once in the target, "correct position" takes precedence over "in the target", as in this case. If none of the repeated character's occurrences is in the correct position, the leftmost occurrence in the guess takes precedence.)
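
These precedence rules can be illustrated with a small hand-rolled scorer. This is only a sketch of the rules described above (the function name scoresketch is ours), not the implementation used in Wordlegames.jl.

# Sketch of Wordle-style scoring: 2 = correct position, 1 = in the target elsewhere,
# 0 = not in the target. Assumes guess and target have the same length.
function scoresketch(guess::AbstractString, target::AbstractString)
    g, t = collect(guess), collect(target)
    score = zeros(Int, length(g))
    for i in eachindex(g)                  # first pass: correct positions take precedence
        if g[i] == t[i]
            score[i] = 2
            t[i] = '\0'                    # consume this target character
        end
    end
    for i in eachindex(g)                  # second pass: leftmost remaining occurrences
        if score[i] == 0
            j = findfirst(==(g[i]), t)
            if !isnothing(j)
                score[i] = 1
                t[j] = '\0'
            end
        end
    end
    return score
end

With this sketch, scoresketch("sheer", "super") returns [2, 0, 0, 2, 2], that is 🟩🟫🟫🟩🟩, matching the score shown above.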

The size of the target pool is reduced to 4, which is larger than the expected size of 2.11, and the game continues with another guess ("sober") and another score (🟩🟫🟫🟩🟩) until the target, "super", is matched.

If no target is specified in a call to showgame! or playgame! one is chosen at random from the set of possible targets.

julia> Random.seed!(1234321);  # initialize the random number generator

julia> showgame!(wordle)
4×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy   score       sc    
     │ Int64   Int64  String  Float64   Float64   String      Int64 
─────┼──────────────────────────────────────────────────────────────
   1 │   2315   1535  raise   61.0009    5.87791  🟫🟫🟫🟫🟫      0
   2 │    168   1275  mulch    6.85714   5.21165  🟫🟫🟫🟫🟨      1
   3 │      6   2262  whoop    1.0       2.58496  🟫🟨🟫🟫🟫     27
   4 │      1    985  hobby    1.0      -0.0      🟩🟩🟩🟩🟩    242

The target can also be specified as an integer between 1 and length(wordle.targetpool).

julia> showgame!(wordle, 1234)
3×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy  score       sc    
     │ Int64   Int64  String  Float64   Float64  String      Int64 
─────┼─────────────────────────────────────────────────────────────
   1 │   2315   1535  raise    61.0009  5.87791  🟫🟫🟨🟫🟩     11
   2 │     25    198  binge     3.64    3.28386  🟫🟩🟩🟫🟩     74
   3 │      2   1234  mince     1.0     1.0      🟩🟩🟩🟩🟩    242

This mechanism allows for playing all of the 2315 possible games and accumulating some statistics.

julia> nguesswordle = [length(playgame!(wordle, k).guesses) for k in axes(wordle.targetpool, 1)];

julia> barplot(countmap(nguesswordle))
     ┌                                        ┐ 
   1 ┤ 1                                         
   2 ┤■■■■■ 131                                 
   3 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 999   
   4 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 919      
   5 ┤■■■■■■■ 207                               
   6 ┤■■ 47                                     
   7 ┤ 9                                         
   8 ┤ 2                                         
     └                                        ┘ 

Playing all possible Wordle games in this way takes less than half a second on a not-very-powerful laptop.

julia> versioninfo()
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 4 on 8 virtual cores
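
The timing claim can be checked with the @time macro; the elapsed time printed will depend on the machine and on compilation, so no output is reproduced here.

julia> @time [length(playgame!(wordle, k).guesses) for k in axes(wordle.targetpool, 1)];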

The mean and standard deviation of the number of guesses for Wordle using this strategy

julia> (n̄ = mean(nguesswordle), s = std(nguesswordle))
(n̄ = 3.5991360691144707, s = 0.8490164812102081)

are reasonable but not optimal. Grant Sanderson has a YouTube video describing a strategy that gives a mean of 3.43 guesses. Later, in a tweet, he referred to a strategy with a mean of 3.42 guesses.

Also, the barplot shows that there are 11 of the 2315 games that are not solved in 6 guesses by this strategy.
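
That count can be read directly from the vector of game lengths.

julia> count(>(6), nguesswordle)   # games not solved within the standard six guesses
11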

The games that require 8 guesses are

julia> [showgame!(wordle, k) for k in findall(==(8), nguesswordle)]
2-element Vector{DataFrame}:
 8×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy    score       sc    
     │ Int64   Int64  String  Float64   Float64    String      Int64 
─────┼───────────────────────────────────────────────────────────────
   1 │   2315   1535  raise   61.0009    5.87791   🟨🟫🟫🟫🟨     82
   2 │    102    546  deter    9.23529   4.37007   🟫🟫🟫🟩🟩      8
   3 │     26    454  cower    5.23077   2.74682   🟫🟩🟫🟩🟩     62
   4 │      9    999  hover    3.44444   1.65774   🟫🟩🟫🟩🟩     62
   5 │      5   1059  joker    2.2       1.37095   🟫🟩🟫🟩🟩     62
   6 │      3    258  boxer    1.66667   0.918296  🟫🟩🟫🟩🟩     62
   7 │      2    800  foyer    1.0       1.0       🟫🟩🟫🟩🟩     62
   8 │      1    884  goner    1.0      -0.0       🟩🟩🟩🟩🟩    242
 8×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy    score       sc    
     │ Int64   Int64  String  Float64   Float64    String      Int64 
─────┼───────────────────────────────────────────────────────────────
   1 │   2315   1535  raise   61.0009    5.87791   🟫🟩🟫🟫🟫     54
   2 │     91   2012  tangy    7.48352   4.03061   🟨🟩🟫🟫🟫    135
   3 │     13    334  caput    2.84615   2.4997    🟨🟩🟫🟫🟨    136
   4 │      5    160  batch    3.4       0.721928  🟫🟩🟩🟩🟩     80
   5 │      4    959  hatch    2.5       0.811278  🟫🟩🟩🟩🟩     80
   6 │      3   1102  latch    1.66667   0.918296  🟫🟩🟩🟩🟩     80
   7 │      2   1206  match    1.0       1.0       🟫🟩🟩🟩🟩     80
   8 │      1   2233  watch    1.0      -0.0       🟩🟩🟩🟩🟩    242

Related games

Wordle has spawned a huge number of related games.

One such game is Primel, where the targets are 5-digit prime numbers. The Primel game from 2022-02-15 was played by copying each guess onto the game-play page and entering the resulting score manually with scoreupdate!. The summary property of a GamePool shows the guesses and scores up to that point, and the next guess to use.

julia> primel = GamePool(primes(10000, 99999));

julia> primel.summary
1×7 DataFrame
 Row │ poolsz  index  guess   expected  entropy  score    sc      
     │ Int64   Int64  String  Float64   Float64  String?  Int64?  
─────┼────────────────────────────────────────────────────────────
   1 │   8363    313  12953    124.384  6.63227  missing  missing 

julia> scoreupdate!(primel, [1,0,0,0,1]).summary
2×7 DataFrame
 Row │ poolsz  index  guess   expected   entropy  score       sc      
     │ Int64   Int64  String  Float64    Float64  String?     Int64?  
─────┼────────────────────────────────────────────────────────────────
   1 │   8363    313  12953   124.384    6.63227  🟨🟫🟫🟫🟨       82
   2 │    236   2612  36187     6.30508  5.57465  missing     missing 

julia> scoreupdate!(primel, [2,2,1,0,0]).summary
3×7 DataFrame
 Row │ poolsz  index  guess   expected   entropy  score       sc      
     │ Int64   Int64  String  Float64    Float64  String?     Int64?  
─────┼────────────────────────────────────────────────────────────────
   1 │   8363    313  12953   124.384    6.63227  🟨🟫🟫🟫🟨       82
   2 │    236   2612  36187     6.30508  5.57465  🟩🟩🟨🟫🟫      225
   3 │      3   2597  36011     1.0      1.58496  missing     missing 

julia> scoreupdate!(primel, [2,2,2,2,2]).summary
3×7 DataFrame
 Row │ poolsz  index  guess   expected   entropy  score       sc    
     │ Int64   Int64  String  Float64    Float64  String      Int64 
─────┼──────────────────────────────────────────────────────────────
   1 │   8363    313  12953   124.384    6.63227  🟨🟫🟫🟫🟨     82
   2 │    236   2612  36187     6.30508  5.57465  🟩🟩🟨🟫🟫    225
   3 │      3   2597  36011     1.0      1.58496  🟩🟩🟩🟩🟩    242

Playing all possible Primel games produces the following statistics.

julia> nguessprimel = [length(playgame!(primel, k).guesses) for k in axes(primel.active, 1)];

julia> barplot(countmap(nguessprimel))
     ┌                                        ┐ 
   1 ┤ 1                                         
   2 ┤■■ 215                                    
   3 ┤■■■■■■■■■■■■■■■■■■■■■■■■ 3173             
   4 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 4477   
   5 ┤■■■■ 482                                  
   6 ┤ 15                                        
     └                                        ┘ 

julia> (n̄ = mean(nguessprimel), s = std(nguessprimel))
(n̄ = 3.6300370680377854, s = 0.6413308603862167)

Because there are more targets initially in Primel than in Wordle, the mean number of guesses is greater. However, the standard deviation of the length of Primel games played this way is smaller than that for Wordle, perhaps because the number of possible characters at each position (10) is smaller than for Wordle (26).

Strategy

Each turn in a Wordle-like game can be regarded as submitting a guess to an "oracle", which returns a score that the player uses to update their information about the target. Initially the target can be any element of the target pool. Each guess/score combination reduces the size of the target pool, as shown in the game summaries above.

In a GamePool object the actual pool of potential targets and guesses is not modified. Instead there is a BitVector field, active, that is used to keep track of the active target pool. The size of the current target pool is the sum of active.
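
For example, immediately after a reset! the full target pool is active again; the value below is simply the initial pool size quoted earlier.

julia> reset!(wordle);      # restore the GamePool to its initial state

julia> sum(wordle.active)   # size of the current target pool
2315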

The score for a particular guess is known to the oracle but not to the player. However, the player can evaluate the score of any potential guess against any member of the target pool. The number of possible scores is finite (3^N, where N is the number of tiles in the score).
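
For a 5-tile game that is 3^5 = 243 possible scores. The integer sc value shown in the game summaries is the base-3 encoding of the tile pattern, and tiles converts it back to the tile representation.

julia> 3^5            # number of distinct scores for a 5-tile game
243

julia> tiles(85, 5)   # tile representation of the integer score 85 from the first game above
"🟨🟫🟫🟨🟨"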

For example, the first guess chosen in the Wordle games shown above is "raise", which is at position 1535 in wordle.targetpool.

julia> reset!(wordle);  # reset the `GamePool` to its initial state

julia> only(wordle.guesses).index  # check there is exactly one guess and return its index
1535

julia> bincounts!(wordle, 1535);   # evaluate the bin counts for that guess

julia> @chain DataFrame(score = tiles.(0:242, 5), counts = wordle.counts) begin
           subset(:counts => x -> x .> 0)
           sort(:counts; rev=true)
       end
132×2 DataFrame
 Row │ score       counts 
     │ String      Int64  
─────┼────────────────────
   1 │ 🟫🟫🟫🟫🟫     168
   2 │ 🟫🟫🟫🟫🟨     121
   3 │ 🟫🟫🟨🟫🟫     107
   4 │ 🟨🟫🟫🟫🟫     103
   5 │ 🟨🟫🟫🟫🟨     102
   6 │ 🟫🟨🟫🟫🟫      92
   7 │ 🟫🟩🟫🟫🟫      91
   8 │ 🟫🟫🟫🟨🟫      80
   9 │ 🟨🟨🟫🟫🟫      78
  10 │ 🟫🟨🟫🟫🟨      69
  11 │ 🟫🟫🟫🟫🟩      61
  12 │ 🟫🟫🟩🟫🟫      51
  13 │ 🟫🟨🟫🟨🟫      43
  14 │ 🟫🟫🟫🟨🟨      41
  15 │ 🟫🟨🟫🟫🟩      41
  ⋮  │     ⋮        ⋮
 118 │ 🟨🟨🟩🟫🟩       1
 119 │ 🟨🟨🟩🟩🟩       1
 120 │ 🟨🟩🟫🟩🟩       1
 121 │ 🟩🟫🟫🟨🟫       1
 122 │ 🟩🟫🟫🟩🟫       1
 123 │ 🟩🟫🟨🟨🟫       1
 124 │ 🟩🟫🟨🟩🟩       1
 125 │ 🟩🟫🟩🟫🟫       1
 126 │ 🟩🟫🟩🟫🟨       1
 127 │ 🟩🟨🟫🟩🟫       1
 128 │ 🟩🟨🟨🟫🟫       1
 129 │ 🟩🟩🟫🟫🟩       1
 130 │ 🟩🟩🟫🟨🟫       1
 131 │ 🟩🟩🟩🟫🟫       1
 132 │ 🟩🟩🟩🟩🟩       1
          102 rows omitted

julia> (expectedpoolsize(wordle), entropy2(wordle))
(61.00086393088553, 5.877909690821478)

Assuming the targets are equally likely, which apparently is the case in the online games, the probability of each score is the count for that score divided by the size of the active target pool. The expected pool size is the sum of the counts multiplied by the probabilities or, equivalently, the sum of the squared counts divided by the sum of the counts.

julia> sum(abs2, wordle.counts) / sum(wordle.counts)  # abs2(x) returns x * x
61.00086393088553
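
Equivalently, working from the probabilities themselves gives the same value.

julia> probs = wordle.counts ./ sum(wordle.counts);   # probability of each possible score

julia> sum(wordle.counts .* probs)                    # expected pool size after this guess is scored
61.00086393088553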

Measured in bits, the entropy of the probabilities is - Σᵢ pᵢ log₂(pᵢ). Entropy measures how the probability is dispersed among the possible scores. The best case is for each of the n possible scores to have probability 1/n of occurring; in that case, whichever score is returned, there will be only a small number of targets with that score. It is not possible to get uniform pool sizes from a starting guess but sometimes, when the target pool is small, a particular guess can split the remaining k targets into k distinct scores.

In particular, this always occurs when there are only two targets left.
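
The entropy reported by entropy2 can be reproduced by hand from the score probabilities; the helper below is ours, not part of the package.

julia> entropybits(p) = -sum(x * log2(x) for x in p if x > 0);   # base-2 entropy, skipping zero probabilities

julia> entropybits(wordle.counts ./ sum(wordle.counts))
5.877909690821478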

The guesses can instead be chosen to minimize the expected pool size, but this strategy is not as effective as maximizing the entropy.

julia> wrdle2 = GamePool(collect(readlines(joinpath(datadir, "Wordletargets.txt"))); guesstype=MinimizeExpected);

julia> ngwrdle2 = [length(playgame!(wrdle2, k).guesses) for k in axes(wrdle2.active, 1)];

julia> barplot(countmap(ngwrdle2))
     ┌                                        ┐ 
   1 ┤ 1                                         
   2 ┤■■■■■ 131                                 
   3 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 957   
   4 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 946   
   5 ┤■■■■■■■■ 224                              
   6 ┤■■ 42                                     
   7 ┤ 11                                        
   8 ┤ 3                                         
     └                                        ┘ 

julia> (n̄ = mean(ngwrdle2), s = std(ngwrdle2))
(n̄ = 3.624622030237581, s = 0.8578269827640186)

Game play as a tree

For a deterministic strategy and a fixed guesspool and set of validtargets the possible games can be represented as a tree.

For illustration, consider just a portion of the tree of Wordle games using the MaximumEntropy strategy. Games with targets ["super", "hobby", "mince", "goner", "watch"] are shown above. They can be combined into a tree as

julia> print_tree(tree(wordle, ["super","hobby","mince","goner","watch"]); maxdepth=8)
missing, raise, 2315, 5.87791, 61.0009
├─ 🟨🟫🟫🟫🟨, deter, 102, 4.37007, 9.23529
│  └─ 🟫🟫🟫🟩🟩, cower, 26, 2.74682, 5.23077
│     └─ 🟫🟩🟫🟩🟩, hover, 9, 1.65774, 3.44444
│        └─ 🟫🟩🟫🟩🟩, joker, 5, 1.37095, 2.2
│           └─ 🟫🟩🟫🟩🟩, boxer, 3, 0.918296, 1.66667
│              └─ 🟫🟩🟫🟩🟩, foyer, 2, 1.0, 1.0
│                 └─ 🟫🟩🟫🟩🟩, goner, 1, -0.0, 1.0
├─ 🟫🟩🟫🟫🟫, tangy, 91, 4.03061, 7.48352
│  └─ 🟨🟩🟫🟫🟫, caput, 13, 2.4997, 2.84615
│     └─ 🟨🟩🟫🟫🟨, batch, 5, 0.721928, 3.4
│        └─ 🟫🟩🟩🟩🟩, hatch, 4, 0.811278, 2.5
│           └─ 🟫🟩🟩🟩🟩, latch, 3, 0.918296, 1.66667
│              └─ 🟫🟩🟩🟩🟩, match, 2, 1.0, 1.0
│                 └─ 🟫🟩🟩🟩🟩, watch, 1, -0.0, 1.0
├─ 🟫🟫🟫🟫🟫, mulch, 168, 5.21165, 6.85714
│  └─ 🟫🟫🟫🟫🟨, whoop, 6, 2.58496, 1.0
│     └─ 🟫🟨🟨🟫🟫, hobby, 1, -0.0, 1.0
├─ 🟨🟫🟫🟨🟨, sheer, 18, 3.28104, 2.11111
│  └─ 🟩🟫🟫🟩🟩, sober, 4, 1.5, 1.5
│     └─ 🟩🟫🟫🟩🟩, super, 2, 1.0, 1.0
└─ 🟫🟫🟨🟫🟩, binge, 25, 3.28386, 3.64
   └─ 🟫🟩🟩🟫🟩, mince, 2, 1.0, 1.0

Although this is not a particularly interesting tree, it serves to illustrate some of the properties. The first node, called the "root" node, is the first guess in all the games. The guess is "raise" with a pool size of 2315, an entropy of 5.88 and an expected pool size of 61.00 after scoring.

If the score for "raise" is 🟨🟫🟫🟫🟨, the next guess will be "deter", with the characteristics shown. If the score is 🟫🟫🟫🟫🟫, which is the most likely score for the first guess, the next guess is "mulch", and so on.

Note that in the tree the score is associated with the guess that it will produce next, whereas in the summary of the game the score is associated with the guess that produced it.

The reason that this tree is not very interesting is that it simply reproduces the game summaries, with the minor changes that the root node is common to all the games and the score tiles refer to the score that has been observed, not the score that will be observed.

It is more interesting to play a random selection of games

julia> print_tree(tree(wordle, Random.seed!(1234321), 12))
missing, raise, 2315, 5.87791, 61.0009
├─ 🟫🟫🟨🟫🟫, pilot, 107, 4.69342, 6.38318
│  ├─ 🟫🟩🟫🟫🟨, width, 13, 2.93121, 2.07692
│  │  └─ 🟫🟩🟫🟩🟫, bitty, 4, 1.5, 1.5
│  │     └─ 🟫🟩🟫🟩🟩, fifty, 2, 1.0, 1.0
│  ├─ 🟫🟩🟫🟫🟫, windy, 16, 3.20282, 1.875
│  │  └─ 🟫🟩🟫🟫🟩, fizzy, 2, 1.0, 1.0
│  │     └─ 🟨🟩🟫🟫🟩, jiffy, 1, -0.0, 1.0
│  └─ 🟫🟨🟫🟨🟫, comic, 4, 2.0, 1.0
│     └─ 🟩🟩🟫🟩🟩, conic, 1, -0.0, 1.0
├─ 🟫🟫🟫🟫🟫, mulch, 168, 5.21165, 6.85714
│  ├─ 🟫🟩🟩🟫🟫, bully, 6, 1.79248, 2.0
│  │  └─ 🟫🟩🟩🟫🟩, pulpy, 1, -0.0, 1.0
│  ├─ 🟫🟨🟨🟨🟫, cloud, 4, 2.0, 1.0
│  │  └─ 🟩🟩🟩🟩🟫, clout, 1, -0.0, 1.0
│  └─ 🟫🟫🟫🟫🟨, whoop, 6, 2.58496, 1.0
│     └─ 🟫🟨🟨🟫🟫, hobby, 1, -0.0, 1.0
├─ 🟫🟫🟫🟫🟨, betel, 121, 5.06266, 4.95041
│  └─ 🟫🟩🟫🟫🟨, cello, 9, 2.9477, 1.22222
│     └─ 🟫🟩🟩🟫🟨, felon, 2, 1.0, 1.0
│        └─ 🟫🟩🟩🟩🟩, melon, 1, -0.0, 1.0
├─ 🟨🟩🟩🟫🟫, dairy, 4, 1.5, 1.5
│  └─ 🟫🟩🟩🟩🟩, fairy, 2, 1.0, 1.0
│     └─ 🟫🟩🟩🟩🟩, hairy, 1, -0.0, 1.0
├─ 🟨🟫🟫🟫🟨, deter, 102, 4.37007, 9.23529
│  └─ 🟩🟩🟫🟫🟩, decor, 2, 1.0, 1.0
│     └─ 🟩🟩🟫🟫🟩, demur, 1, -0.0, 1.0
├─ 🟨🟩🟫🟫🟫, party, 26, 3.12276, 3.84615
│  └─ 🟫🟩🟩🟫🟩, carry, 4, 1.5, 1.5
│     └─ 🟫🟩🟩🟩🟩, harry, 2, 1.0, 1.0
├─ 🟨🟫🟫🟨🟫, short, 24, 3.60539, 2.25
│  └─ 🟨🟫🟨🟨🟨, torus, 1, -0.0, 1.0
└─ 🟨🟩🟫🟨🟫, satyr, 2, 1.0, 1.0

Again, the root is "raise", which is the first guess in any game using the MaximumEntropy strategy, and if the first score is 🟫🟫🟫🟫🟫 then the second guess will be "mulch". But now in this selection of games the guess after "mulch" was "bully", "cloud" or "whoop" in different games.

In other words some of the games from the 12 randomly selected targets overlapped in both the first and second guesses. Also, one of the games, for the target "satyr", got the target on the second guess.

A tree representation of all possible games can be written to a file as

julia> open("wordle_tree.txt", "w") do io
           print_tree(io, tree(wordle); maxdepth=9)
       end

but it may be more interesting to use some of the tools in AbstractTrees.jl to explore the tree itself.
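
For example, assuming the value returned by tree supports the AbstractTrees iteration interface (which print_tree relies on), it can be traversed directly. This is a sketch; outputs are suppressed rather than shown.

julia> using AbstractTrees

julia> wordletree = tree(wordle);                            # tree of all possible games under this strategy

julia> nnodes = count(_ -> true, PreOrderDFS(wordletree));   # total number of nodes in the tree

julia> nleaves = count(_ -> true, Leaves(wordletree));       # number of leaf nodes (completed games)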