The science behind Wordle

Ido Frizler
5 min readJan 10, 2022

If you’re on Twitter, you’ve probably encountered those green & yellow squares popping up everywhere. Wordle has become quite a phenomenon over the past few weeks, conquering the feed as a simple, fun, addictive mini-game.

Like many others, I was fascinated in finding out if there’s an optimal way of playing the game, starting from finding an optimal first guess. This has led me down a rabbit hole of interesting challenges, from math, through algorithms, computer science and advanced engineering concepts. All, detailed below in the journey to write a bot that wins at Wordle.

As an added bonus — what is really the best word to guess playing Wordle? It is not what you think.


I wrote a simple Python program that loads all 5-letter words (there are some sources; my benchmark is with the list of Wordle app itself — 2315 puzzle words, 10657 valid words — extracted by some folks online). I then wrote another class that tries to guess a word each turn, making decisions based on the received inputs (green, yellow, etc.).

All that was left was to define the success function which I chose to be: (a) number of correctly guessed words and (b) the average turn length for successful guesses.

Gen 1 — letter frequencies

My first try for a bot was a rather naïve one — simply count in how many words does each letter appear, and choose the word with the aggregate highest number on its five letters. I decided to not count duplicates, which were quite a hassle given all edge cases.

The results of this bot were as follows:

Total games: 10657
Successful games: 9321 (87.4%)
Average turns when successful: 4.27

Not too bad, but lots of room for improvement.

Gen 2 — information gain (approximated)

First revelation was that counting the highest frequencies is a good strategy, but not always the best one. The reason it failed to guess many words is that sometimes the best strategy was to guess other letters (even ones comprising a word that is already ruled out), to narrow down our options.

So, I turned into building a function that aims to calculate the information I would gain from each guessed word. It was also based on letter frequencies , but now also taking into account known yellow and grey squares, with some score penalty.

Gen 2.1 — tackling words with common structure

Still, words that I kept missing were mostly ones that followed a similar pattern, like *AUNT. The reason I missed those is that once I discovered the last four letters, the bot would then iterate the remaining words one by one, which often were too many for the bot to succeed (in this example, think of DAUNT, JAUNT, TAUNT, …)

For that, I added an important improvement: look at disqualified words as well, because they might be better for getting more clues. For example, guessing a word with one excluded letter, but four other common letters, can give us lots of information. That actually helped a lot.

Gen 2.2 — some very much needed improvements

In developing this bot, I had to come up with many improvements, such as automatically marking “grey” letters that I did not guess, but simply do not appear in any of the remaining words. These small additions helped narrow down more and more, until I got a pretty good result for an approximated algorithm.

Most sophisticated manual bot yielded the following:

Total games: 10657
Successful games: 10263 (96.3%)
Average turns when successful: 4.26

Gen 3 — entropy-based information gain

I felt that incremental improvements were getting ineffective, and was eager to “solve it for real” by finding the mathematically optimal solution. Then, my wife came up with the winning formula:

The way to know the best strategy each turn is to look at the distribution of each of the potential outcomes of the guess. The option that will have the most balanced distribution (reducing entropy by the highest value), is the one where we gain most information and narrow our dictionary most efficiently (on average).

We have 5 squares and 3 options for each one. That results in 243 possible outcomes after a potential guess. For each guessable word, we find the probability of it mapping to any of the outcomes (some may be zero). We can then take these probabilities, and calculate entropy. The word with maximal entropy is the most profitable guess each turn.

Entropy is defined as below:

Entropy — for calculating information gain each turn

Surprisingly, or not, this bot strategy is able to guess all words in Wordle dictionary in up to 6 turns. It works flawlessly.

Total games: 2315
Successful games: 2315 (100%)
Average turns when successful: 3.54

Gen 3.1 — not such a trivial, finite problem

So, we covered math, but where are the engineering challenges, you ask? Well, if you paid attention to the above, you could see the results for entropy-based bot where for a dataset of “only” 2315 words. Why is that?

First, because these are the actual words in the current Wordle app. They let you guess out of a wider range, but they use only these 2315 words. And we got them all right!

Second, because running for all 10657 words simply didn’t scale. Why is that? Mostly, because going over all 243 branches each turn, doing O(n²) operations to find all those probabilities, was just too heavy for my naïve Python program.

Improvements I have lined up:

  • Dynamic programming — pre-process the pattern between all word pairs, so when calculating the entropy, we can get much quicker results.
  • Find out optimal first guess per given dictionary — this is the most CPU-intensive step, as the dictionary is widest. To simulate the bot running on all possible words, we don’t need to find out that optimal word each time; we can do it once, and start the game the same way for every run.
  • Moving to pandas data structures — they are built for these kinds of calculations and will speed things up.

So, what is the optimal word to start with?

The data from the entropy-based bot says “SOARE” is the best word to start with, statistically. This is the word that will give you the most information, and narrow down your dictionary (on average) to the shorter one possible. Give it a try and let me know :)


Code can be found in the link below. It’s still quite raw and messy, as I’m experimenting with it, but one can clone and start playing with it (or even open a PR if you’re daring).