Here’s Waldo: Computing the optimal search strategy for finding Waldo

As I found myself unexpectedly snowed in this weekend, I decided to take on a weekend project for fun. While searching for something to catch my fancy, I ran across an old Slate article claiming that they found a foolproof strategy for finding Waldo in the classic “Where’s Waldo?” book series. Now, I’m no Waldo-spotting expert, but even I could tell that the strategy they proposed there is far from perfect.

That’s when I decided what my weekend project would be: I was going to pull out every machine learning trick in my tool box to compute the optimal search strategy for finding Waldo. I was going to crush Slate’s supposed foolproof strategy and carve a trail of defeated Waldo-searchers in my wake.

“But Randy, don’t you have better things to work on? You know, curing cancer, solving world hunger… ANYTHING else?”, a sane person would have said at that point.

Too bad that sane person wasn’t around.

What is “Where’s Waldo”?

For the poor souls who have no clue who Waldo is, I’ll defer to the Wikipedia description:

“Where’s Waldo?” is a series of children’s books created by English illustrator Martin Handford. The books consist of a series of detailed double-page spread illustrations depicting dozens or more people doing a variety of amusing things at a given location.

Readers are challenged to find a character named [Waldo] hidden in the group. [Waldo’s] distinctive red-and-white-striped shirt, bobble hat, and glasses make him slightly easier to recognize, but many illustrations contain “red herrings” involving deceptive use of red-and-white striped objects.

Here’s an example of a classic “Where’s Waldo?” illustration:

Here’s Waldo

Thankfully, the Slate article provided a chart that made it dead easy to acquire all 68 of Waldo’s coordinates in the primary 7 editions of the “Where’s Waldo?” books. I’ve reproduced those coordinates visually below. You can download the data file here.


If we perform a kernel density estimation of these points, we start to see some interesting trends already:

  • Waldo almost never appears in the top left corner. That’s because there was always some postcard from Waldo in the top left corner describing the setting and some interesting facts about it.
  • Waldo is rarely located on the edges. Slate’s Ben Blatt hypothesized that this was done on purpose because the edges are “locations that might be construed as too obvious” and are “where children and adults alike might begin their search.”
  • Waldo is never located on the very bottom of the right page. I was unsure about the reason for this at first, but Chris Metzger offered a probable explanation: Whenever you flip to the next page in a book, the bottom of the right page is the first thing you see. Thus, the bottom of the right page would be one of the worst places to hide Waldo because that’s the most-viewed part of the book.


Computing the optimal search strategy

Now on to the real fun! I decided to approach this problem as a traveling salesman problem: We need to check every possible location that Waldo could be at while taking as little time as possible. That means we need to cover as much ground as possible without any backtracking.

In computer terms, that means we’re making a list of all 68 points that Waldo could be at, then sorting them based on the order that we’re going to visit them. So now we just need to try every possible arrangement of the points and find the one with the shortest distance traveled. Easy, right? Wrong.

Those 68 points can be arranged in ~2.48 x 1096 possible ways. To provide some context, that’s more possible arrangements than the number of atoms in the universe. That’s so many possible arrangements that even if finding Waldo became an international priority and the world banded together to dedicate the 8.25 million computing cores from the world’s 10 largest supercomputers to the job, it would still take ~9.53 x 1077 years — about 6.35 x 1067x longer than the universe has existed — to exhaustively evaluate all possible combinations. (Generously assuming that each core could perform 10,000 evaluations per second.) In other words: if we don’t have a smarter solution, Waldo is as gone as Carmen Sandiego.

Thankfully, there are plenty of smarter methods for approximating the optimal search path for finding Waldo. Below, I visualized the best solution over time of one such method — a genetic algorithm — that found a nearly-perfect solution. As you can see, genetic algorithms continually tinker with the solution — always trying something slightly different from the current best solution and keeping the better one — until they can’t find a better solution any more.

(Note: Because genetic algorithms — like many optimization algorithms — are stochastic in nature, they won’t always result in the exact same solution at the end.)

After running the genetic algorithm for about 5 minutes, I ended up with the solution below. I colored the paths by whether they’re in the first (blue), second (orange), third (green), or final (red) 1/4 of the path. This path represents one of the shortest possible paths to follow on the page to find Waldo, so if we followed this path exactly, we’d most likely find Waldo much faster than someone following a more basic technique.

If you’d like to learn more about the methodology used here, see the accompanying methods and code document.

(For those interested: I also tried a standard hillclimber algorithm, but it always converged on a worse solution than the genetic algorithm.)


Of course, we should never take results from machine learning too literally. A robot might be able to follow this path perfectly, but I wouldn’t be able to remember that path unless it was etched on every page for me. Instead, I think we can take some general lessons from the path that the genetic algorithm discovered:

  1. The bottom of the left page is a good place to start. If Waldo isn’t on the bottom half of the left page, then he’s probably not on the left page at all.
  2. The upper quarter of the right page is the next best place to look. Waldo seems to prefer to hide on the upper quarter of the right page.
  3. Next check the bottom right half of the right page. Waldo also has an aversion to the bottom left half of the right page. Don’t bother looking there until you’ve exhausted the other hot spots.

I annotated the best solution with a general path to follow when searching for Waldo. If you don’t find Waldo at the end of that trail, then you’ve got an outlier and should check the middle of the pages or the top left and right.

How does this strategy compare?

Unfortunately, I lost my old copies of “Where’s Waldo?” ages ago in a move, so I couldn’t test it out for myself. I’d love to put this strategy to the test, though, to see how much faster it is than the Slate strategy.

The U.S. publishers of “Where’s Waldo?” generously sent me the entire series of books, so I was finally able to put this strategy to the test. As others have reported, the strategy works very well for most of the illustrations: I zoomed through most every illustration the first book spending <10 seconds on each illustration. The trouble is when an outlier illustration comes along. When you're on an outlier illustration, you not only waste time following the path, but then you're left disoriented trying to trace back and worrying that you missed Waldo. Waldo-spotting performance degrades on these outlier illustrations, so don't be discouraged by Book 1, which has 4 (!) outliers in it. Overall, in terms of speed, this machine-learning based method is streets ahead of the old Slate strategy.


This was all done in good humor and — barring a situation where someone puts a gun to your head and forces you to find Waldo faster than their colleague — I don’t recommend actually using this strategy for casual “Where’s Waldo?” reading. As with so many things in life, the joy of finding Waldo is in the journey, not the destination.

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

Posted in analysis, data visualization, machine learning Tagged with: , , , ,
  • Kasper Allerslev

    This was really entertaining read, but I have to wonder; why on Earth did you opt for a genetic algorithm, when you can solve TSP’s with thousands upon thousand of cities by using subtour elimination?

    To date, the largest problem solved has 89.500 cities. The TSP might be NP-hard, but it’s a very solvable problem in practice.

    • No particular reason. I felt like coding a genetic algorithm last weekend, and it’s one of several possible solutions to the problem. 🙂

      If anyone feels up to it, I’d love to see what paths other methods come up with. IIRC the path I presented here had a total distance of ~59.

      • Kasper Allerslev

        I just read your bio, and it was suddenly kind of obvious why you’d go for it 🙂

        • That too. 🙂 EC is near and dear to my heart.

    • Luke

      Out of interest and because I couldn’t find anything quickly on google… Do you know how subtour elimination compares to heuristics such as Lin Kernighan?
      And how does subtour elimination work anyway?


      • Kasper Allerslev

        LK is basically a neighborhood search algorithm, making small changes to the current solution until none of the allowed “steps” yield a better one. It’s like a ball rolling down a hill until it stops in a hole, but you have no idea if it landed in the DEEPEST hole.

        Subtour elimination is pretty simple as a concept if you understand some basics in Mathematical Programming. You write up the TSP as an integer optimization problem, but ignore the subtour constraints. Then you solve it and see if there are subtours. If there are, you add a constraint that forces at least one of the nodes in a subtour to be connected to at least one of the nodes in another subtour.

        This means you will solve a lot of binary decision problems, but they all have in common that you don’t have many constraints. That’s the “feature” of the method, because the exact formulation of a TSP with n “cities” has 2^n subtour elimination constraints, but you only need an amazingly small fraction of these to compute the optimal feasible solution.

        So LK is really, really fast, and given a good starting solution, it will probably yield an excellent solution. Subtour elimination (which is also called a “row generation” algorithm) is much slower, but always yields an optimal solution. But in this case, “much slower” means next to nothing, because the time frame is already ridiculously small.

  • geotheory

    This is important science. KDE bandwidth maybe a little too big however (hence edge effects) – Wally’s distribution (sorry, that’s his name down our way..) seems pretty random aside from 2 empty regions.. 🙂

    • I wish you were on the review panel for my research grant to continue this work. I kept telling them that this is an important problem that requires a research institute to solve AT A MINIMUM, but they would hear nothing of it.

  • Pingback: How I Found the Optimal Where’s Waldo Strategy With Machine Learning | Nagg()

  • Pingback: How I Found the Optimal Where’s Waldo Strategy With Machine Learning | PJ Tec - Latest Tech News | PJ Tec - Latest Tech News()

  • Pingback: How I Found the Optimal Where?s Waldo Strategy With Machine Learning | Sports News()

  • Pingback: How I Found the Optimal Where’s Waldo Strategy With Machine Learning | KrazyWorks()

  • Nik Billings

    I think the reason Waldo was never on the edges of pages was to account for printing irregularities that could cause the puzzle to be unsolvable.

    • That’s a viable possibility. The Slate article mentions this as well. Pretty smart!

    • Mark

      In most professional printing scenarios, an eighth of an inch is the allotted margin—pun intended—for error. However, most printers are much more accurate than this. Nevertheless, Waldo could have been safely printed an eighth of an inch from the edge of the trimmed page.

  • agam_b

    Good stuff … out of curiosity, what did you use for these visualizations? What’s in your “data tinkering” toolset ? 🙂

    • I used Python’s matplotlib library for all of the visualizations except for the kernel density estimate. The KDE plot was created with Python’s Seaborn library. I highly recommend checking out Seaborn for easy statistical visualization.

      • agam_b

        Thanks! Been procrastinating by refusing to decide between Python/R/something else, you made it a bit easier …

        • Honestly, you should learn both. 🙂

          R is really great for statistics and modeling. Its libraries are still more advanced than Python’s statistics packages.

          Python is great for general programming and (most) everything else. I love Python because there’s almost always a free package out there that saves me from coding up a custom solution.

          I personally find Python easier to code in, but it’s invaluable to be able to turn back to R when I need to do some advanced statistical modeling.

  • rresrser

    oh, so you only got a near perfect solution. Doesn’t sound very optimal to me

  • AssHat900

    Finally world hunger has been solved.

    • Κώστας

      Same thing someone would have said back in the 18th century when C.Maxwell presented his famous 4 equations. And yet, many years after that here we are with our electric machines which give you the power to bitch about the not so usefulness of an algorithm you don’t know where else it might be applied in the future. Well done. Back to your cave now…

      • AssHat900

        Sorry I missed your comment earlier. I was a little harsh, it’s just that my mother was raped by math.

    • Segamaike

      Are you doing anything to solve world hunger right this moment? No? Then Kindly shut the fuck up.

      • AssHat900

        I throw my twenty fives pounds of turkey in the trash like everyone else. #SJW

  • Chris Metzger

    I have a theory on why Waldo is never in the bottom right of a spread based on my time as a design intern at a small magazine: Basically that’s the first area a person sees when turning the page. When turning the page, people tend to grab the bottom corner of page they’re on and lift from there. If Waldo was directly under that corner the reader might find him before even seeing the rest of the image.

    At my magazine, we used this to determine advertisement placement. Once you know you see it everywhere, generally ads are placed on the outside (left side of left page, right side of right page) of a spread and content is in the center. This is for a few reasons, but largely it’s because if someone is just flipping through the pages, say in line at the grocery store, there’s a better chance of seeing the ad which in turn funds the magazine. Along the same lines, advertising space on the right page costs more than on the left. This is because when a reader turns a page the next spread is revealed from the right edge to the left, so a righthand advertisement would be the first thing the reader saw.

    • That’s brilliant – thank you for the context. I’ll update the article to add this explanation.

    • Niels de Rooij

      Yeah, except that 99,99% of the people grab the top right of a page before turning it over.

      • SkippyFlipjack

        I like when people think that everyone does things exactly the way they do 🙂

        • Niels de Rooij

          Well okay, but same goes for all of us. I for one always grab the top corner of a page when flipping it over. Otherwise I might tear a page.

          • Leah

            yes I always grab top right corner as well. Perhaps not 99.99% of people, but I think a significant proportion do.

            • Niels de Rooij

              Yeah I might have over reacted a little there sorry 😛

      • Bottom right page turner right here.

        • Niels de Rooij

          I’m no xenophobe but that’s weird man!

  • Maxamillion Mansionhouse III

    I found this interesting.
    I wouldn’t know where to start or what the process could even consist of with trying to put something like this into reality that another person could follow.
    Thanks for doing this, I don’t think it was a waste of your time like you hinted to.

  • Isaiah DaProphet

    This is pretty smart.

  • disqus_tVhF5aQ8cN

    This is an “optimal search” for Waldos that have already been found. Big deal! It seems more in the spirit of Where’s Waldo to search for him without any prior information but what he looks like. Now that would be impressive.

  • SirMaxim

    I wonder what the results look like for his cane and other finds.

  • Neil

    This isn’t really a traveling salesman problem, as the goal is not to minimize the time/distance required to “visit” all the places Waldo could be, but rather to find Waldo as quickly as possible (and then stop) so your path should start in the densest area, not the upper left corner. Your optimization should multiply each segment length by the likelyhood that you will have to travel that segment before finding Waldo. Rather than minimizing (X1+X2+X3+…+X(n-1)), you should be minimizing (X1*(n-1)+X2*(n-2)+…+X(n-1)*(1))

    • Thaïti Bob

      that’s what I would say too. Start the serach on top of the big right hill, then continue to the lower peak and go down.
      Maybe a mix between the “downhill path” and the traveling salesman ?
      Is it what Randy tried using the hillclimber algorithm ?

      • Piers Newberry

        I loved the article, but, strictly in picky twat mode, it doesn’t take account of the two different eye speeds – search speed and flick speed. In this case it would be better to flick between the hills of high probability at some stage.

  • Ben Allison

    I like the post. I think you’re working on an interesting problem, and I think there is potential for a lot of interesting research, particularly where human deviations from optimal search are concerned. However, I have a couple of observations having done some work on visual search/eyetracking in images.

    You assert that optimality means the shortest distance travelled. I think that’s not correct—I would argue you want to cover as much of the page as possible in the fewest saccades (jumps)/shortest time, biasing towards locations where the target is likely to exist. Humans can percieve anything within a certain radius of their focal point with pretty good clarity (this region is called the fovea), with diminishing clarity radiating from that centre. This is normally approximated by a Gaussian centred on the focal point, which is similar to your KDE (assuming you’re using seaborn’s default of a Gaussian kernel).

    So I would argue that when fresh to the page, the optimal move is to the region with the highest probability of containing Waldo a priori. You can get that from your KDE. Conditioned on some search path in the image, you want to blend prior probability of containing Waldo with how well covered that region is by previous focal points. If you’re modeling humans, you probably also want to stick an inhibition-of-return style decay on past search events so you can get people to revisit the same location after an appropriate length of search with no luck.

    Fun research though, thanks for the post!

  • Abishek Hariharan

    Hey this was a neat adventure!

  • Linda

    What a super smart guy! Who would have ever thought to put imagination in a box and reduce it to computation? Well, I’m impressed. He has a strong future ahead of him – I hope he can find a cure for cancer. Well done Mr. Olsen. I like how you spend your free time :0)

  • André De Lange

    Quite interesting, BUT, doesn’t the ability to find waldo in this way rely on already knowing where he is..?

    • Xedecimal

      This is exactly what I came to say, if they made a new Waldo book this wouldn’t find a Waldo.

      • John Black

        Not at all true.

        The whole point of this article is to point out that there is a reliable consistency to where Waldo is generally placed in the books. A new book would, theoretically, place him in a similar position.

      • Leah

        that’s only if they changed their system for Waldo-placing. The map shows that a majority of Waldos tend to be placed in similar areas. Who knows whether this is a coincidence or they have some kind of algorithm or system. But it’s probably safe enough to say future editions would follow a similar pattern, unless they alter it in the light of this article.

    • No, why? I mean it operates using the books that have already been published, not really newer ones, but he was simply backtracing it from the known locations. He was offering a way to find Waldo efficiently, not as if you need to know where he is.

      • André De Lange

        Well, to use your own words: “…from the known locations.”

    • karthik nayak

      Yes it does, thats learning for you.

  • Awesome blog, Randy! Here is a follow up I wrote:

    • This is great! Thanks for sharing, Vitaliy! 🙂

  • peterofsunday

    Is it me or it looks a bit like Europe Map ? 😉

  • As Waldo says sadly,”I’ve always been here.”

  • David Turner

    I don’t get it. I found the little guy within a minute. Why do people find it so hard?

    • With this, it takes about 10 seconds, so you’re actually the one who seems to find it harder… And no one finds it hard.

  • calenti

    Really interesting post. Are there any introductory online classes you’d suggest for someone who was interested in learning more about topics like kernel density estimation and genetic algorithms? My browser’s display regularly explodes with ads for “Learn Big Data Now” – I’d prefer something basic and useful.

  • FishFlue

    I think it takes the fun out of asking your onboard – inboard computer just to find him. I think its accurate.

  • sergio morales

    I found this very interesting. I think there are plenty of applications for this strategy. I’m working with facility location problems and I’m thinking maybe there’s something i could learn from your work, Anyway, thanks for sharing 😀

  • Elizabeth Olson

    Isn’t the whole idea of finding Waldo to train a child’s brain to make it’s own algorithms?

    • I wouldn’t show this algorithm to kids — just let them think you’re the Waldo-spotting expert! 😉

    • Yeah, and that’s why this article isn’t geared towards children…

  • Patrick Man

    OP’s an orphan and his obsession with finding Waldo is a projection of his own fruitless search for his parents.

  • Harry

    I have yet to catch my wife with him, But I know she is diddling him. Waldo is an asshole. Let him stay lost.

  • Magnus Ytterstad

    Very entertaining read! I built an app for iOS that solves TSP’s with GA and in the latest version I included your where’s Waldo coordinates as a predefined problem. It comes up with a slightly different solution.

    The app is here:

    • Neat! Thanks for sharing Magnus. I’d expect a slightly different solution with GAs — even my GA produces slightly different solutions each run.

      How many cities can your GA reliably solve for?

      • Magnus Ytterstad

        I don’t know. I have set a limit on 100 cities for now, and at that number of cities I’ll have to run ~1300 generations to get reasonably good solutions. With uploaded predefined problems, one can set the number of cities higher than 100.

  • Harlan Tamba


  • ChrisR00

    Could this same algorithm be applied to solving jigsaw puzzles upside down? For example, if you partition the pieces (edges, 0:4 inner:outer connection, 1:3 inner:outer connection, etc.) is there an optimal searching pattern to put together a jigsaw puzzle in the shortest amount of time. I guess you should probably think about that cancer solving business first, however 🙂

  • chmduquesne

    You modeled this as a tsp, but I believe you don’t need to look for a tour here. I think you could look instead for the shortest Hamiltonian path. This is a different problem (although also NP-hard).

    Nice article. I enjoyed the reading!

About this blog

This blog is my labor of love, and I've spent hundreds of hours working on the projects that you'll read about here. Generally, I write about data visualization and machine learning, and sometimes explore out-of-the-box projects at the intersection of the two. I hope you enjoy my projects as much as I have.

If you would like to use one of my graphs on your website or in a publication, please feel free to do so with appropriate attribution, but I would appreciate it if you email me first to let me know.



Enter your email address to subscribe to this blog and receive notifications of new posts by email.