Small multiples vs. animated GIFs for showing changes in fertility rates over time

A couple of weeks ago, Stephen Holzman shared an animated GIF on /r/DataIsBeautiful that caught my eye. The GIF showed how fertility rates in the U.S. and Japan evolved between 1947 and 2010. It starts right in the middle of the post-WWII Baby Boom and follows the gradual decline of Japan’s fertility rate, which has led to something of a population crisis for Japan.

usa-vs-japan-fertility

Although Stephen’s GIF is fun to watch, especially because the animation gives the appearance of waves rising and falling, I couldn’t help but be frustrated by the limitations of GIFs in data visualization. If we wanted to compare the fertility rates of 1980 and 2010, for example, we’d have to keep a mental snapshot of what the 1980 frame looked like for when the 2010 frame came around. Thus, comparing time points more than a couple of years apart is nearly impossible with animated GIFs unless the viewer has a photographic memory.

This drawback is the exact reason that small multiples were introduced to data visualization: If we’re comparing the same data in the same format between several different [times|treatments|countries|etc.], then we can visualize the data on the same scale and axes to make them easily comparable.

I’ve long been a proponent of small multiples over GIFs, so I took Stephen’s data (which is actually from the Human Fertility Database) and reworked it into small multiples. You can click on the image for a super-high-res version.

usa-vs-japan-fertility-rates-small-multiple

Each year gets its own plot, running from left to right, with both countries’ fertility rates plotted. The total fertility rate for each year is annotated onto its corresponding plot and color-coded according to the country. I plotted the x-axis tick labels to show the reader the age range of the plots, but only on the top and bottom rows to avoid too much repetition. Similarly, the y-axis tick labels only appear on the plots on the left.

Of course, the drawback of small multiples is that we no longer see the data in the same detail as we did with the larger plots. Out of necessity, each plot in a small multiples chart must be small, simple, and have few axis ticks, which can make small multiples a poor choice if you’re making a comparison where there has been little change.

We can compensate for this by subsetting the data. After all, 64 years is quite a lot of data to show in one graph. What if we just looked at every five years?

usa-vs-japan-fertility-rates-small-multiple-subset

Now it’s straightforward to compare across and within decades: 1947, 1955, 1965, etc. can easily be compared by looking down the column. By the same token, 1947 and 1950 can easily be compared by looking across the row. We still get about the same level of detail as the GIF, and maintain the overall trend of declining fertility rates in both countries as time goes on.
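The subsetting step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the chart: the column names (Year, Age, ASFR) are taken from the plotting code in this post, the placeholder values are made up, and the choice of years (the first year of data plus every fifth year after it) is an assumption based on the years mentioned above.

```python
import pandas as pd

# Placeholder data only: the column names (Year, Age, ASFR) match the
# plotting code in this post, but the values are made up for illustration.
fertility = pd.DataFrame(
    [(year, age, 0.05) for year in range(1947, 2011) for age in (20, 30, 40)],
    columns=["Year", "Age", "ASFR"],
)

# Assumed selection: the first year of data plus every fifth year after it.
years_to_plot = [1947] + list(range(1950, 2011, 5))

# Keep only the rows whose year is in the selection.
subset = fertility[fertility["Year"].isin(years_to_plot)]
print(sorted(subset["Year"].unique()))
```

The same filter would be applied to each country’s DataFrame before the plotting loop, with the subplot grid shrunk to match the smaller number of years.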

Two major trends are readily apparent from this chart:

1) Both the USA and Japan have experienced declining birth rates since the 1940s, Japan more so.

2) In the past 20 years, Japanese couples have started having children later in life (after age 30), so much so that in 2010, half the children born were born to parents older than 30.

Which raises the question: Why show all this data if we only have two points to make?

Simplifying the charts even more

If the above two trends are all we wanted to show with the data, then we can simplify the charts even more by calculating summary statistics and plotting those instead.

total-fertility-usa-japan

avg-age-birth-usa-japan

These charts take away the opportunity for the reader to glean any additional insights from the data. However, if we wanted to tell a straightforward story with charts, these would be the best ones to use.
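For readers who want to reproduce summary statistics like these, here is a minimal sketch. The total fertility figure follows the plotting code in this post, which sums the age-specific fertility rates (ASFR) for each year. The mean age at birth is computed as an ASFR-weighted average of age, which is my assumption about a reasonable definition rather than a confirmed description of how the chart was made. The helper name summarize and the placeholder values are hypothetical.

```python
import pandas as pd

# Made-up placeholder rates for two years; real values come from the
# Human Fertility Database. Column names match the plotting code in this post.
fertility = pd.DataFrame({
    "Year": [1950, 1950, 1950, 2010, 2010, 2010],
    "Age":  [20, 30, 40, 20, 30, 40],
    "ASFR": [0.5, 0.25, 0.25, 0.25, 0.5, 0.25],
})

def summarize(df):
    """Return per-year total fertility and ASFR-weighted mean age at birth."""
    # Weight each age by its fertility rate so the mean can be computed per year.
    weighted = df.assign(age_x_rate=df["Age"] * df["ASFR"])
    totals = weighted.groupby("Year")[["ASFR", "age_x_rate"]].sum()
    return pd.DataFrame({
        "total_fertility": totals["ASFR"],
        "mean_age_at_birth": totals["age_x_rate"] / totals["ASFR"],
    })

summary = summarize(fertility)
print(summary)
```

Running summarize on each country’s DataFrame gives one row per year, which can then be drawn as a simple line chart per statistic.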

Conclusions

  • Animated GIFs, while flashy, often make it more difficult to gain insight from data.
  • Static charts, such as small multiples, can simplify animated GIFs to make trends in the data more apparent.
  • Sometimes it’s better to calculate summary statistics and plot those instead, especially if showing all of the data does not lend additional insight.

If you liked what you saw in this post and want to learn more, check out my Python data visualization video course that I made in collaboration with O’Reilly. In just one hour, I will cover these topics and much more, which will provide you with a strong starting point for your career in data visualization.

Code for the small multiples visualization

I can’t share the data that I used to create this visualization (you’ll have to download it from the Human Fertility Database), but I’ve provided the Python code I used to generate the small multiples visualization below for educational purposes.


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

plt.style.use('https://gist.githubusercontent.com/rhiever/d0a7332fe0beebfdc3d5/raw/'
              '223d70799b48131d5ce2723cd5784f39d7a3a653/tableau10.mplstyle')

# "japan_fertility" and "usa_fertility" are pandas DataFrames with
# the fertility data from the Human Fertility Database

plt.figure(figsize=(12, 16))

for plot_num, ((index_japan, japan), (index_usa, usa)) in enumerate(
                                                    zip(japan_fertility.groupby('Year'),
                                                    usa_fertility.groupby('Year'))):
    ax = plt.subplot(8, 8, plot_num + 1)
    plt.fill_between(usa.Age.values, usa.ASFR.values, color='#1f77b4', alpha=0.7)
    plt.fill_between(japan.Age.values, japan.ASFR.values, color='#d62728', alpha=0.7)
    plt.xlim(9, 51)
    plt.ylim(0, 0.3)
    
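    # Only the top and bottom rows show x-axis tick labels
    if index_japan <= 1954 or index_japan >= 2003:
        plt.xticks(range(10, 51, 20), fontsize=10)
    else:
        # One empty label per tick; recent matplotlib versions raise an
        # error if the number of labels differs from the number of ticks
        plt.xticks(range(10, 51, 20), [''] * 3)

    # Only the leftmost column shows y-axis tick labels
    if plot_num % 8 == 0:
        plt.yticks(np.arange(0.1, 0.31, 0.1), fontsize=10)
    else:
        plt.yticks(np.arange(0.1, 0.31, 0.1), [''] * 3)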

    plt.text(40, 0.26, usa.ASFR.sum().round(2), fontsize=10,
             ha='center', color='#1f77b4')
    plt.text(40, 0.225, japan.ASFR.sum().round(2), fontsize=10,
             ha='center', color='#d62728')
    plt.title(index_japan, fontsize=10)
    
plt.tight_layout()
plt.savefig('usa-vs-japan-fertility-rates-small-multiple.pdf', dpi=300)

Note that I had to add the plot axis labels, the plot title, and a couple annotations manually.

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

  • barnettjacob

    It’s a massive improvement! Are you able to share the code you used for the plot?

    • Sure – I updated the post with the Python code.

      • barnettjacob

        Perfect. Thanks.

  • John

    I actually prefer the gif. It keeps me focused. If I wanted to make a comparison between 2 years, that small multiples visualization gives me vertigo.

    • How about the second small multiples? I think the issue with the first small multiples version is that there was too much data on the chart at once.

      • John

        What I like about the gif is that the frame is static and it is the data itself that provides the motion. After a couple of cycles I am able to comprehend the direction of the curves. Sometimes it’s easier just to sit back and absorb rather than lean forward and seek.

    • If there was a video (instead of gif) with seekbar (and ticks as years, which is possible), I guess it would be better.

  • Henk Doorlag

    You could add the births per woman to the average age graph as line thickness (or as a stacked area chart). All pertinent data in 1 graph.

    gif is pretty, but raw dataisbeautiful

  • Stephen John Holzman

    I think that the world needs static charts AND moving charts. Thinking about an intended audience is critical when designing an appropriate visualization or set of visualizations.

    Presenting time series data in an academic journal or at a conference, the appropriate approach is almost always going to be small multiples or multiple line graphs. The audience is in the business of really digging in to the data being presented and they aren’t going to have time to loop a frustrating GIF or video 100 times. Maybe an interactive visualization would work, or even better two slider charts next to each other for making custom comparisons.

    However if I am going to a high school to make a presentation, I’m leading with movement 100% of the time. The world is distracting and full of cat videos. Most people have likely never taken the time to consider age-specific fertility rates because there are honestly far more interesting things, even though understanding how the world works and the change in rates over time is important. The GIF is hypnotizing and gets people to lean in when they would otherwise move along.

    Dataisbeautiful is obviously a uniquely diverse audience. Posts are likely to be seen by PhDs and high school sophomores. My current thoughts are that it is no great sin to post a GIF in such an environment so long as supporting static charts are linked at least in the comments, and providing the necessary visualizations to satisfy both audiences is something I think people (myself included) should do more of.

    • Robert Lesser

      Keep in mind that Dataisbeautiful is now a default sub on reddit, so “flashier” things like gifs and videos are more likely to stick out in millions of people’s frontpages.

  • Randy,

    This is a great post and you make a good point (visually, of course!) about the pros and cons of animation.

    There has been a lot of work in the realm of cartography regarding animation and small multiples. Some key literature in that area comes from Mark Harrower, Amy Griffin, Sara Fabrikant, and Carolyn Fish.

    If you read anything on this topic, I would recommend starting with Carolyn’s paper w/ Kirk Goldsberry and Sarah Battersby on the concept of change blindness: http://thecartofish.com/FishGoldsBatts2011.pdf

    The key thing she hits on is how animation introduces complexity through classification: the instant even a simple 3-class map animates, it essentially becomes a 9-class map.

    For example, imagine you have 3 categories: low, medium, and high. That would be simple to interpret in static form. But animation introduces transitions, which means the user now has to interpret change from high -> low, high -> medium, medium -> high, medium -> low, etc—in addition to the cases where no change happens (e.g., high remains high).

    The longer that animation is, the more complex that already complicated task becomes.

    One of the most interesting things that come out of the usability literature is how much people report *liking* animation, despite objectively performing worse in analytical tasks. People aren’t aware they perform poorly with animations, but their enjoyment of animation introduces a bias that makes them think they did well.

    That boils down to what some call “change blindness blindness.” It’s a very fascinating topic.

    • Thank you for pointing out some literature on this topic, Joshua! The phenomenon you wrote about seems to be exactly what we’re experiencing here. Many people seem to enjoy the animated GIF version more, yet it’s likely they didn’t spot the trends as clearly as with the small multiples version. Very interesting indeed.

      • jugito

        Randy, why not animate the trends, then? In sales analysis I used to test the strength of our marketing and product life cycle by evaluating the 13-week rolling average of sales dollar amount overall and by product. That eliminated most seasonal trends from the results.

  • jugito

    Try plotting US fertility by race