The correct way to use pie charts

Pie charts are the most widely berated chart in data visualization. Many articles have been written over the years describing why pie charts are bad, and why we should no longer use them. Even key members of the data visualization community consider using a pie chart equivalent to using incorrect grammar.

In short, pie charts are the Comic Sans of data visualization: Many would agree that if you use a pie chart, you immediately betray your lack of data visualization knowledge.

I believe that so many people dislike pie charts because they’re so frequently misused by amateur practitioners. Furthermore, I believe that pie charts still play an important role in data visualization, and I’m going to make my case in favor of pie charts below. Along the way, I will explain the correct way to use pie charts; please take note and share this information with your colleagues so we can salvage the pie chart’s reputation.

The advantages of pie charts

From my point of view, pie charts have two major advantages over their alternatives:

  1. Pie charts are easy to understand. Even readers who have never taken a statistics course can look at a pie chart and immediately understand what it’s trying to show. This is a vital factor if you are making data visualizations for public consumption.
  2. Pie charts easily communicate a simple proportion. If all you need to communicate is that one category (or the sum of a few categories) represents a simple proportion of a whole, then pie charts will excel at this task.

I will demonstrate these points with a few examples below. First, let’s cover the three most common mistakes that designers make when using pie charts.

Make sure your parts sum to a meaningful whole

By far, the most common mistake with pie charts is representing parts that don’t sum to a meaningful whole. “What is a meaningful whole?” you ask. Let’s make this concrete with an example.

Say we’re putting together a presentation for our boss and want to demonstrate the popularity of three programming languages as indicated by their Google search frequency. We make the pie chart below and move on with our slides.

What’s wrong with this pie chart?

pie-chart-not-meaningful-whole

That’s right: R, JavaScript, and Python don’t represent every programming language out there, yet by making them the only “pieces of the pie,” that’s exactly what we’re claiming! In other words, our parts don’t sum to a meaningful whole that represents all Google searches for all programming languages.

To fix this problem, we have to add an “Other programming languages” part to the pie chart, as we have below. Now we’re properly visualizing the relative popularity of our three programming languages.

pie-chart-correct

With the pie chart above, we can easily make the point that R, JavaScript, and Python together represent a little less than 1/4 of all Google searches for programming languages.
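
As a minimal sketch of that fix in Python, the helper below pads a set of parts with a remainder slice so that they add up to the whole. The function name `add_other_slice` and the percentages are made up for illustration; they are not the data behind the charts in this post.

```python
def add_other_slice(shares, whole=100.0, label="Other programming languages"):
    """Return a copy of `shares` padded with a remainder slice.

    `shares` maps a category name to its share of `whole` (here: percent).
    """
    known = sum(shares.values())
    if known > whole:
        raise ValueError("parts already exceed the whole")
    padded = dict(shares)
    remainder = whole - known
    if remainder > 0:
        padded[label] = remainder
    return padded


# Hypothetical percentages of all programming-language searches.
tracked = {"R": 3.0, "JavaScript": 9.0, "Python": 11.0}
slices = add_other_slice(tracked)
```

Passing `slices` instead of `tracked` to the plotting call is what keeps the chart from claiming that three languages make up every search.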

Note that we’re using the pie chart to make statements about simple proportions of the whole: 1/4, 1/3, 1/2, etc. are fine as simple proportions, but don’t use pie charts to communicate a specific percentage (for example, 32.3333%).
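
One way to find the simple proportion worth stating is to round the exact share to the nearest fraction with a small denominator. The sketch below uses Python's standard `fractions` module; the helper name `nearest_simple_fraction` and the cutoff of 4 for the denominator are assumptions made for illustration.

```python
from fractions import Fraction


def nearest_simple_fraction(share, max_denominator=4):
    """Round a share between 0 and 1 to the closest fraction with a small denominator."""
    return Fraction(share).limit_denominator(max_denominator)


print(nearest_simple_fraction(0.323333))  # 1/3
print(nearest_simple_fraction(0.23))      # 1/4
```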

Collapse categories down to 3 or fewer categories that matter

Now let’s say we want to provide a broader picture of all the programming languages that people search for on Google. We make the pie chart below to show the percentage breakdown for all of the programming languages that we’ve been tracking.

What’s wrong with this pie chart?

pie-chart-too-many-categories

Right again! We went overboard and showed way too many categories at once. Pie charts are not designed to communicate multiple proportions, so we should collapse our categories down to only a handful that really matter — 3 or fewer is the general rule of thumb.

This is where we really have to think about the message that we want to communicate with this pie chart. Do we really want to show all of the programming languages, or do we want to focus on only a few of them? In this case, we decide that we really only care about Java, PHP, and Python, and collapse the other categories into an “Other” slice.

pie-chart-categories-collapsed

Now we have a very clear message with our pie chart: Java, PHP, and Python together represent nearly 1/2 of all Google searches for programming languages, and Java represents roughly 1/4 of all searches alone.
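
The collapsing step can be sketched in a few lines of Python. The helper name `collapse_categories` is hypothetical, and the percentages are placeholders chosen only to sum to 100; they are not the values plotted in this post.

```python
def collapse_categories(shares, keep, other_label="Other"):
    """Keep the categories named in `keep`; merge everything else into one slice."""
    collapsed = {name: shares[name] for name in keep}
    rest = sum(value for name, value in shares.items() if name not in keep)
    if rest > 0:
        collapsed[other_label] = rest
    return collapsed


# Hypothetical percentages; they only need to sum to 100 for this sketch.
all_languages = {
    "Java": 25.0, "PHP": 13.0, "Python": 10.0, "C#": 12.0,
    "C++": 10.0, "JavaScript": 9.0, "C": 8.0, "Other": 13.0,
}
focus = collapse_categories(all_languages, keep=["Java", "PHP", "Python"])
```

Choosing the `keep` list by hand, rather than taking the largest slices automatically, forces us to decide which categories the message is actually about.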

Note that we shouldn’t try to use the pie chart to compare Java, PHP, and Python with one another: Pie slices are notoriously difficult to compare directly, especially if we ask our uncle who always takes the bigger slice of pumpkin pie during Thanksgiving dinner. If we want to compare the proportions, then we should use a bar chart instead.

Always start your pie charts at the top

One final note: Our readers will typically start reading our pie charts from the top of the circle (the 0° mark). We should never violate our readers’ expectations by starting the parts at any other section of the circle, even if it makes the pie chart look like Pac-Man.

pie-chart-misaligned

Fortunately, most data visualization software starts pie charts at the 0° mark. But in case you find software that doesn’t: you’ve been warned!
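
In case your software is one of the exceptions, here is a minimal sketch for matplotlib, assuming it is installed. By default its `pie` function starts the first slice at the 3 o'clock position and draws counterclockwise, so the `startangle` and `counterclock` arguments have to be set explicitly. Those two arguments are real matplotlib parameters; the names `TOP_START` and `draw_pie_from_top` and the default file name are hypothetical.

```python
# Keyword arguments that make matplotlib start the first slice at 12 o'clock
# and lay out the remaining slices clockwise.
TOP_START = {"startangle": 90, "counterclock": False}


def draw_pie_from_top(shares, path="pie.png"):
    """Save a pie chart of `shares` (label -> value) that starts at the top."""
    import matplotlib.pyplot as plt  # imported here so the constants work without it

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares.keys()), **TOP_START)
    ax.set_aspect("equal")  # keep the pie circular rather than oval
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_pie_from_top({"Java": 25, "Other": 75})` would then write a chart whose first slice begins at the top of the circle.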

Recap

Now that we’ve walked through a few visualization exercises, let’s recap what we’ve learned about pie charts.

  1. The parts must sum to a meaningful whole. Always ask yourself what the parts add up to, and if that makes sense for what you’re trying to convey.
  2. Collapse your categories down to three or fewer. Pie charts cannot communicate multiple proportions, so stick to their strengths and keep your pie charts simple.
  3. Always start your pie charts at the top. We naturally start reading pie charts at the top (the 0° mark). Don’t violate your reader’s expectations.

Pie charts are useful for representing a simple proportion of a whole, and can easily be interpreted by expert and novice alike. Keep these tips in mind the next time you need to communicate a simple proportion to a general audience.


If you liked what you saw in this post and want to learn more about data visualization, come take a look at my Python data visualization video course that I made in collaboration with O’Reilly. In just one hour, I will cover these topics and much more, which will provide you with a strong starting point for your career in data visualization.

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

Posted in data visualization, tutorial
  • Richard Zijdeman

    Hi Randy, since I feel Twitter is too limited to seriously open the debate, I’ve moved to this medium, but see [1] for the point of origin of our Twitter discussion, where I honour your pointers, but still argue against using pie charts at all.

    Given your citations, you are obviously aware of the reasons why NOT to use pie charts. So I won’t go into that. However, you reason that despite those arguments there is a special case, in which pie charts would actually make sense, namely:
    1. when we are interested in the relative size of an aggregated group of smaller elements;
    2. when the relative size of that group is easy to judge, namely close to ‘1/4, 1/3 or 1/2’.
    3. if you apply a special layout (i.e. start at 12 o’clock) and aggregate to no more than 3 categories.

    Ad 1.
    If you were really interested in the relative size of an aggregate group, the ‘pie-way’ would be to just show a pie with two colours: one for the aggregate group of interesting parts and the ‘other’ part. However, in that situation a bar chart would always beat the pie in terms of accuracy, save for your argument 2.
    However, your argument 2 theoretically applies in less than 25% of the cases. If we take a two percent boundary on all sides: your figure would work when the aggregate group is between: 23-27, 31-35, 48-52, 64-68, 73-77 and 98-100 percent, combined that’s less than a quarter of the cases (if you’d use a pie) or 24% (if you’d use a bar chart). However, I’m not convinced that your arguments are proper for the 1/3 and 1/4 instances, which would reduce the applicability to only 4% (48-52) of the cases.

    Ad 2.
    Basically my first point here is that bar charts always work, where pie charts work, according to you, only in the specific ‘1/2, 1/3, 1/4’ instances. As a result, we are more used to reading bar charts than pie charts and thus, presumably, better at it.
    But I also disagree with your argument. For me the ‘1/2’ is really easy to spot, and I concur that, if there were no numbers on the axis of a bar chart, it would be easier to spot whether something is just below or just above 50%. However, obviously there are always numbers on the axis of a bar chart. Now the 1/4 instance appears easy to spot as well. However, and this may be a personal quirk since I like analog watches, your second pie reads to me as if the aggregate portion is ’12 minutes past 12’. Following your logic, 12 is smaller than 15 (a quarter of 60), and thus the aggregate group is smaller than 25%. However, in a bar graph I could have read that it is actually 20%. So my judgement was spot on (12/60=.2), but the pie provides me no feedback on that.
    With the 1/3, I really think you’re taking it too far. Ever tried to divide a pizza into 3 slices? It will always be off by a few percent. And in the pie it will be difficult to compare the size of the 2 outer slices, whereas this would be easily visible in a bar chart, even without an axis.

    Ad 3. The 12 o’clock is indeed a must have and I’d say the default in most packages, so no extra work, whereas in stacked bar charts, you sometimes need to rearrange the factor levels (in R at least). The aggregation is a shame though, as it means loss of information and limits the use of this special case even further.

    So to conclude, recognizing the reasons why not to use pie charts, you argued for a special instance in which pie charts would be better than bar charts. I’ve argued that this special case is more special than you describe and only applies in a limited number of situations. Moreover, you would need to have a priori knowledge of the distribution in order to decide whether your special case applies, whereas the bar chart would always work.

    I guess an important difference between our points of view is whether you need to apply a graph or read a graph. Your point of view seems to be more from the “reader’s” perspective. And I concur in the 50/50 situation there is nothing wrong with the pie. However, there’s also the application side of things (that you mention as “frequently misused by amateur practitioners”). My take is that the 4% of cases in which a pie chart could be useful are not worth all the misused cases that actually blur the reader’s perspective rather than enlighten it.

    [1] twitter.com/rlzijdeman/status/713113058843951104

    • Great writeup, Richard! Thank you for the feedback.

      I wonder if one of the major causes of our disagreement is the consideration of our audience. I suspect that, given your background, you often create visualizations meant to convey information to scientists. And to scientists, specific values matter: Whether the proportion is 12% vs. 14% can potentially change their opinion on a topic. On the other hand, when I think of pie charts, I am thinking of creating visualizations for the general public. 12% vs. 14% usually doesn’t matter as long as it fits the broader narrative of the visualization.

      For example, here’s a pie chart a student and I made to communicate that “over half of all Reddit posts go mostly ignored” (excuse the oval shape, heh): http://www.randalolson.com/2015/01/11/over-half-of-all-reddit-posts-go-completely-ignored/

      Yes, that same point could be made by transforming the data into a bar chart and stacking the "<1" and "1" categories. But reading that bar chart would require (at best) looking at the top of the "<1" + "1" bar, scanning over to the "50%" mark, and seeing that the bar is indeed above 50%. Meanwhile, the same message is inherent to the pie chart: the blue area + red area are greater than half the circle (er, oval), so our viewer doesn't even need to look at the numbers. I think situations such as this one are much more common than you think, especially outside of science.

      Furthermore, I don't agree that bar charts are used more commonly than pie charts, at least not for the general public. I strongly suspect that if we sampled 1,000 average citizens, they would be more comfortable and familiar with reading a pie chart than a bar chart.

      Finally, I believe that it's a bit of a slippery slope argument to say that we shouldn't use pie charts at all because some people misuse them. Some people misuse the Internet to do harm to others, but that doesn't mean we should all stop using the Internet, right? Your final paragraph is the reason why I think it's so important to educate everyone on the proper use of pie charts rather than trying to ban them altogether. Pie charts exist, people will use them regardless of what agreements we come to today, and so we (as educators) should do our best to share this sort of information with them so they use pie charts responsibly.

  • Eugene Woo

    I’m glad you support pie charts, unlike some dataviz purists out there. For part-to-whole visualizations, the pie works better than the bar or column chart, especially if you limit the categories.

    • I think it’s become so much of a bandwagon topic that it’s difficult for people to admit that the pie chart does have its place. Just gotta keep spreading the gospel of the pie… 🙂

  • Good post, Randy! While I’m definitely in the “friends don’t let friends use pie charts” camp, I do agree that they’re widely used and if done properly (as you describe above) they can communicate effectively.

    The one thing I generally focus on, and I’m curious if you agree, is no matter what chart we choose, the message in the data should be totally obvious to anyone viewing it. That is, if we’re building a more explanatory view instead of an exploratory tool.

    Here’s an example I like to use for my students where I show a radial chart that is more of a single value instead of comparison. It’s super easy to understand and doesn’t try to do things it’s not good at, like show comparisons among discrete attributes.

    Also I’d love to hear about your experience sometime w/ O’Reilly. I hope this finds you well.

    Cheers!
    Ben

    • Hi Ben,

      Generally, I agree with your statement that data visualizations should have a clear message and that message should be abundantly clear from the visualization. However, I think there are also cases where it can be useful to build an interactive tool where the user is free to explore the data and come to their own conclusions. In both cases, I believe that it’s important to show enough of the underlying data so the user can come to the conclusion themselves of whether they agree with your statistics or not. For example, only showing an average of several samples without giving an indication of the distribution of the samples is an easy way to mislead — intentionally or not.

  • geoIndigo

    Stop trolling on data science