Design critique: Putting Big Pharma spending in perspective

Recently, /r/DataIsBeautiful began hosting weekly visualization redesign competitions challenging everyone to come up with better and less misleading designs of existing graphics. Below is the first of many redesigns and detailed critiques that I will be working on.


In early 2015, John Oliver and his team released an excellent exposé on Big Pharma and their shady marketing tactics. Shortly thereafter, Leon Markovitz from dadaviz released the following bubble chart to feed the ensuing anti-Big Pharma news cycle.

[Figure: original bubble chart, “9 out of 10 Big Pharma companies spend more on marketing than R&D”]

While this bubble chart visualizes an interesting phenomenon, several aspects of the chart can be improved to tell a more accurate and complete story. Below, I will outline and address three such improvements.

Better design through better chart choices

It has been shown time and time again that circles are terrible for making comparisons: readers judge differences in area far less accurately than differences in length along a common axis. Worse, the bubble chart above does not even overlap the circles, which makes them harder still to compare. The visual representations of the data in this graphic are minimally useful, and most readers will simply read and compare the numbers inside each circle, which reduces the chart to a fancy-looking data table.

Most basic guides for selecting charts recommend the use of bar charts for comparisons of data. Let’s rework the above chart into a bar chart.

[Figure: bar chart of Big Pharma revenue and spending breakdown]

The bar chart works much better than the bubble chart for comparing each company’s marketing and R&D budgets because it places them on the same axis. It also still allows the viewer to look up approximate budget numbers via the x-axis grid lines. However, the bar chart is also quite cluttered because it compares 2 values for each of 10 companies.

This is where it’s important to think about the purpose of the chart. Leon wanted to use this chart to communicate the fact that “9 out of 10 Big Pharma companies spend more on marketing than R&D.” This fact can more effectively be communicated by a scatter plot, as I’ve demonstrated below. In the chart below, each square represents a company.

[Figure: scatter plot of Big Pharma marketing vs. R&D spending]

The key to this scatter plot is the line running diagonally through the center of the chart, which represents parity between marketing and R&D spending. Now the viewer can immediately tell how many Big Pharma companies spend more on marketing than R&D: They need only count the number of squares above and below the line of parity.
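The counting that the line of parity makes visual can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the charts in this post: the company names and dollar figures are made-up placeholders, and `parity_side` and `count_by_side` are hypothetical helper names.

```python
# Hypothetical (marketing, R&D) budgets in billions of USD.
# These are illustrative placeholders, NOT the figures from the chart.
budgets = {
    "Company A": (10.0, 6.0),
    "Company B": (4.0, 4.5),
    "Company C": (7.5, 7.5),
}


def parity_side(marketing, rd):
    """Classify one company relative to the line marketing == R&D."""
    if marketing > rd:
        return "more on marketing"
    if marketing < rd:
        return "more on R&D"
    return "on the line of parity"


def count_by_side(data):
    """Count how many companies fall on each side of the parity line."""
    counts = {}
    for marketing, rd in data.values():
        side = parity_side(marketing, rd)
        counts[side] = counts.get(side, 0) + 1
    return counts


print(count_by_side(budgets))
```

The scatter plot performs this same comparison graphically: the side of the diagonal a square lands on is the answer, so the viewer never has to compare two numbers by hand.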

As an added advantage, the scatter plot still allows the viewer to gauge approximate budgets for each company and allows a third dimension of data (total company revenue in this case) to be visualized via the size of the squares. Although the identities of the individual companies are lost in the scatter plot, this could be remedied by annotating the graph with the names, replacing the squares with company logos, or even making the graphic interactive. I did not do so here because the company names are not particularly important to the story.

Don’t forget to normalize

Another basic mistake in the original bubble chart was that the data was not normalized in any way, making comparisons between the Big Pharma companies precarious.

Taken at face value, the non-normalized numbers seem to indicate that Johnson & Johnson is a marketing giant and far more invested in marketing than AstraZeneca. These numbers completely ignore the fact that Johnson & Johnson brings in far more revenue than AstraZeneca; when we take both companies’ total revenues into account, AstraZeneca actually spends a higher percentage of its revenue (28%) on marketing than Johnson & Johnson (24%).

Below, I normalized all of the expenditures by each company’s 2013 yearly revenues.

[Figure: scatter plot of Big Pharma marketing vs. R&D spending as a percentage of revenue]

By normalizing the expenditures, the graph now tells a more complete story: We can meaningfully compare the Big Pharma companies and see that most of them spend about 15% of their revenues on R&D and 20-25% of their revenues on marketing, with Roche and Eli Lilly & Co. being the odd ones out sitting on the line of parity.
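The normalization step itself is simple division, and a short Python sketch shows why it can flip a comparison. The two companies and their numbers here are invented for illustration only (they are not the 2013 figures used in the charts), and `share_of_revenue` is a hypothetical helper name.

```python
def share_of_revenue(spend, revenue):
    """Return spend as a percentage of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * spend / revenue


# Hypothetical figures in billions of USD, chosen only to illustrate the effect.
companies = {
    "Large Co": {"revenue": 80.0, "marketing": 16.0},
    "Small Co": {"revenue": 20.0, "marketing": 6.0},
}

# Normalize each marketing budget by that company's own revenue.
shares = {
    name: share_of_revenue(figures["marketing"], figures["revenue"])
    for name, figures in companies.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% of revenue spent on marketing")
```

In this made-up case, Large Co spends far more in absolute terms (16 vs. 6) but a smaller share of its revenue (20% vs. 30%), which is the same kind of reversal described for Johnson & Johnson and AstraZeneca.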

Provide meaningful context

Perhaps the most egregious oversight in the design of the original bubble chart was the failure to provide any meaningful context to the data. The viewer was left with the fact that “9 out of 10 Big Pharma companies spend more on marketing than R&D,” but many viewers don’t know whether a large marketing budget is normal for a company. Left to their own devices, many viewers (especially those who watched John Oliver’s exposé) assumed “R&D good, marketing BAD” and immediately grabbed their pitchforks and aimed them at Big Pharma.

To provide at least some context to the data, I looked up the 2013 marketing and R&D budgets of 6 large companies and plotted them alongside the Big Pharma companies. The companies are:

  • Samsung
  • Intel
  • Microsoft
  • Google
  • Toyota
  • General Motors

These companies were picked based on the ease of looking up their budget and revenue information. Unsurprisingly, not all companies make this information readily accessible on the internet.

[Figure: scatter plot of marketing vs. R&D spending for Big Pharma and the comparison companies]

At least based on the companies chosen, it appears that Big Pharma as a whole is an outlier when it comes to marketing budgets. Even Samsung with its infamous $14bn marketing budget only spends ~8% of its revenues on marketing. The only company that even comes close to Big Pharma in terms of marketing is Intel, but it still spends more on R&D than marketing.

Perhaps the pitchforks over Big Pharma’s apparently overgrown marketing budget were warranted, but we didn’t know until at least some context was provided.

Conclusions

Well-designed data visualizations are one of the most effective mediums for communicating information today. We must be careful when designing visualizations to make sure that they tell the whole truth rather than bend statistics to tell the story we want to hear. In this critique, I have covered 3 common oversights that lead to bad and/or misleading visualizations:

  • Selection of a proper chart
  • Normalizing data
  • Providing meaningful context

Before sharing your visualizations in the future, please be sure to review your work to ensure that you didn’t fall into one of these common pitfalls.

If you liked what you saw in this post and want to learn more, check out my Python data visualization video course that I made in collaboration with O’Reilly. In just one hour, I will cover these topics and much more, which will provide you with a strong starting point for your career in data visualization.

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

  • Nick

    Your updated scatter plots certainly paint a more complete story. The missing context in the original is certainly a problem, but none of the recommended fixes are better at conveying information in a quicker, easier, or more visually interesting way than the original. You arrive at the same conclusion from any of the graphics… That pharma spends more on marketing than R&D, but the original conveys that in 3 seconds. The other options, aside from the bar graph, take 60 seconds to convey their message, at which point the reader arrives at the same conclusion anyway…

    • I’m not sure how the scatter plot conveys the information any slower. Once it’s made clear what the line of parity means, discerning the companies that spend more on marketing than R&D is merely a matter of looking for companies that fall above the line of parity. Specific comparisons of values aren’t even necessary with the scatter plot (as they are with the bar and bubble chart versions), since the chart layout performs that comparison for you.

      • Nifer B

        But “line of parity” is not an easily conveyed concept to everyone (I’m going by my experience as a tutor here). If you have to start with that, it takes a lot longer. And again, you’re assuming that people will bother to read the chart in the first place. I will repeat to confirm what others here have said, which is that a more complete story is nice for some but useless for the majority who won’t let you tell the story at all if it does not catch their interest first.

      • Also worth pointing out that not only do 9/10 companies spend more on marketing than R&D, they spend significantly more (1.5 — 2x). To me, only the bar graph really shows that ratio, although it is the most cluttered of the three.

  • cody

    I would like to see R&D vs Marketing amounts by drugs under patent and vs Marketing amounts by years left on drug patents

  • Rory Rosen

    An interesting company to throw in there would be Redbull, as they spend a ridiculous amount on marketing when taken as a percentage of revenue.

  • Malistar

    Consider the audience though. While the final chart is more complete and more rich with information, it looks opaque to the average person and is far less visually pleasing.

    Bore people or confuse them and you lose them. Simple truth. In order to accomplish your objective in any audience more general than a technical university campus, data visualizations must have a strong component of visual interest and they must certainly be attractive. As it stands, I don’t think the scatterplot pulls this off. It would be fine for an academic audience or your own use of course, but Oliver’s team wasn’t going for that.

    • I’ll agree that the scatter plot in its current version is fairly boring to the eye. It could be made more visually appealing by replacing the squares with company logos and annotating the graph a little better. However, I also believe that if the story a graph is telling is interesting enough, even a boring graph can tell a compelling story. 🙂

      For example, here’s another fairly boring scatter plot with minimal annotations that nonetheless tells an interesting story: http://www.randalolson.com/wp-content/uploads/iq-by-college-major-gender.png

      • Brian Westley

        Is this from Lake Wobegon? All of them are above average…

        • Steve Estes

          Lots of dumb people don’t go to college, of course. But I do find it hard to believe that physics majors – across a wide spectrum of colleges – average a 133 IQ, more than 2 standard deviations above the mean. Maybe at ivy league colleges, but across the board? Their AVERAGE intelligence is above the 98th percentile of the population? I’d believe physics *professors*, but physics majors? Smells like BS there.

  • Johnathan Pertolick

    Critique: You did not compare pharmaceutical companies’ budgets, you compared Massive Umbrella Corp budgets.

    For example: Johnson and Johnson, which appears to spend the most on marketing, is a huge corporation primarily split into Consumer, Pharmaceutical and Medical Devices.

    I can understand the “short hand” of combining Pharma and Med Devices under the label of “Pharmaceutical”, but Consumer divisions dramatically distort your data.

    Why? Because consumer divisions are full of products that require high marketing and zero R&D. Johnson and Johnson sells products like Listerine or Baby Powder which do not need further R&D spend, but which require huge marketing spend to keep consumers buying it on name power (see: CocaCola).

    I understand the value of pointing out the differences in R&D and marketing for a “Pharmaceutical corporation”, but I think you lead your viewers into making the same bad conclusions as the original author, because some of these companies have significant ad spend required to support their existing consumer and OTC businesses. We might conclude that J&J is the “worst” offender of the bunch, because we don’t have the basic information about their requirement to pump ad spend into household cleaning products for name recognition marketing. We confuse their spending on Listerine and think they’re spending that money promoting prescription medications, which means we’ve fundamentally misunderstood the basics of what the graph tells us.

    You should also include other “high spend / low development” companies like Nestle, Colgate-Palmolive or PepsiCo, to help us understand what effect this style of business has on the conclusions for non-pure pharma businesses.

    • Rod Clifton

      +1 on this…great points, would be interested to hear Olson on this

  • Jerry R. Gray

    I would be curious to see a longitudinal chart showing how marketing/R&D spend evolved over time, e.g. from when it wasn’t allowed, to when it was. Answering the question, “Was it an overnight change, or did it slowly evolve to the current state?”

  • Aun Awn

    Next, do the “bubble of money wasted” vs “bubble of R&D spent” for big government.

  • Rahul Sangole

    Yes, this is great advice. Scatterplots are quite powerful when used intelligently.

  • Nifer B

    I am not sure a scatterplot is actually the strongest choice in this context, as many of my friends with a liberal arts degree and a fear of math inculcated in bad high school experiences see anything that looks like a “graph, oooo scary math things” and their brain “turns off.” The “bubbles” of the original chart appear “friendly” to my eye, and most of us can process the numbers they represent–i.e., compare simple numbers with a few decimals, which most are comfortable with from coinage (as opposed to fractions, which fall back into the “scary, my brain is shutting off now” category, at least according to my mother every time she cooks). The “bubbles” chart may not be the most effective at conveying relative weights, but a more general audience may more easily get the big picture–there are more big orange bubbles than big blue bubbles!–without being required to get into the individual bubble to bubble comparison problem that you alluded to.

    tl;dr, the scatter plot may give a more accurate “feel” for the data, but only if you read it, and it might be too intimidating at first glance for many to bother to read in their non-work hours.

  • Kyle Gordon

    Great read, but the scatterplot that includes auto companies is misleading. Carmakers have massive sales and tiny margins. Representing their R&D/ marketing budgets as a percentage of sales is bound to make their expenditures look small and high techs’ look massive. I might try using earnings instead of revenue here.

  • David Weksler

    Thanks for the data presentation – I work with math teachers and technology – making things visual for students (and teachers) can make a huge difference. I guess I tend to look at these variations through two of my favorites – Edward Tufte and Hans Rosling (Gapminder). Good luck with your program at MSU.

  • JJSchwartz

    Isn’t this common knowledge?!

  • FamousGrouse

    Something like 50% of Pharma R&D is now handled through the university system, so those two numbers are much closer than the graphs realize.
    Wendy Warr and Associates is a good resource.

