Author: Randy Olson
Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

Why did so many Japanese families avoid having children in 1966?

Last week, I was presenting at a conference and discussing the merits of animated visualizations vs. small multiples. On one of my slides, I presented the following chart that shows the total fertility rate (i.e., the average number of children

Posted in data visualization

Spurious Extrapolations: Novel and unique research abstracts

Last Christmas, BMJ published a funny article exploring the mentions of positive and negative words in research abstracts over the past 40 years. I’ve recreated their research for two of the phrases below — “novel” and “unique.” Your eyes aren’t

Posted in analysis, data visualization

Spurious Extrapolations: What if U.S. college tuition costs keep rising?

For this post, I’m going to test run a new post series called Spurious Extrapolations, where I extrapolate time series far beyond reason and envision what would happen if the trend continued. Let me know what you think of the

Posted in analysis, data visualization

The correct way to use pie charts

Pie charts are the most widely berated chart type in data visualization. Many articles have been written over the years describing why pie charts are bad and why we should no longer use them. Even key members of the data visualization

Posted in data visualization, tutorial

Why posts get removed from /r/DataIsBeautiful

I’ve been a moderator of /r/DataIsBeautiful — one of the largest online communities dedicated to data analysis and visualization — for the past 2 1/2 years. During that time, I’ve reviewed thousands of data visualizations created by amateurs and professionals

Posted in data visualization, reddit

What data visualization tools do /r/DataIsBeautiful OC creators use?

One of the most common questions that newcomers to data [science/visualization/analysis] ask is: “What tools should I use to create data visualizations?” While I always recommend learning design principles before tools, I thought I’d take a stab at answering that

Posted in data visualization, reddit

Major League Baseball home run leaders, 1871-2016

Earlier this week, a Reddit user shared a fascinating animated data visualization showing the MLB home run leaders from the past 100+ years. I found this visualization especially interesting because it was one of the few examples where I’ve seen

Posted in data visualization, python, tutorial

Revisiting the vaccine visualizations

Last year, the vaccination debate was all the rage again. “Pro-vaxxers” were loudly proclaiming that everyone should get vaccinated and discussing the science behind it, and “anti-vaxxers” were casting their doubts and still refusing to get vaccinated for personal reasons.

Posted in data visualization, python, tutorial

Analyzing MMA: The Ultimate Fighting Championship

For the past 7 years, I’ve been a fan of MMA, and especially the larger Ultimate Fighting Championship events that take place around the world. For the uninitiated, MMA fights pit two professional fighters against each other who often have

Posted in data visualization

Introducing TPOT, the Data Science Assistant

Some of you might have been wondering what the heck I’ve been up to for the past few months. I haven’t been posting much on my blog lately, and I haven’t been working on important problems like solving Where’s Waldo?

Posted in machine learning, python, research