What data visualization tools do /r/DataIsBeautiful OC creators use?


Published on March 11, 2016 by Dr. Randal S. Olson




One of the most common questions that newcomers to data [science/visualization/analysis] ask is: "What tools should I use to create data visualizations?" While I always recommend learning design principles before tools, I thought I'd take a stab at answering that question by analyzing what tools the /r/DataIsBeautiful community uses.

For the uninitiated, /r/DataIsBeautiful is an online community dedicated to data analysis and visualization, where people post and discuss various data visualizations from around the web. Sometimes /r/DataIsBeautiful community members create and share their own data visualizations -- called "OC," or Original Content -- which I have always found to be a great source of ideas and inspiration.

As part of the /r/DataIsBeautiful posting rules, every OC contributor must include a comment on their post describing the data source(s) and tool(s) they used to create their data visualization. Thus, analyzing their tool usage over the years came down to a fairly simple n-gram analysis of all comments made by OC contributors on /r/DataIsBeautiful that mention the word "tool." For this article, I analyzed thousands of comments made by OC contributors to /r/DataIsBeautiful between January 2014 and January 2016.
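The post does not show its n-gram code, so the following is only a minimal sketch of how such a count could work. The tool list, the tokenizer, and the sample comments are all assumptions made up for illustration; each tool is counted at most once per comment.

```python
import re
from collections import Counter

# Hypothetical tool names to look for; not the list used in the post.
TOOLS = {"excel", "python", "d3.js", "matplotlib", "google charts"}

def tokenize(body):
    """Lowercase a comment and split it into tokens, keeping inner dots (d3.js)."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", body.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_tool_mentions(comments, tools=TOOLS, max_n=2):
    """Count how many comments mention each tool at least once."""
    counts = Counter()
    for body in comments:
        tokens = tokenize(body)
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
        # Intersect with the tool list so each tool counts once per comment.
        counts.update(seen & set(tools))
    return counts

# Made-up comments standing in for the real OC comments.
sample_comments = [
    "Tool: Excel. Source: census data.",
    "Tools used: Python and matplotlib. Python did the scraping.",
    "Made with the D3.js tool and Google Charts.",
    "Tool was Excel again",
]

print(count_tool_mentions(sample_comments).most_common())
```

Using bigrams as well as unigrams is what lets a two-word name such as "google charts" be counted as a single tool.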

[Figure: DIB-tools-used-bar, a bar chart of the tools mentioned by /r/DataIsBeautiful OC creators]

(Unfortunately, it was difficult to parse out mentions of the "R" language with the n-gram analysis, so we'll have to use ggplot2 as a proxy.)
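The post does not say exactly why "R" was hard to parse, but one plausible reason is that a single letter is ambiguous once text is tokenized. The sketch below is an assumption, not the author's method: it shows that even a case-sensitive, word-boundary match for a standalone capital "R" still produces false positives.

```python
import re

# Case-sensitive match for a capital "R" that stands alone as a word.
R_PATTERN = re.compile(r"\bR\b")

def mentions_r(body):
    """Return True if the comment contains a standalone capital R."""
    return R_PATTERN.search(body) is not None

# Made-up comments for illustration.
print(mentions_r("Tools: R, ggplot2"))         # a genuine mention of R
print(mentions_r("see /r/dataisbeautiful"))    # lowercase r in a subreddit path
print(mentions_r("R&D spending by country"))   # matches, but is not the language
```

A lowercased tokenizer would do worse still, since the "r" in every "/r/..." subreddit link would become a candidate token.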

The most popular tools on /r/DataIsBeautiful are:

| Tool | Free? | Requires programming? | Typical uses |
|---|---|---|---|
| Excel | Paid | No | Basic data analysis and visualization |
| Python | Free | Yes | General-purpose scripting language that is typically used for data scraping, cleaning, and wrangling |
| D3.js | Free | Yes | JavaScript-based library for interactive data visualization on the web |
| Tableau | Paid, with limited free option | No | Advanced interactive data visualizations for the web |
| ggplot2 | Free | Yes | Advanced data visualization library for the R scripting language |
| R | Free | Yes | Scripting language designed for statistical analysis, modeling, and data visualization |
| matplotlib | Free | Yes | Python-based visualization library for making basic data visualizations |

As expected, Excel dominates the list as the tool most beginners reach for: at least 643 OC data visualizations on /r/DataIsBeautiful were made with Excel. Excel is a great tool to start with, but you should eventually move on to more advanced tools that let you generate visualizations programmatically, such as matplotlib/Seaborn, D3.js, or ggplot2. If programming isn't your forte, Tableau is a much better option than Excel.

Here are descriptions of the rest of the tools:

| Tool | Free? | Requires programming? | Typical uses |
|---|---|---|---|
| JavaScript | Free | Yes | Scripting language for the web |
| Highcharts | Free for non-commercial projects | Yes | JavaScript-based library for programmatically creating interactive data visualizations for the web; easier to use but less flexible than D3.js |
| Datawrapper | Free | No | Basic online interactive visualizations |
| Gephi | Free | No | Network visualization |
| Plotly | Free | No | Web-based GUI for creating interactive data visualizations |
| CartoDB | Free (limited) | No | Web-based tool for creating interactive online maps |
| Seaborn | Free | Yes | Python-based visualization library for advanced statistical data visualization |
| Matlab | Paid | Yes | Powerful analysis, modeling, and data visualization tool |
| Google Charts | Free | Yes | Simple JavaScript-based visualization library for creating interactive online visualizations |
| Leaflet.js | Free | Yes | Simple JavaScript-based visualization library for creating interactive online map visualizations |
| LaTeX | Free | Yes | Document preparation system that is somehow used to create visualizations (???) |
| Google Fusion | Free | No | Web-based tool for creating interactive online data and map visualizations |
| Bokeh | Free | Yes | Python-based visualization library for creating interactive data and map visualizations |

I was also curious about temporal trends in tool usage, so I grouped the tool mentions by year and plotted them below.
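One way the yearly grouping could be done is sketched here, assuming each comment comes as a (body, created_utc) pair with created_utc as a Unix timestamp in seconds. The function name, the tool list, and the sample rows are hypothetical and not taken from the post.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

def mentions_by_year(rows, tools):
    """Group per-comment tool mentions by the UTC year the comment was made."""
    per_year = defaultdict(Counter)
    for body, created_utc in rows:
        year = datetime.fromtimestamp(int(created_utc), tz=timezone.utc).year
        text = body.lower()
        for tool in tools:
            # Match the tool name only when it is not part of a longer word.
            pattern = r"(?<!\w)" + re.escape(tool) + r"(?!\w)"
            if re.search(pattern, text):
                per_year[year][tool] += 1
    return per_year

# Made-up rows; timestamps fall in early January of 2014, 2015, and 2016.
sample_rows = [
    ("Tool: Excel", 1388620800),
    ("Tools: Python, matplotlib", 1420156800),
    ("Tool: Excel and Tableau", 1420243200),
    ("Tool: Tableau", 1451692800),
]

for year, counts in sorted(mentions_by_year(sample_rows, ["excel", "tableau", "python"]).items()):
    print(year, dict(counts))
```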

[Figure: DIB-tools-used-line, a line chart of tool mentions by year]

GUI-based visualization tools such as Tableau and Gephi are seeing steady growth, whereas Python and matplotlib (oddly) seem to be waning in relative popularity. D3.js and ggplot2 are similarly experiencing steady growth, although I should note that 2016's counts are only based on January 2016's comments and may change by the end of the year. We'll have to revisit these trends come 2017.
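Because 2016 covers only one month, raw counts for that year cannot be compared directly with full years; shares within each year can. The post does not state how relative popularity was computed, so this is only a sketch of one possible normalization, with invented counts.

```python
def yearly_shares(counts_by_year):
    """Convert per-year tool counts into each tool's share of that year's mentions."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        if total == 0:
            shares[year] = {}
            continue
        shares[year] = {tool: n / total for tool, n in counts.items()}
    return shares

# Invented counts for illustration; these are not the post's numbers.
example_counts = {
    2015: {"excel": 300, "tableau": 100},
    2016: {"excel": 15, "tableau": 10},
}

for year, tool_shares in sorted(yearly_shares(example_counts).items()):
    print(year, {tool: round(share, 2) for tool, share in tool_shares.items()})
```

In this invented example Tableau's count drops from 100 to 10, yet its share rises from 25% to 40%, which is the kind of shift a raw-count comparison would hide.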

Hopefully that answers all of your data visualization tool-related questions! If you have any more questions or concerns, please leave them in the comments.


How to download the comments I analyzed

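If you'd like to repeat this analysis yourself, run the following query in Google BigQuery. It is written in BigQuery's legacy SQL dialect, in which a comma-separated list of tables in the FROM clause is treated as a union of those tables.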

SELECT
  body,
  created_utc
FROM
  [fh-bigquery:reddit_comments.2016_01],
  [fh-bigquery:reddit_comments.2015_12],
  [fh-bigquery:reddit_comments.2015_11],
  [fh-bigquery:reddit_comments.2015_10],
  [fh-bigquery:reddit_comments.2015_09],
  [fh-bigquery:reddit_comments.2015_08],
  [fh-bigquery:reddit_comments.2015_07],
  [fh-bigquery:reddit_comments.2015_06],
  [fh-bigquery:reddit_comments.2015_05],
  [fh-bigquery:reddit_comments.2015_04],
  [fh-bigquery:reddit_comments.2015_03],
  [fh-bigquery:reddit_comments.2015_02],
  [fh-bigquery:reddit_comments.2015_01],
  [fh-bigquery:reddit_comments.2014]
WHERE
  LOWER(body) LIKE "%tool%"
  AND subreddit == "dataisbeautiful"
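
Once the query results are saved, they need to be loaded for analysis. The sketch below assumes the results were exported as a CSV file with a header row naming the body and created_utc columns; the loader function and the sample data are hypothetical.

```python
import csv
import io

def load_comments(fileobj):
    """Read (body, created_utc) pairs from a CSV export with a header row."""
    reader = csv.DictReader(fileobj)
    return [(row["body"], int(row["created_utc"])) for row in reader]

# Made-up stand-in for an exported results file.
sample_csv = io.StringIO(
    "body,created_utc\n"
    '"Tool: Excel, plus a little Python",1420156800\n'
    '"Tools: R and ggplot2",1451692800\n'
)

for body, created_utc in load_comments(sample_csv):
    print(created_utc, body)
```

With a real export, pass an open file instead, for example open("comments.csv", newline="", encoding="utf-8"), where the filename is whatever you chose when saving the results.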