Celebrating 2 years of research blogging by analyzing my blog

In May 2012, I started this blog to rave about the IPython Notebook, a new scientific computing tool that’s still an integral part of my research workflow today. Two years have passed, and I’ve written about a breadth of topics ranging from statistics tutorials to science outreach to my PhD research to evolution to chess… and even the world’s deadliest actors. This blog has proven to be an incredibly useful outlet for fleshing out ideas and projects that I never would’ve finished without it, and even better, it’s connected me with hundreds of brilliant folks who have given me thoughtful feedback about my writing and projects.

Coincidentally, this blog’s 2nd birthday was also marked by another milestone: It just hit 1,000,000 pageviews! Given this site’s recent reputation as a data analysis blog, I figure the only appropriate way to celebrate its birthday is to analyze the blog itself and how it reached this milestone. Without further ado…

Where did all that traffic come from?

My blog only has about a dozen subscribers, so it’s not like I have a dedicated following checking in every day. 90% of my traffic is referrals from other web sites, the majority from reddit. In the graphs below, I broke my traffic down into a few categories of my largest traffic drivers, with news outlets, other blogs, etc. grouped into the “Other” category. The first graph excludes reddit so you can see the early trends.

rso-cumulative-sessions-referrer-no-reddit
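For readers who want to try a similar breakdown on their own analytics export, here is a minimal sketch of the grouping and running totals behind a graph like this one. It is not the code used for this post: the `CATEGORIES` mapping, the `(date, referrer, sessions)` row layout, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical mapping from referrer domain to traffic category.
# Anything not listed falls into "Other", as in the graphs.
CATEGORIES = {
    "reddit.com": "reddit",
    "google.com": "Search engines",
    "bing.com": "Search engines",
    "t.co": "Twitter",
    "twitter.com": "Twitter",
    "facebook.com": "Facebook",
}


def categorize(referrer):
    """Map a referrer domain to one of the traffic categories."""
    return CATEGORIES.get(referrer, "Other")


def cumulative_sessions(rows, exclude=()):
    """Return {category: [(date, running_total), ...]} sorted by date.

    rows is an iterable of (date, referrer, sessions) tuples, with
    dates as ISO strings so that they sort chronologically.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for date, referrer, sessions in rows:
        category = categorize(referrer)
        if category in exclude:
            continue
        daily[category][date] += sessions

    result = {}
    for category, by_date in daily.items():
        dates = sorted(by_date)
        totals = accumulate(by_date[d] for d in dates)
        result[category] = list(zip(dates, totals))
    return result


# Made-up sample rows for illustration only (not the blog's real data).
sample = [
    ("2012-09-01", "google.com", 10),
    ("2012-09-01", "reddit.com", 50),
    ("2012-09-02", "google.com", 15),
    ("2012-09-02", "example-news.com", 5),
    ("2012-09-03", "t.co", 7),
]

# Leave reddit out so the smaller sources stay visible.
no_reddit = cumulative_sessions(sample, exclude={"reddit"})
```

Passing `exclude={"reddit"}` mirrors the idea of a graph without reddit; calling `cumulative_sessions(sample)` with no exclusions gives the full picture.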

My blog went mostly ignored for the first 5 months, yet for some reason I kept writing and publishing blog posts to an empty audience. I set up a Search Engine Optimization (SEO) widget in September 2012, which had an immediately noticeable effect on my search engine traffic through Google. Search engines have been a reliable source of traffic over the years, so bloggers, take note: If you want to be found, set up SEO on your web site!

Some of my blog posts started getting picked up on various news outlets in late 2013, which led to a bump in the “Other” category. Twitter and Facebook have been a decent source of traffic over the years, but even they’ve been outpaced by search engine traffic. (And I have a decent-sized Twitter following to tweet to!) And all of them pale in comparison to…

rso-cumulative-sessions-referrer

… reddit. I’ve been an active redditor for over 3 years now, and I’ve been sharing my blog posts there whenever I find an appropriate subreddit, to see if anyone finds them interesting. Ever since I started blogging about my data visualization work, the traffic from reddit has exploded, with some days reaching 115,000 pageviews. Social media folks are always talking about how much traffic Twitter, Facebook, etc. can drive, but my experience has always been that reddit is much better for finding people to share and discuss your work with.

Another phenomenon you’ll notice is that spikes in reddit traffic lead to spikes in the other sources’ traffic shortly thereafter. The spike in reddit traffic in early January was from a couple of my “deadliest movies” blog posts reaching the front page of /r/movies and /r/dataisbeautiful. The day after, news outlets picked up on the posts and started writing about them, and that’s when Twitter and Facebook started to pick up on them. reddit seems to be a springboard for new, original content that wouldn’t be found without it, which is why I love reddit so much. (Thanks for giving me a voice, reddit!)

Who reads the blog?

I don’t have much information about my blog’s readers, but here’s what I have.

Geographic location

Half of my blog’s visitors are from the USA, leaving the other half to be filled by the rest of the world (mostly Canadians and Europeans). It’s been a goal of mine to have at least one visitor from every country, and as you can see, I only have a few countries left to fill. I’m especially proud of the fact that someone from Svalbard (population = 2,642) visited my blog.

rso-traffic-world-map

Internet speed by country

Given that my blog has had visitors from all over the world, it’s a fun exercise to compare who has the fastest internet speeds. Below are the top 10 fastest and slowest countries that had visitors to my blog. I only included countries that had at least 1,000 visitors to my blog.

I’m pretty shocked that the USA doesn’t show up in the top 10 fastest, given that my blog is hosted in the USA.
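The filtering and ranking described above can be sketched in a few lines. This is only an illustration, not the analysis code behind the charts: it assumes "speed" is measured as an average page load time in seconds (lower is faster), and the `stats` layout and sample numbers are made up.

```python
MIN_VISITORS = 1000  # threshold stated in the post
TOP_N = 10


def rank_by_speed(stats, min_visitors=MIN_VISITORS, top_n=TOP_N):
    """Return (fastest, slowest) lists of country names.

    stats maps country -> (visitors, avg_load_seconds). The load-time
    metric is an assumption; a lower value means a faster connection.
    """
    # Drop countries with too few visitors to give a stable average.
    eligible = {
        country: load
        for country, (visitors, load) in stats.items()
        if visitors >= min_visitors
    }
    ordered = sorted(eligible, key=eligible.get)
    fastest = ordered[:top_n]
    slowest = ordered[::-1][:top_n]
    return fastest, slowest


# Made-up numbers for illustration only.
sample = {
    "A": (5000, 2.1),
    "B": (1200, 9.8),
    "C": (300, 0.5),   # dropped: fewer than 1,000 visitors
    "D": (2500, 4.0),
}
fastest, slowest = rank_by_speed(sample, top_n=2)
```

The visitor threshold matters: without it, a country with a handful of visits (like "C" in the sample) could top the list on the strength of one or two lucky page loads.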

Countries with the fastest internet speeds

Countries with the slowest internet speeds

Browser popularity and speeds

This analysis would not be complete without a comparison of browser popularity and speeds. You can tell that my blog attracts a more tech-savvy crowd, given that over half of its visitors use Chrome. Despite Chrome’s claim to being one of the fastest browsers out there, it’s been left in the dust by Safari of all browsers! And poor Internet Explorer still can’t keep up with the rest of the modern browsers.

rso-browser-pop

rso-browser-speeds

OS popularity

Even though its native browser didn’t hold up in the browser race, Windows is still the dominant OS out there. But notice how Apple products are catching up with Windows in terms of market share?

rso-os-pop

What are the most popular posts?

It’s hard to believe that I’ve published over 60 blog posts over the past 2 years (an average of 2.5/month!). Below is a list of the most popular ones, sorted by pageviews.

  1. 114,031 – A data-driven exploration of the evolution of chess: Popularity of openings over time
  2. 106,683 – It’s impossible to work your way through college nowadays
  3. 99,603 – Top 25 most violence packed films of all time
  4. 72,889 – Top 25 deadliest actors of all time by on-screen kills in movies
  5. 69,513 – Retracing the evolution of Reddit through post data
  6. 47,106 – Statistical analysis made easy in Python with SciPy and pandas DataFrames
  7. 44,919 – Top 25 most murderous directors of all time
  8. 44,911 – Programming Language Breakdown for the HealthCare.gov Website
  9. 43,427 – Chess tournament games and Elo ratings
  10. 41,592 – It’s impossible to work your way through college nowadays, revisited with national data

Thanks to everyone for your support over the years. Here’s to another 2 years!

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

Posted in analysis, data visualization
  • kurtisbaute

    Great post!

    I just started a blog about science communication in late March, and I was always very happy if I hit over 100 page views in a day…

    Two days ago I posted a link to reddit for the first time (/r/dataisbeautiful), and was totally shocked to see that I’ve had >3000 page views since then. It is definitely a better way for me to reach a suitable audience. I would love to see data comparing usage stats for social network sites (reddit, Facebook, Twitter, etc.).

    Anyway, thanks for sharing this data. This sort of data is interesting but hard to come across. It shows that it is normal to have an initial lag time, and is encouraging to new bloggers like myself 🙂

    Keep it up, Randy!
    Kurtis

    • Hey Kurtis, happy to hear you found this post useful! 🙂 You should also check out /r/EverythingScience and /r/AskScienceDiscussion — both are great places to chat and post things about science outreach.

      • Kurtis Baute

        Great tips! I am pretty new to reddit, and hadn’t subscribed to those yet. Thanks!!

  • Chris White

    What do you mean by “set up SEO on your web site”? That search box on your website? Google already has your sitemap, no?

    • Search engine optimization: https://en.wikipedia.org/wiki/Search_engine_optimization

      Basically, setting up your web site so it shows up higher on search engines such as Google. Part of this is associating specific keywords and phrases with each page on your web site, so Google knows which keywords to attach your web pages to.

      • But how, specifically? Google can pretty much decipher the content from this article without you having to do anything extra

        • If you look at the source of, e.g., this web page, you’ll see some metadata inserted by the SEO widget at the top:

          <meta name="description" content="Randy Olson celebrates his blogiversary by analyzing 2 years of web traffic to the blog." />

          <meta name="keywords" content="research blogging, blogiversary,reddit,traffic analysis,analysis,data visualization" />

          Search engines such as Google read this information and attach the keywords to the web page when they cache it. I’m sure Google has some smart algorithms to try to figure out what each page is about, but it’s much better to explicitly define the keywords yourself.

  • Hi Randy, great post.

    Can I ask which “Search Engine Optimization (SEO) widget” you used?

    Looking for some Google love too,

    thanks!
    Tim

  • Armchair Scholar (@nofieldscholar)

    http://www.abs-cbnnews.com/focus/05/28/14/why-philippine-internet-so-slow

    Awesome to see that your data aligns with what a lot of other people say. The cost of having an internet connection there is insanely high too, at $50/month for a 6Mbps connection that’s largely unreliable (frequently dropped connections, random slowdowns, etc.).

    Sorry for the off topic comment, was just fascinated by your data.

    • Amusingly, it took over a minute for that web page to load. 🙂