The reddit world map

We can all agree that online social networks dominate most people’s day-to-day Internet lives. 90% of all U.S. adults aged 18-29 have a Facebook account, and a large portion of those people check their Facebook at least once a day.

What’s strange is that most people regard social networks as nothing more than a blob of status updates and links that occasionally has something interesting on it. These people rely on word of mouth or rudimentary search features to find interesting content on social networks, even though there are often small communities focused on exactly what they want to talk about.

Last year, I set out to change all that. I wanted to connect people to these smaller communities.

The toughest part about online social networks is navigating them. With millions of users and hundreds of thousands of communities, how can we possibly hope to find the right community? That’s when I had the thought: We all use maps to navigate the real world. Why can’t we use maps to navigate online social networks?

Randall Munroe famously inked a high-level social network map back in 2007:

but his map wasn’t particularly useful for navigating individual networks.

Thus, I set out on an expedition to map out the untamed lands of my favorite social network — reddit.com — like a modern-day Lewis and Clark. Below are the results.

The reddit world map

After several months of web scraping, data tinkering, and fidgeting with layouts, I settled on a methodology for mapping reddit. I decided that I would project reddit onto a 2D plane where every subreddit is represented by a dot. A subreddit would connect to another subreddit if many users posted or commented in both of the subreddits, and the subreddit would be colored red if it connected to many subreddits or blue if it were connected to only a few. Finally, subreddits that connected to each other would be placed closer to each other on the 2D plane than subreddits that didn’t, which had the neat effect of creating “meta-communities” of subreddits.

reddit-map-full
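To make the graph-building steps above a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code behind the map: the input format (a list of `(user, subreddit)` pairs), the function names, and the `min_shared_users` and `threshold` cutoffs are all hypothetical stand-ins for "many users" and "many subreddits". The 2D layout step is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def build_subreddit_graph(activity, min_shared_users=2):
    """Connect two subreddits when enough users were active in both.

    activity: iterable of (user, subreddit) pairs (assumed input format).
    Returns a dict mapping (sub_a, sub_b) -> number of shared users.
    """
    subs_by_user = defaultdict(set)
    for user, subreddit in activity:
        subs_by_user[user].add(subreddit)

    # Count, for every pair of subreddits, how many users appear in both.
    shared = defaultdict(int)
    for subs in subs_by_user.values():
        for a, b in combinations(sorted(subs), 2):
            shared[(a, b)] += 1

    # Keep only pairs that clear the (hypothetical) cutoff.
    return {pair: n for pair, n in shared.items() if n >= min_shared_users}


def degree(edges):
    """Number of subreddits each subreddit is connected to."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def color_by_degree(deg, threshold):
    """Red for well-connected subreddits, blue for the rest."""
    return {sub: "red" if d >= threshold else "blue" for sub, d in deg.items()}


# Toy data, made up for illustration only.
activity = [
    ("u1", "gaming"), ("u1", "games"),
    ("u2", "gaming"), ("u2", "games"), ("u2", "music"),
    ("u3", "gaming"), ("u3", "music"),
    ("u4", "music"), ("u4", "python"),
]
edges = build_subreddit_graph(activity, min_shared_users=2)
colors = color_by_degree(degree(edges), threshold=2)
# Here gaming connects to two subreddits and ends up red;
# games and music connect to one each and end up blue.
```

Once the edges exist, a force-directed layout would handle the final step of pulling connected subreddits closer together on the plane.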

When I initially shared this map, people started charting out the meta-communities that I had inadvertently created. Here’s one of my favorite maps:

reddit-world-map

Sure enough, if you zoom around on the interactive version of the map, you’ll see these meta-communities of reddit pop up before your very eyes: video games, sports, technology, movies, music, and of course a huge porn peninsula. To better highlight these meta-communities, I decided to color the subreddits by their cluster.

Community structure

Unbelievably, those very same meta-communities popped out in the clustered version of the map.

reddit-map-full-clustered
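For readers curious how clusters like these can fall out of a graph, here is a small sketch. The post does not say which clustering algorithm produced the colors, so this uses weighted label propagation purely as a stand-in: each subreddit repeatedly adopts the label that carries the most edge weight among its neighbors. The function name, the fixed visiting order, the tie-breaking rule, and the toy edge weights are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_label_propagation(edges, max_rounds=100):
    """Group nodes by letting each adopt its neighbors' heaviest label.

    edges: dict mapping (sub_a, sub_b) -> weight (e.g. shared user count).
    Returns a dict mapping each subreddit to a cluster label.
    """
    neighbors = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Every subreddit starts out as its own cluster.
    labels = {node: node for node in neighbors}

    for _ in range(max_rounds):
        changed = False
        # Fixed visiting order keeps the sketch deterministic.
        for node in sorted(neighbors):
            votes = defaultdict(int)
            for other, weight in neighbors[node].items():
                votes[labels[other]] += weight
            top = max(votes.values())
            best = {label for label, v in votes.items() if v == top}
            # Keep the current label on ties; otherwise take the
            # alphabetically smallest of the winning labels.
            if labels[node] not in best:
                labels[node] = min(best)
                changed = True
        if not changed:
            break
    return labels


# Toy graph: two tight triangles joined by one weak link (made-up weights).
toy_edges = {
    ("games", "gaming"): 3, ("gaming", "pcgaming"): 3, ("games", "pcgaming"): 3,
    ("guitar", "listentothis"): 3, ("listentothis", "music"): 3, ("guitar", "music"): 3,
    ("gaming", "music"): 1,
}
clusters = weighted_label_propagation(toy_edges)
# The three gaming subreddits share one label and the three music
# subreddits share another, despite the weak gaming-music link.
```

Label propagation can be sensitive to visiting order and tie-breaking on real data, so treat this as a way to build intuition rather than a recipe for reproducing the map.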

We see the active gaming cluster:

reddit-map-gaming-cluster

The hub of reddit techies:

reddit-map-tech-cluster

The isolated My Little Pony island:

reddit-map-mlp-cluster

And of course the massive porn cluster:

reddit-map-porn-cluster

Again, I made an interactive version of this map available online for you to explore and use.

All of this points to a major discovery: By Liking, commenting on, sharing, hashtagging, RTing, and voting on content in these social networks, we are creating a hidden structure that shows what we’re really interested in, and what we’re on the social network to talk about. These maps bring this hidden structure to light.

If you’d like to read more about how these maps are constructed, you can read the corresponding research article here.

Where do we go from here?

Ultimately, I’d love to see this mapping methodology applied to social networks beyond reddit. Much like reddit and its subreddits, Twitter organizes itself around hashtags, Facebook organizes itself around Pages, and Pinterest organizes itself around Pins. The only thing stopping us from mapping these social networks out is the availability of their data.

Come, my fellow Lewis and Clarks. Let’s map out the untamed lands of social media.

Dr. Randy Olson is a postdoctoral researcher at the University of Pennsylvania. As a member of Prof. Jason H. Moore's research lab, he studies biologically-inspired AI and its applications to biomedical problems.

Posted in data visualization, reddit, research

  • Jonas

    What are your thoughts on the time aspects?

    If I understood it correctly, then what you have right now is a ‘snapshot’ of reddit at a specific date (or specific time range). Let’s call it version 2014. But how would the map look if you repeated this exercise in five years’ time, as a version 2019? How should you interpret changes? Say ‘sports’ moves from the south-western corner to the north-eastern. Would this be a coincidence? Does this change imply that version 2014 becomes ‘useless’? Or some third thing? If it were clearer how to interpret changes in position, then it might also be clearer how to interpret the position itself.

    And if the intended goal is to ‘… assist users in organizing themselves into more specific interest groups’, and if successful in doing so, there might well be significant changes to the map. It therefore seems relevant to know how to interpret these changes.

    As I see it, all positions of subreddits are relative to their links. Have you thought about ways to ‘fixate’ the position of subreddits? Perhaps positioning relative to the first subreddit (in your dataset). Or including some kind of (periodic) boundary (similar to Earth). When we look at atlases, we evaluate positions relative to the north/south pole or to the equator. Of course, the position of the south/north pole is completely arbitrary until you have a compass that can point. When looking at the map, I feel I am missing some kind of compass or reference point that can guide me.

    Almost all* the network representations I have seen do not seem to consider this aspect. Therefore I would love to hear what your thoughts are on this.

    *) The exception might be: https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919 If a lot of links across the Atlantic suddenly disappeared, this could easily be interpreted as a decline in trans-Atlantic friendships. Imagine a ‘pure’ network representation, without the underlying map, where nodes with few links got placed further apart. The interpretation if the links across the Atlantic disappeared would of course be the same, but it would be so much harder to read from a map where the only visible change was a divergence of the two communities.

    • Hi Jonas. You bring up some fascinating points here! The temporal aspect of these maps is something I’m hoping to tackle next in this project. Of course, social networks change (sometimes drastically) over the course of years, and it’s important to be able to produce a consistent (or at least semi-consistent) spatial layout between those years. I’ll take the liberty of speculating below.

      I’ve tried in the past to take a temporal look at the evolution of reddit with these maps (based solely on posting behavior, not commenting behavior): http://rhiever.github.io/redditviz/evolution-of-reddit/

      Obviously that version doesn’t maintain the same spatial layout between years, but it provides an interesting view of change and growth over time.

      That said, I wonder if it’s possible to identify the “core” subreddit of each cluster. For example, you’d expect /r/gaming and/or /r/games to be the “core” of the gaming cluster, /r/music to be the “core” of the music cluster, and so on. That core should remain relatively stable over the years, so you can assign a fixed location to the core then place the rest of the subreddits relative to those cores.

      • bonnabrand

        Well then, I will be looking forward to your next iteration on the project.

        I hadn’t seen your ‘Evolution of reddit’ visualisation before. You do see some of the temporal aspects, but it becomes more and more difficult to keep track as time passes, especially given that both location and colour change.

        Finding the “core” subreddit is one option.
        Another option could be to use some “discount” approach, similar to price indices in national accounts. In most national accounts you have some base period. Everything is then reported in, say, year-2000 prices. Like the USA GDP in 2012 was $12.24 trillion measured in 2000 prices. Or like the poverty line of $1 a day measured in 1990 prices. What if you did the same and fixed reddit in, say, 2010? All nodes/subreddits would be located relative to their position in 2010. If a node/subreddit was not there in 2010, it would be placed relative to its parent/connected nodes, like you do today. Could something like this be possible?

        – Jonas
