Visualizing Indego bike share usage patterns in Philadelphia

One of the many things that I love about my new hometown of Philadelphia is that the government openly shares curated data sets covering most of its functions. Since I recently joined Philadelphia’s Indego bike share program, I decided to start working with their bike usage data set to see what useful tools I could build.

If you’ve ever used a bike share before, you know that one of the biggest fears is coming up to an empty bike share station when you need a bike. (Or similarly, coming up to a full station when you need to drop a bike off.) To help allay those fears, I’ve started monitoring the Indego bike usage API to see if I could model and predict when the bike share stations are most likely to be empty or full. Such a tool would be useful in two ways:

  1. It helps the Indego users by keeping them aware of the bad times to use certain bike share stations.
  2. It helps the Indego bike share service by predicting when a particular bike share station will require service (e.g., driving a truck out to pick up or drop off a large batch of bikes).

Undoubtedly Indego’s data science team is already performing some flavor of this model-and-predict scheme, but I thought it’d be fun to publicly tackle this problem and see how far I could get. For this post, I’ll focus on visualizing patterns in the data, and will take a stab at prediction in a future post.

Daily usage patterns

One of the first steps toward building a model that can make any sort of useful prediction is to look at the existing patterns in the data. How are Philadelphians making use of the Indego bike share program? What does a typical day look like for the Indego bike share program?

To get at those questions, I’ve been gathering the current status of each bike share station every 5 minutes since July 1, 2015. To visualize the data, I fit a regression to the usage pattern of each individual bike share station. The measure I’m using here to represent “station usage” is the percentage of a station’s docks that are filled with bikes, where 100% represents a station full of bikes and 0% represents an empty station.
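The post doesn’t say which kind of regression was used, so the sketch below is only one reasonable choice: a harmonic (Fourier) regression of percent full on time of day, which suits a pattern that repeats every 24 hours. It assumes each snapshot reports bikes available and empty docks available, and that a station’s capacity is the sum of the two (out-of-service docks ignored). The function names and the default of three harmonics are illustrative, not taken from the original analysis.

```python
import numpy as np


def percent_full(bikes_available, docks_available):
    """Percentage of a station's docks holding a bike (100 = full, 0 = empty).

    Assumes capacity = bikes available + empty docks available,
    i.e., out-of-service docks are ignored.
    """
    total = bikes_available + docks_available
    return 100.0 * bikes_available / total if total else float("nan")


def _harmonic_design(hours, n_harmonics):
    """Design matrix: intercept plus sin/cos terms with periods of 24/k hours."""
    h = np.atleast_1d(np.asarray(hours, dtype=float))
    columns = [np.ones_like(h)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * h / 24.0
        columns.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(columns)


def fit_daily_pattern(hours, pct_full, n_harmonics=3):
    """Least-squares harmonic regression of % full on time of day.

    hours: time of day of each snapshot, in hours from midnight (8.25 = 8:15 AM).
    pct_full: % full at each snapshot.
    Returns a function mapping time(s) of day to fitted % full (as an array).
    """
    X = _harmonic_design(hours, n_harmonics)
    y = np.asarray(pct_full, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda h: _harmonic_design(h, n_harmonics) @ coef
```

Because the sine and cosine terms wrap around at midnight, the fitted curve joins up smoothly between 11:55 PM and 12:00 AM, which an ordinary polynomial in hour of day would not guarantee.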

I’ve plotted each regression below, separated into three distinct categories:

  • Outbound commuting stations: Stations where people take a bike from home to ride to work or school.
  • Inbound commuting stations: Stations where people take a bike from work or school to ride back home.
  • Underused stations: Stations that see minimal use compared to the other stations.
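In the comment thread, “underused” is defined by a low cutoff on the standard deviation of each station’s 24-hour pattern. A minimal sketch of that kind of rule might look like the following. It also guesses at one way to separate outbound from inbound stations: the sign of the change in percent full between 6 AM and 10 AM. The cutoff value and the two comparison hours are hypothetical, not the values used for the plots.

```python
import statistics

# Hypothetical values: the post does not say which cutoff or hours were used.
FLAT_STD_CUTOFF = 5.0  # percentage points
EARLY_HOUR, LATE_HOUR = 6, 10


def classify_station(hourly_pct_full, flat_cutoff=FLAT_STD_CUTOFF):
    """Label a station from its average % full for each hour of the day.

    hourly_pct_full: 24 values, where index i is the average % full during hour i.
    """
    if len(hourly_pct_full) != 24:
        raise ValueError("expected one value per hour of the day")
    # A nearly flat daily curve means the bike count barely moves.
    if statistics.pstdev(hourly_pct_full) < flat_cutoff:
        return "underused"
    # Stations near home lose bikes during the morning commute;
    # stations near work or school gain them.
    morning_change = hourly_pct_full[LATE_HOUR] - hourly_pct_full[EARLY_HOUR]
    return "inbound" if morning_change > 0 else "outbound"
```

As kclo3 and Scott Jones point out in the comments, a flat curve can also come from a busy station whose arrivals and departures balance out, so a rule like this can mislabel high-turnover stations.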

[Figure: Daily usage patterns of outbound commuting stations]

[Figure: Daily usage patterns of inbound commuting stations]

As the two above plots show, many Philadelphians have adopted the Indego bike share program into their daily commute to work. Around 8 AM ET, we start to see bikes leaving several stations around town, which is followed shortly thereafter by an influx of bikes into stations at other parts of the city. Similarly around 5 PM ET, we see the reverse trend, with bikes heading back to the home stations.

[Figure: Daily usage patterns of underused stations]

Unfortunately, several bike share stations seem to go mostly ignored. As shown in the above plot, these stations see little to no change in their usage throughout the day: the same bikes sit in their docks day after day. Perhaps these stations need to be relocated.

Mapping the daily usage patterns

To provide a better spatial context to the above patterns, I mapped each bike share station onto an interactive map of Philadelphia and color-coded the stations by their usage pattern.

As expected, the stations that see a large influx of bikes during work hours are in Philadelphia’s primary business districts and education centers. The bike stations along Market Street, around the University of Pennsylvania and Drexel University, and even up at Temple University are all places where Indego riders commute to work or school.

In contrast, most of the bike stations that see a decline in bikes in the morning are in residential areas further out in the city. This observation adds to the evidence that the Indego bike share program is being used for daily commutes to work and school more so than for joyrides by tourists.

By this measure, the Indego bike share program has been a resounding success so far. Some of the existing underused stations may require adjustment, but it’s quite clear that Indego is here to stay.

Weekly usage patterns

Finally, I thought it would be interesting to show the weekly usage patterns of the stations. I’ve selected a handful of stations below and plotted their usage patterns, where darker red means “close to full of bikes” and darker blue means “close to empty.”
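One way to build the grid behind weekly plots like these is to bucket every snapshot by day of week and hour and average the percent-full values in each bucket. This is a sketch assuming timestamps are already in local (Eastern) time; the function name and the hourly bucket size are illustrative choices, not necessarily what produced the figures.

```python
from collections import defaultdict
from datetime import datetime


def weekly_grid(samples):
    """Average % full by (day of week, hour of day).

    samples: iterable of (datetime, pct_full) pairs, timestamps in local time.
    Returns a 7 x 24 list of lists (row 0 = Monday, column = hour);
    buckets with no snapshots are None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, pct in samples:
        key = (ts.weekday(), ts.hour)  # weekday(): Monday = 0 ... Sunday = 6
        sums[key] += pct
        counts[key] += 1
    return [
        [sums[(d, h)] / counts[(d, h)] if counts[(d, h)] else None for h in range(24)]
        for d in range(7)
    ]
```

After replacing empty buckets with NaN, the grid could be drawn with matplotlib’s `imshow` using the reversed red-blue colormap `RdBu_r` (with `vmin=0, vmax=100`), which maps high values to red and low values to blue, matching the “red = full, blue = empty” convention above.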

[Figure: Day-by-day usage patterns, 11th & Reed]

[Figure: Day-by-day usage patterns, The Children's Hospital of Philadelphia (CHOP)]

The stations at 11th & Reed and CHOP display the stereotypical commuting patterns that I discussed above. Interestingly, the CHOP station is one of a handful of stations that seem to be used almost exclusively for commuting, whereas most stations see notable activity on the weekends.

[Figure: Day-by-day usage patterns, 2nd & Germantown]

Above, I’ve visualized the weekly usage patterns of the station at 2nd & Germantown to highlight the irregular usage patterns of some of the Indego bike share stations. Even though the 2nd & Germantown station is used as an outbound commuter station on the weekdays, it’s also quite popular as a station to reach the bars, restaurants, and activities in Northern Liberties on Friday night.

At this point, I clearly need more data to properly model and predict the usage patterns, since some bike stations are used differently at different times of the week. In the meantime…

What else would help the Indego bike share program?

I had been thinking that we needed an Indego dock status tracker, but existing apps already cover the most common platforms: web, iOS, and Android.

Do you have any ideas for what tools would be useful to supplement the Indego bike share program? Feel free to add your suggestions here in the comments.

Dr. Randy Olson is a Senior Data Scientist at the University of Pennsylvania, where he develops state-of-the-art machine learning algorithms with a focus on biomedical applications.

Posted in analysis, data visualization
  • kclo3

    What distinction is there between underused stations and those simply with high sustained turnover, especially if they’re hovering at 25% or less filled? If there really are underused stations in Center City, the main culprit is overwhelmingly insufficient bike lane infrastructure, especially around the Parkway (I’d say it’s almost irresponsible to put stations there and subject first-time city cyclists to 50 MPH Parkway traffic). Then there are the cases where gray stations border coverage vacuums (Callowhill) or sit on the edge of the current Phase 1 rollout. Adding bigger and more concentrated stations at commuter nodes like Temple will help peripheral stations greatly.

    • Admittedly, the definition of “underused” here was fairly arbitrary. I looked at the standard deviation of each station’s 24-hour usage pattern and chose a low cutoff that selected mostly “flat” trends. If any of those stations were to be considered for relocation, we should delve deeper into those stations’ day-by-day trends to see why the usage pattern appears flat.

      • k8iedid

        The Art Museum is one of the most highly trafficked stations, so it must be one that fell into the high-sustained-turnover bucket.

  • Scott Jones

    Is there any way for you to distinguish between stations where bikes come and go frequently within your 5-minute window, and ones that aren’t really being used?
    (I’m not sure what data is actually available to you from the Philadelphia system)
    I use the Velo bike system in Antwerpen, Belgium, and in the center, near subway stops and the train station, there is a lot of coming and going, without that much change in the number of bikes.
    Very interesting study!

    • There are a couple of ways to make that distinction.

      The first way is what I did here: look at the standard deviation of the signal over time. If the station sees long periods with the same number of bikes, then it’s likely that it isn’t being used. (Unless it’s empty for a long stretch, but this is rare from what I can tell.)

      The second way is to look at the changes in the number of bikes every 5 minutes. The sum of the absolute changes in bike numbers every 5 minutes over, say, a one-week period could also give an indication of a frequently used station.

      Ideally, we’d have a data set containing a row for every time a person checks a bike in and out, but Indego doesn’t release that data yet for privacy reasons. Hopefully they can start releasing that data soon now that the program has really taken off.
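The second approach in the reply above (summing the absolute 5-minute changes) can be sketched in a few lines. The input format, a plain list of bike counts sampled every 5 minutes, is an assumption, and the function name is hypothetical. As Scott Jones points out, a bike taken and another returned inside the same window cancel out, so this undercounts activity at busy stations.

```python
def turnover(bike_counts):
    """Sum of absolute changes in bikes available between consecutive snapshots.

    bike_counts: bikes available at one station, one reading every 5 minutes
    (one week = 7 * 24 * 12 = 2016 readings).
    Swaps that happen within a single 5-minute window cancel out,
    so this is a lower bound on the true number of check-ins and check-outs.
    """
    return sum(abs(after - before) for before, after in zip(bike_counts, bike_counts[1:]))
```

Two stations with the same standard deviation can have very different turnover: a station that drains once in the morning and refills once in the evening scores low, while one whose count bounces up and down all day scores high.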

      • TongueWagger

        Ask them for data and promise not to disclose identifiers. It is such a good analysis that they should want to see what you come up with. It is a forward thinking company and should be interested in your findings as well.

      • nayls142

        My closest station is 15th and Spruce. I’ve been anecdotally trying to make sense of the usage. It seems like bikes don’t stick around. When it’s empty, I head to 17th and Pine. When I return a bike to 15th and Spruce, it seems to get snatched up immediately. I doubt the same bikes are sticking around for days. This neighborhood is in the transition between residential, commercial, and nightlife areas, so I really feel the trend lines may be canceling each other out.

  • k8iedid

    This is incredible, thank you so much for all this effort! As a regular rider between 2 stations, I def see a pattern of availability.

  • M Coen

    I suggest you transpose the weekly usage patterns. That way they would resemble the calendars on our phones and laptops, and it could be easier to extract information at first glance. Making the jump from right to left on Friday night took me a couple of seconds. I assume a transposed layout would follow normal visual scanning.

  • Hugh Lynch

    I wonder how hard it would be to optimize a bike redistribution itinerary. In NYC they actively re/destock bikes at the train stations during rush hour.

  • PeterVermont

    Simple web app for finding a bike or a dock. Open source, written in JavaScript, and demoed with an explanation of the code at the Wharton Web Conference

    http://bike-me-now.boutell.com/

  • firoozye

    Randy,
    Cool stuff.

    Just looking at CHOP, it has some seasonals / weekly patterns and appears to get more full on Mondays than other days. But this may be because it doesn’t get emptied out beforehand.

    In London, bike usage is partly dictated by our TfL/BarclaysBike (now Santander), who come to redistribute bikes from overly full stations to overly empty ones. Knowing this schedule (if it is predictable or posted online) might change your conclusions. This is partly the control variable you want to help adjust, I guess. That and the more extreme case of moving or removing underused stations altogether.

    It looks like Indego comes and empties it Wednesday night and Thursday night without fail. I don’t know the schedule (maybe they even empty it other nights), but early Monday morning it’s not as empty as on other days, and it seems unable to take the large influx of commuters on Monday.

    At 2nd and Germantown, it gets emptied at 2 PM on Saturdays. It looks like revellers ride bikes extremely late at night (peaking at 2 AM)! Making sure the lighting is good there would help prevent accidents. Some of the revellers make their way home by 10 AM? Most, sensibly, seem to take taxis….

    One other chart which might be useful is flow-related (rather than stock). Perhaps you even have this directly from your dataset. The net flow into the station from 6am to 10am every morning. And the net outflow from 2pm to 6pm. Do the number of commuters change throughout the week? You mention calculating Stdev, but the fact is you know a lot more about usage patterns already – a lot more.

    The plot you show almost makes it seem like bike-riding commuters at CHOP start out quite enthusiastic early in the week, then lose steam as they go on! Come Friday, they’d rather take a taxi! I’d bet this first impression is deceptive and the looking at more flow rather than stock variables would sort it out.
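The flow-based view suggested in the comment above could be approximated from the same 5-minute snapshots: for each day, take the bike count at the last reading in a window minus the count at the first. This is a rough sketch with hypothetical names; it assumes readings are sorted by local timestamp, and it cannot tell rider trips apart from rebalancing by Indego’s trucks.

```python
from datetime import datetime


def net_flow(readings, start_hour, end_hour):
    """Net change in bikes within a daily time window, for each date.

    readings: (datetime, bikes_available) pairs sorted by time, in local time.
    The window covers start_hour:00 up to (not including) end_hour:00.
    Returns {date: bikes at last reading in window - bikes at first reading};
    positive values mean net inflow, negative values net outflow.
    """
    first, last = {}, {}
    for ts, bikes in readings:
        if start_hour <= ts.hour < end_hour:
            day = ts.date()
            first.setdefault(day, bikes)  # keeps the earliest reading of the day
            last[day] = bikes  # overwritten until the latest reading of the day
    return {day: last[day] - first[day] for day in first}


# Example with made-up readings for a station near home on a Monday morning.
readings = [
    (datetime(2015, 7, 6, 6, 0), 10),
    (datetime(2015, 7, 6, 7, 30), 4),
    (datetime(2015, 7, 6, 9, 55), 2),
    (datetime(2015, 7, 6, 10, 5), 8),  # outside the 6-10 AM window
]
morning = net_flow(readings, 6, 10)
```

With this convention, `net_flow(readings, 6, 10)` covers 6:00 to 9:59 AM and `net_flow(readings, 14, 18)` covers 2:00 to 5:59 PM, so the two windows mentioned in the comment can be compared day by day.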

  • Robert Cheetham

    Great work. You cited the Indego bike share station API, but this is all based on usage. I can’t find any docs for the API. Is there an endpoint for querying usage for stations based on dates, times, etc.?

    • Unfortunately they don’t provide that information yet, but you can find my cache here: http://www.randalolson.com/data/indego-usage.tsv.gz

      It updates every hour.

      • Robert Cheetham

        Thanks. If you are collecting this hourly and it’s the best source for the data, would you be willing to have it posted as a source on OpenDataPhilly?

        Robert

        • Would that just be a link to the csv file I linked above? If so, sounds fine to me. It’d be great if I had access to edit the entry in case I change how the data is stored. (Currently, it’s stored in an inefficient tsv format with a ton of repeated information, but I would like to get around to reformatting it into JSON someday.)

    • timwis

      The API is listed on OpenDataPhilly at https://www.opendataphilly.org/dataset/bike-share-stations/resource/2d484d8a-fdce-4b1d-8e00-3bb787dfb477 – it’s just a GeoJSON endpoint, doesn’t have any query abilities that I know of
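For anyone who wants to collect their own snapshots from that GeoJSON endpoint, parsing a response is straightforward. The sketch below works on the response text (fetching and scheduling a request every 5 minutes are left out). It relies only on the standard GeoJSON layout, a FeatureCollection whose point coordinates are [longitude, latitude]. The property names `name`, `bikesAvailable`, and `docksAvailable` are assumptions, so check them against an actual response.

```python
import json


def station_snapshot(geojson_text):
    """Parse a stations GeoJSON FeatureCollection into (name, % full, lon, lat) rows.

    ASSUMPTION: "name", "bikesAvailable", and "docksAvailable" are guessed
    property names; verify them against a real response before relying on this.
    """
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        props = feature["properties"]
        # GeoJSON stores point coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        bikes = props["bikesAvailable"]
        docks = props["docksAvailable"]
        total = bikes + docks
        pct_full = 100.0 * bikes / total if total else None
        rows.append((props["name"], pct_full, lon, lat))
    return rows
```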