Accuracy of three major weather forecasting services

For the past month, I’ve been slowly working my way through Nate Silver’s book, The Signal and the Noise. It’s really a great read, but if you’re a regular reader on this blog, I’d imagine you’ve already read it. This book is loaded with all kinds of great examples of where predictive analytics succeeds and fails, and I decided to highlight his weather forecasting example because of how surprising it was to me.

For those who aren’t in the know: Most of the weather forecasts out there for the U.S. are originally based on data from the U.S. National Weather Service, a government-run agency tasked with measuring and predicting everything related to weather across the United States. Commercial companies like The Weather Channel then build off of those data and forecasts and try to produce a “better” forecast — a fairly lucky position to be in, if you consider that the NWS does a good portion of the heavy lifting for them.

We all rely on these weather forecasts to plan our day-to-day activities. For example, before planning a summer grill out over the weekend, we’ll check our favorite weather website to see whether it’s going to rain. Of course, we’re always left to wonder: Just how accurate are these forecasts? Plotted below is the accuracy of three major weather forecasting services. Note that a perfect forecast means that if, e.g., the service forecast a 20% chance of rain on 40 days of the year, then exactly 8 (20%) of those days actually had rain.

[Figure: observed frequency of rain vs. forecast probability of rain for the National Weather Service, The Weather Channel, and local TV meteorologists]
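For readers who want to run this kind of calibration check on their own data, here is a minimal sketch in Python. The daily records below are made-up numbers purely for illustration, not the ForecastWatch data behind the plot:

```python
from collections import defaultdict

# Hypothetical daily records of (forecast probability of rain, whether it rained).
# These numbers are made up purely to illustrate the calculation.
records = [
    (0.2, False), (0.2, True), (0.2, False), (0.2, False), (0.2, False),
    (0.6, True), (0.6, True), (0.6, False), (0.6, True), (0.6, False),
    (0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
]

# Group the days by the probability that was forecast...
days_by_forecast = defaultdict(list)
for forecast_prob, rained in records:
    days_by_forecast[forecast_prob].append(rained)

# ...then compare each forecast probability with the observed frequency of rain.
# A perfectly calibrated forecaster lands on the diagonal: days given a 20%
# forecast see rain about 20% of the time, 60% forecasts verify about 60%, etc.
for forecast_prob in sorted(days_by_forecast):
    outcomes = days_by_forecast[forecast_prob]
    observed_freq = sum(outcomes) / len(outcomes)
    print(f"forecast {forecast_prob:.0%}: rain on {sum(outcomes)}/{len(outcomes)} days "
          f"({observed_freq:.0%} observed)")
```

Grouping days by the issued probability and comparing it with the observed rain frequency is exactly the diagonal-versus-curve comparison in the plot; a wet-biased forecaster would show observed frequencies sitting below the forecast probabilities.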

There are some pretty startling trends here. For one, the National Weather Service is pretty accurate for the most part, and that’s because they consistently try to provide the most accurate forecasts possible. They pride themselves on the fact that if you go to Weather.gov and it says there’s a 60% chance of rain, there really is a 60% chance of rain that day.

With the advantage of having the National Weather Service’s forecasts and data as a starting point, it’s perhaps unsurprising that The Weather Channel manages to be slightly more accurate in their forecasts. The only major inaccuracy they have, which is surprisingly consistent, is at the lower and higher rain probabilities: Weather.com often forecasts a higher probability of rain than there really is.

This phenomenon is commonly known as a wet bias: weather forecasters err toward predicting rain more often than it actually occurs. After all, we all take notice when forecasters say there won’t be rain and it ends up raining (= ruined grill out!); but when they predict rain and it ends up not raining, we shrug it off and count ourselves lucky.

The worst part of this graph is the performance of local TV meteorologists. These guys consistently over-predict rain so much that it’s difficult to place much confidence in their forecasts at all. As Silver notes:

TV weathermen, they aren’t bothering to make accurate forecasts because they figure the public won’t believe them anyway. But the public shouldn’t believe them, because the forecasts aren’t accurate.

Even worse, some meteorologists have admitted that they purposely fudge their rain forecasts to improve ratings. What better way to keep you tuning in every day than to make you think it’s going to rain all the time, and that they’re the only ones saving your favorite outfit from a soaking?

For me, the big lesson learned from this chapter in Silver’s book is that I’ll be tuning in to Weather.gov for my weather forecasts from now on. Most notably because, as Silver puts it:

The further you get from the government’s original data, and the more consumer facing the forecasts, the worse this bias becomes. Forecasts “add value” by subtracting accuracy.

Randy is a PhD candidate in Michigan State University's Computer Science program. As a member of Dr. Chris Adami's research lab, he studies biologically-inspired artificial intelligence and evolutionary processes.

  • marc

    What about replacing the “probability” term on the vertical axis with something that sounds more statistical? (I mean, descriptive stats.) “Observed probability” sounds strange to me, even if I am highly Bayesian-compatible.

    • http://www.randalolson.com Randy Olson

      Better now? :-)

      • marc

        well, it was more a question than an order :)

        • http://www.randalolson.com Randy Olson

          It was a good question/suggestion. ;-)

  • Rob Dale

    Very few TV meteorologists give forecast rain chances for a point like NWS forecasts do. How did you compare them?

    Your second paragraph is wholly inaccurate. I think you confused NWS forecasts made by meteorologists with NWS computer model data. Nobody takes the NWS forecast and “tweaks” it better. We (private sector and NWS) use the same basic set of observations and model data to create our own forecasts.

    • http://www.randalolson.com Randy Olson

      “Very few TV meteorologists give forecast rain chances for a point like NWS forecasts do. How did you compare them?”

      The source for that comparison is here: http://freakonomics.com/2008/04/21/how-valid-are-tv-weather-forecasts/

      • mark

        Since the NWS probability of precipitation is based on .01 inch or more, and this study only verified .10 inch or more, the statistics mean nothing. The NWS is forecasting one thing, and this person was verifying something totally different. There are many times when we know the rain will be light (less than .10 inch) but know it will be over .01, so the POP will be high, as it should be.

  • Jayson Prentice

    To say that private weather companies use the National Weather Service’s forecast and “build off of those forecasts” isn’t entirely correct. Nearly every forecast, both NWS and private, is derived from a number of numerical weather prediction models (computer models); a forecaster, or a computer, then takes those model forecasts and creates the final forecast product.

    Each NWS forecast is derived from the models and other sources, but is ‘hand-crafted’ by a meteorologist to produce the forecast you see on weather.gov. Private companies will use the same model data (often provided by government sources) to produce their own forecast, either through a computer or a meteorologist, many times not even referencing or using the actual NWS forecast itself.

    • http://www.randalolson.com Randy Olson

      Thanks for pointing this out, Jayson. I’ve changed my wording slightly in the beginning paragraph to better reflect the truth of the matter.

  • Robert

    I’m currently reading Silver’s book too! The most interesting part so far is the attempt to predict the likelihood of earthquakes. A few theories have been introduced, but none have been successful. Since then, any claim of an algorithm that can predict their likelihood goes largely ignored. I believe the weather.gov and earthquake analytics discussions in his book are in the same chapter. Won’t spoil it for anyone, but it’s a good read!

    • http://www.randalolson.com Randy Olson

      The “for years you’ve been telling us that rain is green!” story in that chapter had me cracking up. Silver’s an entertaining writer.

  • bcl

    If you’re interested in the science (and some of the problems we have in the US) read UW Professor Cliff Mass’ blog – http://cliffmass.blogspot.com/

  • Sara

    This is really interesting. However, I wonder if an ROC curve (or similar plot) would be more informative. The NWS may have more “accurate” long-run probabilities, but this should translate into a higher rate of Type I error than that of local forecasters. As you point out, the ratings penalty for a Type I error may be higher than that for a Type II error, and so erring “on the side of caution” may actually provide a better service to the public.

    • http://www.randalolson.com Randy Olson

      That is indeed TWC and the like’s gig: “Commercial forecasts ‘add value’ by subtracting accuracy.” In this case, subtracting accuracy may not always be a bad thing. Otherwise TWC and co. likely wouldn’t still be around. :-)

      • Sara

        Well, actually, I guess my point was that the plot here isn’t informative about binary precipitation forecast accuracy, which is what’s most interesting to most people. It instead displays the weighting of forecast probabilities vs. long-run true probabilities, which may just reflect overweighting of smaller probabilities by the target audience. What I think would be more informative re: accuracy is to look at this as a binary classification problem. For that, though, you’d need a different type of plot and probably a different type of data to actually say anything conclusive about “accuracy.”

        • http://www.randalolson.com Randy Olson

          That’s true, and I’m currently trying to get my hands on that kind of data. We’ll see how that goes. :-)

  • Ryan Wichman

    Isn’t it a bit irresponsible to lump all broadcast meteorologists into a study of one city? While that study appears fair for KC, it’s a gross generalization to judge a nationwide profession, with significantly different weather from place to place, based on one location.

    • http://www.randalolson.com Randy Olson

      It’s hard to say if the generalization holds or not. Do you track the accuracy of your forecasts? I’d be curious to see this study expanded to local TV meteorologists across the entire U.S.

  • Stephen

    This article is a science fail. As a local TV meteorologist, I wonder whose forecasts you have actually looked at. I can also tell you this whole “fudge the forecast to improve ratings” is absolutely not the case. We are not in the business for that. This is a good example of shoddy science with what appear to be made-up “facts” to support your claims. You clearly looked at a very small and unrepresentative sample of TV stations and NWS offices for your “study.” No forecaster, television or otherwise, is or will ever be 100% accurate. Thankfully some of the others who have commented on this post seem to notice the bias and inaccuracy with which you have reported. I would never say the National Weather Service does a bad job, but frequently they aren’t even in the area of the local TV station, and the TV meteorologists in those areas tend to perform quite well. To say NWS “consistently” outperforms TV meteorologists is, as much of your article seems to be, a gross and incorrect generalization.

    • http://www.randalolson.com Randy Olson

      Considering your profession, I understand the agitated tone in your comment here. But to call this article a science fail is a bit much. For one, the accuracy of the data on TWC and NWS is very difficult to dispute. After all, the data are from ForecastWatch.com, which has made a business out of checking the accuracy of weather forecasts. The data for the local TV meteorologists are from the Freakonomics blog, which was indeed limited to several local channels in Kansas City. Are all local TV meteorologists as bad as the Kansas City ones? That’s hard to say. Maybe local TV meteorologists should start recording and reporting on their own accuracy if they’re worried about being trusted.

      • mark

        NWS forecasts the probability of .01 inch or greater of precipitation for a 12-hour forecast period, from 00Z-12Z and from 12Z-00Z (6-6 standard time in the central time zone). The Freakonomics blog only counted rain of .10 inch or greater, so that cannot be used to determine the accuracy of NWS forecasts. NWS forecast high temperatures are valid from 7 am-7 pm local standard time, and forecast low temperatures are valid from 7 pm-8 am local standard time. Unless the verification data followed those exact time definitions, the temperature data would be invalid as well. There are many times when the “daily” low falls outside those time limits, and a lesser number of times when the “daily” high does.
