National Hurricane Center forecast accuracy in 2010

April 12, 2011

Today, the National Hurricane Center released its 2010 forecast verification report. It reviews the NHC’s performance in forecasting the track and intensity of tropical cyclones, as well as their development, and provides data on the accuracy of forecast models with regard to track and intensity. While the report covers both the North Atlantic and Pacific basins, my summary will focus solely on the Atlantic portion. Note that the report does not cover NOAA’s seasonal forecast, which is a product of the Climate Prediction Center; while not explicitly a forecast verification, the CPC’s review of the 2010 hurricane season does cover how the season measured against its expectations.

Track forecasting

In 2010, the National Hurricane Center set a new record for the accuracy of its forecasts five days (120 hours) out, with an average error of 187 nautical miles (breaking the record of 192 set in 2008). At the 36-120 hour lead times, forecast accuracy was better than the trailing five-year average. A table of the average forecast error, in nautical miles, at the standard forecast lead times (in hours) follows.

            12   24   36   48   72   96   120
2010        34   54   72   89  129  166   187
5-yr avg    32   53   75   97  144  196   252
2009        30   45   62   73  119  198   292
2008        28   48   69   88  127  160   192
2007        33   51   71   92  146  167   258
2006        30   51   72   97  149  206   265
2005        35   60   84  106  156  220   286

In my post on the shrinking cone of uncertainty, I wondered how much of the shrinkage was due simply to 2005 rolling off the five-year average versus 2010 being a strong forecasting year. On the whole, 2010’s positive effect on the average is about equal to 2005’s negative effect.
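As a rough check on that claim, here is the 120-hour arithmetic using the season averages from the table above. (Note this is a simple unweighted mean of season averages; the report’s own five-year figures are pooled over individual forecasts, so they differ slightly.)

```python
# 120-hour average track errors (nautical miles), read from the table above
err = {2005: 286, 2006: 265, 2007: 258, 2008: 192, 2009: 292, 2010: 187}

avg_2005_2009 = sum(err[y] for y in range(2005, 2010)) / 5  # mean through 2009
avg_2006_2010 = sum(err[y] for y in range(2006, 2011)) / 5  # 2005 off, 2010 on

print(round(avg_2005_2009, 1))  # unweighted five-year mean, 2005-2009
print(round(avg_2006_2010, 1))  # the mean after the window rolls forward
```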

Another way of looking at forecast accuracy is called skill: performance relative to a predetermined benchmark. For hurricane track forecasts, the benchmark is the Climatology and Persistence (CLIPER) model, a very basic model that takes a storm’s current position and motion and generates a forecast based on the tracks of past storms in a similar position. In theory, the lower the forecast error was for CLIPER, the easier the storm was to forecast, because it conformed more closely to past experience. Therefore, the lower CLIPER’s error was over the course of a season, the easier the season was to forecast. By looking at performance relative to CLIPER, we get an indication of whether forecast error in a particular year was lower because the season was “easy” or because a genuine improvement in forecasting was accomplished (for those who don’t follow what I’m trying to say here, I give a sports prediction analogy below). Below is a table of average CLIPER error, in nautical miles, at the standard forecast lead times.

            12   24   36   48   72   96   120
2010        53  108  166  230  332  400   470
5-yr avg    47   97  155  212  305  388   468
2009        51  103  167  225  346  463   570
2008        45   99  166  235  349  448   536
2007        45   85  122  160  237  323   512
2006        43   90  145  203  299  332   334
2005        49  102  160  210  285  367   453

As CLIPER’s error in 2010 was above the five-year average at all forecast periods, we can infer that the season was, on aggregate, harder to forecast than the norm over the previous five seasons. Note, however, that CLIPER’s 2008 error at 120 hours was substantially higher, suggesting that even though 2010’s forecast error at that lead time was lower, the 2008 forecast was a better performance, relatively speaking. This is a bit clearer if we look at it in terms of percentage improvement against CLIPER.

            12   24   36   48   72   96   120
2010        36   50   57   61   61   59    60
2009        41   57   63   68   66   57    49
2008        38   51   59   63   64   64    64
2007        28   40   42   43   38   48    50
2006        32   44   50   52   50   38    21
2005        29   41   47   49   45   40    37

The average 120-hour forecast by the NHC in 2010 was a 60% improvement on that of the CLIPER model. In 2008, it was a 64% improvement.
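The skill figures follow directly from the two error tables: skill is just the percentage reduction in error relative to the CLIPER baseline. A quick sketch of the calculation, using the 120-hour values from the tables above:

```python
def skill_pct(forecast_error: float, baseline_error: float) -> float:
    """Percentage improvement of a forecast over a baseline (CLIPER here)."""
    return 100.0 * (baseline_error - forecast_error) / baseline_error

# 120-hour average track errors (nautical miles) from the tables above
print(round(skill_pct(187, 470)))  # 2010: NHC 187 nm vs CLIPER 470 nm -> 60
print(round(skill_pct(192, 536)))  # 2008: NHC 192 nm vs CLIPER 536 nm -> 64
```

Even though 2010’s absolute error at 120 hours was the record, 2008’s larger reduction against its (harder) baseline is what makes it the stronger relative performance.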

Intensity forecasts

Once again, while the track forecasts show continued signs of improvement, there was not much ground gained in the intensity forecasts in absolute terms. Average intensity error ranged from 8 knots to 19 knots over the 12-120 hour forecast periods. While it beat the five-year average error at 72-120 hours, it was not record-setting at any forecast period. On the bright side, it was a substantial improvement on the baseline model (SHIFOR), with the percentage improvement ranging from 15 to 29. (While beating the baseline is a virtual given in track forecasting, that has not been true of the intensity forecasts; in 2006 most forecasts from the NHC actually performed worse than the primitive model.)

Forecast models

The National Hurricane Center track forecast outperformed nearly all of the individual models. The GFS model (provided by a branch of the National Weather Service) was slightly better at the shorter-term forecasts. The model provided by the European Centre for Medium-Range Weather Forecasts, popularly referred to as “the European,” outperformed at the 36-96 hour forecast periods and was the best individual model overall (while the model provided by the UK Meteorological Office was the best performer at 120 hours, the only one to outperform the NHC’s forecast five days out, its superlative performance at that period was not reflected across the board). Among consensus models, which average the output of other models, the Florida State Superensemble was the top performer. Its forecasts were better than the NHC’s at the 12-36 hour periods as well as at 72 and 96 hours.

With regard to intensity, several models slightly outperformed the NHC forecast. The Florida State Superensemble did so for the 24-72 hour periods, but its performance dropped off at the later time steps. ICON, a simple average of a few models, was the best performer overall; that has tended to be the case, so long as none of its components performs exceptionally poorly and drags down the average.

Tropical Cyclogenesis

This was the first season in which the NHC publicly released Tropical Weather Outlooks giving a percentage probability of a given disturbance becoming a tropical cyclone within the next 48 hours. During the 2007-2009 seasons, the NHC kept these forecasts in house, while publicly offering a categorical probability (low, medium, high) in the 2008 and 2009 seasons. The 2007-2009 forecasts worked well on average: the chart of percentage of storms that formed against forecast probability looked like it should (0% at 0%, rising evenly to 100% at 100%). In 2010, however, things weren’t quite as smooth. While the chart looks right at the high (>70%) and low (<40%) ends, things went a bit awry in between. The genesis rates of disturbances given a 40-70% chance of developing were roughly the reverse of what they should have been: only 29% of disturbances given a 70% chance formed, while 59% of those given a 40% chance formed. Repeats of this performance would strongly suggest the NHC should revert to categorical probabilities in the Tropical Weather Outlook.
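This kind of verification is a standard reliability check: group the outlooks by forecast probability and compare each bin’s stated probability with the fraction of disturbances that actually developed. A minimal sketch, using made-up counts purely for illustration (the report’s actual sample sizes are not reproduced here):

```python
from collections import defaultdict

# Hypothetical (forecast probability, developed?) pairs -- illustration only
outlooks = [
    (0.1, False), (0.1, False), (0.1, True),
    (0.4, True),  (0.4, True),  (0.4, False),
    (0.7, True),  (0.7, False), (0.7, False),
]

counts = defaultdict(lambda: [0, 0])  # probability bin -> [formed, total]
for prob, developed in outlooks:
    counts[prob][0] += int(developed)
    counts[prob][1] += 1

# A well-calibrated forecast has observed frequency close to each bin's probability
for prob in sorted(counts):
    formed, total = counts[prob]
    print(f"forecast {prob:.0%}: {formed}/{total} formed ({formed / total:.0%})")
```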

The diagrams also show the refinement distribution, which indicates how often the forecasts deviated from (a perceived) climatology. Sharp peaks at climatology indicate low forecaster confidence, while maxima at the extremes indicate high confidence; the refinement distributions shown here suggest an intermediate level of forecaster confidence.

Forecast Skill explained in a sports analogy

Let us say you wanted to measure the accuracy of your picks in the NCAA tournament and compare them on a year-to-year basis. Your total number of correct picks would be an absolute measure, as is average track error. However, since some years have more upsets than others, some years are, in theory, easier to pick than others. To account for this, you could compare your performance against a simple “model”: for instance, how many picks would have been correct had you simply picked the higher seed/favored team. That “pick the favorite” model would be akin to CLIPER, and your performance against that baseline would be your skill. You would feel prouder of getting 50 picks correct in a tournament with lots of upsets than in one in which the favorite won nearly every time. Similarly, given identical track errors in two seasons, the NHC would be prouder of the season in which the storms conformed less to past history.


One Response to “National Hurricane Center forecast accuracy in 2010”

  1. Jim Young Says:

    Hey I only have an IQ of 110 can you make this any more convoluted? If I did not know better I would say the way you have set this up is designed to cover up the deviation to make NHC look better. Did not your man in charge of predictions say it was not worth his time.
