National Hurricane Center forecast accuracy in 2010
April 12, 2011
Today, the National Hurricane Center released its 2010 forecast verification report. It reviews the NHC’s performance in forecasting the track and intensity of tropical cyclones, as well as their development. It also provides data on the accuracy of forecast models with regard to track and intensity. While the report covers both the North Atlantic and Pacific basins, my summary will focus solely on the Atlantic portion. Also, the report does not cover NOAA’s seasonal forecast, which is a product of the Climate Prediction Center. While not explicitly a forecast verification, their review of the 2010 hurricane season does cover how the season measured up against their expectations.
In 2010, the National Hurricane Center set a new record for the accuracy of its five-day (120-hour) forecast, with an average error of 187 nautical miles (breaking the record of 192 set in 2008). At the 36-120 hour lead times, forecast accuracy was better than the trailing five-year average. A table of the average forecast error at the standard forecast lead times (in hours) follows.
| Lead time (h) | 12 | 24 | 36 | 48 | 72 | 96 | 120 |
|---|---|---|---|---|---|---|---|
| 5 yr avg error (n mi) | 32 | 53 | 75 | 97 | 144 | 196 | 252 |
In my post on the shrinking cone of uncertainty, I wondered how much of the shrinkage was due simply to 2005 rolling off the 5-year average versus 2010 being a strong forecasting year. On the whole, 2010’s positive improvement on the average is about equal to 2005’s negative effect.
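The arithmetic behind that observation is simple: in a 5-year rolling average, replacing one season with another shifts the mean by one fifth of the difference between the two. A minimal sketch, using hypothetical error figures rather than the NHC's actual annual numbers:

```python
# Sketch of how a 5-year rolling average shifts when one season is
# replaced by another. The errors below are hypothetical, not the
# NHC's actual annual figures.
def updated_average(old_avg, dropped, added, window=5):
    """New rolling mean after `dropped` leaves the window and `added` enters."""
    return old_avg + (added - dropped) / window

# Hypothetical 120 h errors: a poor season (300 nm) rolls off while a
# strong one (190 nm) rolls on, against a 245 nm prior average.
print(updated_average(245, dropped=300, added=190))  # 223.0, a 22 nm drop
```

The point is that a single unusually bad year leaving the window can shrink the average (and hence the cone) as much as a single strong year entering it.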
Another way of looking at forecast accuracy is called skill: performance relative to a predetermined benchmark. For hurricane track forecasts, the benchmark is the Climatology and Persistence (CLIPER) model. It is a very basic model that takes a storm’s current position and motion and generates a forecast based on the tracks of past storms in a similar position. In theory, the lower the forecast error was for CLIPER, the easier the storm was to forecast, because it conformed more closely to past experience. Therefore, the lower CLIPER’s error was over the course of a season, the easier the season was to forecast. By looking at performance relative to CLIPER, we get an indication of whether forecast error in a particular year was lower because the season was “easy” or because a genuine improvement in forecasting was accomplished (for those who don’t follow what I’m trying to say here, I give a sports prediction analogy below). Below is a table of average CLIPER error, in nautical miles, at the standard forecast lead times:
| Lead time (h) | 12 | 24 | 36 | 48 | 72 | 96 | 120 |
|---|---|---|---|---|---|---|---|
| 5 yr avg CLIPER error (n mi) | 47 | 97 | 155 | 212 | 305 | 388 | 468 |
As CLIPER’s error in 2010 was above the 5-year average for all forecast periods, we can infer that the season was, on aggregate, harder to forecast than the norm over the previous five seasons. Note, however, that CLIPER’s 2008 error at 120 hours was substantially higher, suggesting that even though 2010’s forecast error at that lead time was lower, the 2008 forecast was a better performance relatively speaking. This is a bit clearer if we look at it in terms of percent improvement against CLIPER.
The average 120 hour forecast by the NHC in 2010 was a 60% improvement on that of the CLIPER model. In 2008, it was a 64% improvement.
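Skill is just the percentage by which a forecast beats its baseline. A minimal sketch using the numbers stated above (the NHC's 187 nm average 120-hour error in 2010, and the roughly 468 nm CLIPER error that a 60% improvement implies):

```python
# Track-forecast "skill": percent improvement over the CLIPER baseline.
def skill_vs_baseline(forecast_error, baseline_error):
    """Percent improvement of a forecast over a baseline model."""
    return 100 * (baseline_error - forecast_error) / baseline_error

# From the text: the NHC's 2010 average 120 h error was 187 nm, a stated
# 60% improvement, which implies a CLIPER error of roughly 468 nm.
print(round(skill_vs_baseline(187, 468), 1))  # 60.0
```

This is why a season with a lower absolute error (2010's 187 nm vs 2008's 192 nm) can still represent a lower skill: 2008's baseline error was larger, so the same forecast quality counted for more against it.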
Once again, while the track forecasts show continued signs of improvement, there was not much ground gained in the intensity forecast in absolute terms. Average intensity error ranged from 8 knots to 19 knots through the 12-120 hour forecast periods. While it was better than the five-year average error at 72-120 hours, it was not record-setting at any forecast period. On the bright side, it was a substantial improvement on the baseline model (SHIFOR), with the improvement ranging from 15% to 29%. (While beating the baseline is a virtual given in track forecasting, that has not been true of the intensity forecasts; in 2006, most forecasts from the NHC actually performed worse than the primitive model.)
The National Hurricane Center track forecast outperformed nearly all of the individual models. The GFS model (provided by a branch of the National Weather Service) was slightly better at the shorter-term forecasts. The model provided by the European Centre for Medium-Range Weather Forecasts, popularly referred to as “the European,” outperformed the NHC at the 36-96 hour forecast periods and was the best individual model overall. (While the model provided by the UK Meteorological Office was the best performer at 120 hours, and the only one to outperform the NHC’s forecast five days out, its superlative performance at that period was not reflected across the board.) Among models comprised of averages of other models, the Florida State Superensemble was the top performer. Its forecasts were better than the NHC’s at the 12-36 hour periods as well as at 72 and 96 hours.
With regard to intensity, several models slightly outperformed the NHC forecast. The Florida State Superensemble did so for the 24-72 hour periods, but its performance dropped off at the later time steps. ICON, a simple average of a few models, was the best performer overall. That has tended to be the case, so long as none of its components performs exceptionally poorly and drags down the average.
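Since ICON is nothing more than an unweighted mean of its member models, a sketch makes its strength and its weakness obvious. The member values below are illustrative, not actual model output:

```python
# ICON-style consensus: a simple average of several models' intensity
# forecasts. Member values are illustrative, not actual model output.
def consensus(forecasts):
    """Unweighted mean of member forecasts (in knots)."""
    return sum(forecasts) / len(forecasts)

members = [85, 90, 88, 87]   # hypothetical 48 h intensity forecasts, kt
print(consensus(members))     # 87.5

# One badly wrong member drags the whole consensus toward its error,
# which is why the average only stays competitive when no component
# performs exceptionally poorly.
with_outlier = [85, 90, 88, 40]
print(consensus(with_outlier))  # 75.75
```

Independent errors in the members tend to cancel, which is why a plain average so often beats any single model; a shared or extreme error does not cancel, and the consensus suffers with it.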
This was the first season in which the NHC publicly released Tropical Weather Outlooks giving a percentage probability of a given disturbance becoming a tropical cyclone within the next 48 hours. During the 2007-2009 seasons, the NHC kept these forecasts in house, while publicly offering a categorical probability (low, medium, high) in the 2008 and 2009 seasons. The 2007-2009 forecasts worked well on average, with the chart of percent of storms formed against percent probability forecast looking as it should (0% at 0%, rising evenly to 100% at 100%). In 2010, however, things weren’t quite as smooth. While the chart looks right at the high (>70%) and low (<40%) ends, things went a bit awry in between. In general, the genesis rates of disturbances given a 40-70% chance of developing were the reverse of what they should have been: only 29% of disturbances given a 70% chance formed, while 59% of disturbances given a 40% chance formed. Repeats of this performance would strongly suggest the NHC should revert to categorical probabilities in the Tropical Weather Outlook.
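The chart described above is a reliability diagram, and the underlying check is straightforward: bin the outlooks by stated probability and compare each bin's forecast probability to the fraction of disturbances that actually developed. A minimal sketch with made-up data (the NHC's verification of course uses its real outlook records):

```python
# Minimal reliability check for probabilistic genesis forecasts: group
# outlooks by stated probability and compute the observed genesis rate
# in each group. The sample data below are made up for illustration.
from collections import defaultdict

def reliability(forecasts):
    """forecasts: list of (stated_probability, developed) pairs.
    Returns {stated probability: observed genesis rate}."""
    bins = defaultdict(list)
    for prob, developed in forecasts:
        bins[prob].append(developed)
    return {p: sum(v) / len(v) for p, v in sorted(bins.items())}

# For a well-calibrated forecaster, 40% disturbances should form about
# 40% of the time. Here the 0.4 bin verifies low and the 0.7 bin short
# of its stated chance, the kind of miscalibration 2010 showed.
sample = [(0.4, True), (0.4, False), (0.4, False), (0.4, False), (0.4, False),
          (0.7, True), (0.7, True), (0.7, True), (0.7, False), (0.7, False)]
print(reliability(sample))  # {0.4: 0.2, 0.7: 0.6}
```

In 2010's case the 40% and 70% bins came out inverted (59% vs 29% observed), which is exactly what this kind of tabulation exposes.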
“The diagrams also show the refinement distribution, which indicates how often the forecasts deviated from (a perceived) climatology. Sharp peaks at climatology indicate low forecaster confidence, while maxima at the extremes indicate high confidence; the refinement distributions shown here suggest an intermediate level of forecaster confidence.”
Forecast Skill explained in a sports analogy
Let us say you wanted to measure the accuracy of your picks in the NCAA tournament and be able to compare them on a year-to-year basis. Your total number of correct picks would be an absolute measure, as is the average track error. However, as some years have more upsets than others, some years are, in theory, easier to pick than others. To account for this, you could compare your performance against a simple “model”: for instance, how many picks would have been correct had you simply picked the higher seed/favored team. That “pick the favorite” model would be akin to CLIPER, and your performance against that baseline would be your skill. You would feel prouder of getting 50 picks correct in a tournament with lots of upsets than in one in which the favorite won nearly every time. Similarly, given identical track errors in two seasons, the NHC would be more proud of the season in which the storms conformed less to past history.