**How to Evaluate Probabilistic Forecasts**

**Have you ever wondered how to objectively and scientifically evaluate probabilistic predictions produced by statistical, machine learning and deep learning models?**

In probabilistic prediction, the two critical evaluation criteria are **validity** and **efficiency**.

- **Validity** (terms such as "calibration" and "coverage" are also used) is *essentially about ensuring that there is no bias in forecasts*. How does one measure bias in probabilistic forecasts that, unlike point predictions, produce PIs (prediction intervals)?

The concept is relatively simple and can be illustrated like this: if a forecasting model claims 95% confidence, then (on average) its prediction intervals should (by definition) cover ~95% of actual observations. That is, *~95% of actual observations should fall within the prediction intervals generated by the forecasting model.*
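As a quick illustration (not from the original post), empirical coverage is just the fraction of observations falling inside their intervals. The function name and toy data below are illustrative assumptions:

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of actual observations that fall inside their prediction intervals."""
    y_true = np.asarray(y_true)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return inside.mean()

# Sanity check on synthetic data: for N(0, 1) observations, the interval
# [-1.96, 1.96] should cover roughly 95% of draws.
rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=10_000)
coverage = empirical_coverage(y, np.full_like(y, -1.96), np.full_like(y, 1.96))
```

A valid 95% model would yield `coverage` close to 0.95; a materially lower number signals bias (miscalibration).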

In probabilistic prediction, **validity** is a **must-have (necessary) criterion** before anything else comes into consideration. If forecasts produced by the model are not *probabilistically valid*, relying on them for decision-making is unhelpful and can be risky, sometimes outright dangerous. In high-stakes applications such as medicine and self-driving cars, *forecasts that lack validity (have a bias) can lead to catastrophic outcomes.*

- The second criterion is **efficiency** (terms such as "width" or "sharpness" are also used). Efficiency is *desirable but not a must-have*, and it comes into play only once the validity requirement has been satisfied. Efficiency relates to the width of prediction intervals: *a more efficient predictive model produces narrower PIs (prediction intervals)*.
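Efficiency is commonly reported as the mean interval width, often normalised by the standard deviation of the training targets so that models are comparable across datasets. The helper below is an illustrative sketch, not from the post:

```python
import numpy as np

def average_width(lower, upper, y_train=None):
    """Mean prediction-interval width; if y_train is given, the width is
    expressed in multiples of the training-set standard deviation."""
    widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    mean_width = widths.mean()
    return mean_width / np.std(y_train) if y_train is not None else mean_width

# Narrower intervals => more efficient model (assuming validity holds).
w = average_width([0.0, 1.0], [2.0, 5.0])  # widths 2 and 4, mean 3.0
```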

*These are the two critical metrics for evaluating any probabilistic predictor or probabilistic forecasting (time series) model and/or application.*

**Validity** (calibration/coverage) and **efficiency** (width/sharpness) are the natural and interpretable metrics for evaluating the predictive uncertainty of any probabilistic prediction (non-time-series) or probabilistic forecasting (time-series) model.

**How can one ensure and optimise validity and efficiency?**

In a nutshell, **validity** in finite samples is **automatically guaranteed** by only one class of *uncertainty quantification* methods: Conformal Prediction.

All other uncertainty quantification methods lack built-in validity guarantees. In the first independent comparative study of the four classes of uncertainty quantification methods, only Conformal Prediction satisfied the validity property.

On the other hand, efficiency depends on multiple factors, including the underlying prediction model, the quantity of data, how "hard" the dataset is, and, in the case of conformal prediction, the conformity measure.
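To make the mechanism concrete, here is a minimal sketch of split (inductive) conformal prediction for regression, using absolute residuals as the conformity measure. The function name, the use of scikit-learn's `LinearRegression`, and the synthetic data are all illustrative assumptions, not the post's own code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_intervals(model, X_fit, y_fit, X_cal, y_cal, X_test, alpha=0.05):
    """Split (inductive) conformal prediction with absolute-residual scores.
    Under exchangeability, the returned intervals cover y_test with
    marginal probability >= 1 - alpha."""
    model.fit(X_fit, y_fit)
    scores = np.abs(y_cal - model.predict(X_cal))         # conformity scores
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level)
    preds = model.predict(X_test)
    return preds - q, preds + q

# Toy usage on synthetic linear data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0.0, 0.5, size=3000)
lo, hi = split_conformal_intervals(LinearRegression(),
                                   X[:1000], y[:1000],          # proper training set
                                   X[1000:2000], y[1000:2000],  # calibration set
                                   X[2000:], alpha=0.05)
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
```

The validity guarantee is distribution-free: it holds regardless of the underlying model, whereas the interval width (efficiency) still depends on how well the model fits the data.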

The example below shows the validity (coverage/calibration) and efficiency (width/sharpness) of two probabilistic prediction models. Each model produces a prediction interval for each x_i; each interval attempts to cover the true y_i (represented by the red points) at a specified confidence level (say 95%). Validity (coverage) is calculated *as the fraction of actual values contained in these intervals*. The intervals' width is reported in multiples of the standard deviation of the training-set y_i values.

One can see that the first model has about 80% coverage, while the second model has only about 60%. Therefore, *the first model is better in terms of validity, as it results in much less bias*.

The first model is less efficient, however: its average width is about 2.2 standard deviations, whilst the second model's average width is only about 0.7 standard deviations.

Want to learn more about the problems and terminology of probabilistic forecasting? A key reference is the paper by Gneiting, Balabdaoui and Raftery, **'Probabilistic forecasts, calibration and sharpness' (2007)**.

#timeseries #uncertainty #metrics #forecasting #demandforecasting #probabilisticprediction #machinelearning

## Additional materials:

1. Awesome Conformal Prediction. The most comprehensive professionally curated list of Awesome Conformal Prediction tutorials, videos, books, papers and open-source libraries in Python and R. https://github.com/valeman/awesome-conformal-prediction