**How to evaluate Probabilistic Forecasts**

---

**Have you ever wondered how to objectively and scientifically evaluate probabilistic predictions produced by statistical, machine learning, and deep learning models?**

In probabilistic prediction, the two critical evaluation criteria are **validity** and **efficiency**.

- 𝐯𝐚𝐥𝐢𝐝𝐢𝐭𝐲 (terms such as "calibration" and "coverage" are also used) is *essentially about ensuring that there is no bias in forecasts*. How does one measure bias in probabilistic forecasts that, unlike point predictions, produce prediction intervals (PIs)?

The concept is relatively simple and can be illustrated like this. If a forecasting model claims 95% confidence, then (on average) its prediction intervals should (by definition) cover ~95% of actual observations. That is, *~95% of actual observations should fall within the prediction intervals generated by the forecasting model.*
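To make this concrete, here is a minimal sketch of checking empirical coverage against actual observations (the function name `empirical_coverage` and the toy data are illustrative, not from any particular library):

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of actual observations falling inside their prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())

# Toy check: for a well-calibrated 95% model, coverage should land near 0.95.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
lo = np.full_like(y, -1.96)  # the exact central 95% interval for N(0, 1)
hi = np.full_like(y, 1.96)
print(round(empirical_coverage(y, lo, hi), 3))
```

A coverage rate well below the claimed level signals bias (invalid forecasts); a rate well above it usually signals wastefully wide intervals.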

In probabilistic prediction, 𝐯𝐚𝐥𝐢𝐝𝐢𝐭𝐲 is a **must-have (necessary) criterion** before anything else comes into consideration. If forecasts produced by the model are not *probabilistically valid*, relying on them for decision-making is not helpful and can be risky, sometimes outright dangerous. In high-stakes applications such as medicine and self-driving cars, *forecasts that lack validity (have a bias) can lead to catastrophic outcomes.*

- The second criterion is 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 (terms such as "width" or "sharpness" are also used). Efficiency is *desirable but not a must-have*; it matters only after the validity requirement has been satisfied. Efficiency relates to the width of prediction intervals: *a more efficient predictive model produces narrower prediction intervals (PIs)*.
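Efficiency can be measured just as simply. The sketch below (an illustrative helper, not a library function) compares two hypothetical models that achieve the same valid coverage; the one with the smaller average interval width is the more efficient:

```python
import numpy as np

def mean_interval_width(lower, upper):
    """Average prediction-interval width: smaller (at the same valid
    coverage level) means a more efficient model."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))

# Two hypothetical models assumed to have identical, valid 95% coverage.
width_a = mean_interval_width([-2.0, -1.5], [2.0, 1.5])  # model A
width_b = mean_interval_width([-3.0, -2.5], [3.0, 2.5])  # model B
print(width_a, width_b)  # model A's narrower intervals make it more efficient
```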

*These are the two critical criteria for evaluating any probabilistic predictor or probabilistic forecasting (time series) model or application.*

**Validity** (calibration/coverage) and **efficiency** (width/sharpness) are the natural and interpretable metrics for evaluating the predictive uncertainty of any probabilistic prediction (non-time series) or probabilistic forecasting (time series) model.

**How can one ensure and optimise validity and efficiency?**

In a nutshell, **validity** in finite samples is **automatically guaranteed** by only one class of *Uncertainty Quantification* methods: Conformal Prediction.
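As an illustration of why, here is a minimal sketch of split conformal prediction, the simplest variant of the method. Everything here is an assumption-laden toy: `TinyLinearModel` is a stand-in for any point regressor with a `fit`/`predict` interface, and the guarantee holds under exchangeability of the data.

```python
import numpy as np

class TinyLinearModel:
    """Stand-in for any point regressor with fit/predict (e.g. a scikit-learn estimator)."""
    def fit(self, X, y):
        X1 = np.c_[np.ones(len(X)), X]
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        return np.c_[np.ones(len(X)), X] @ self.coef_

def split_conformal_intervals(model, X_train, y_train, X_cal, y_cal, X_new, alpha=0.05):
    """Wrap any point predictor to produce prediction intervals with
    guaranteed finite-sample coverage >= 1 - alpha."""
    model.fit(X_train, y_train)
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
    n = len(scores)
    # Conformal quantile with the (n + 1) finite-sample correction.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[min(k, n) - 1]
    preds = model.predict(X_new)
    return preds - q, preds + q

# Toy demonstration on synthetic data.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(3000, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=3000)
lo, hi = split_conformal_intervals(
    TinyLinearModel(),
    X[:1000], y[:1000],          # training split
    X[1000:2000], y[1000:2000],  # calibration split
    X[2000:], alpha=0.05,
)
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
print(round(float(coverage), 3))  # should be close to (and at least) 0.95
```

The key design point is that the validity guarantee comes from the calibration split and the quantile of the nonconformity scores, not from the underlying model, which is why it holds for any predictor.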