How to predict a full probability distribution using machine learning: Conformal Predictive Distributions

Valeriy Manokhin, PhD, MBA, CQF
7 min read · Oct 9, 2022

In my previous article, “Conformal Prediction forecasting with MAPIE”, we looked at how to create probabilistic forecasts using ensemble batch prediction intervals (EnbPI), as implemented in MAPIE, an open-source library for conformal prediction.

Many conformal prediction models output set predictions. In the regression case, these set predictions are prediction intervals: for a given confidence level, they specify a lower and an upper bound on the predicted value of the target variable y.
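To make the idea of a prediction interval concrete, here is a minimal sketch of split (inductive) conformal prediction for regression. It is an illustration under stated assumptions rather than the method of any particular library: the synthetic data, the least-squares model and the helper name `split_conformal_interval` are all made up for this example, and absolute residuals are just one common choice of nonconformity score.

```python
import numpy as np

def split_conformal_interval(predict, X_calib, y_calib, X_new, alpha=0.1):
    """Return (lower, upper) bounds with target coverage 1 - alpha.

    `predict` is any already-fitted point predictor (hypothetical helper).
    """
    # Nonconformity scores: absolute residuals on held-out calibration data.
    scores = np.abs(y_calib - predict(X_calib))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q = np.sort(scores)[k - 1]
    point = predict(X_new)
    return point - q, point + q

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=2000)

# Three disjoint parts: training, calibration, test.
X_train, y_train = X[:1000], y[:1000]
X_calib, y_calib = X[1000:1500], y[1000:1500]
X_test, y_test = X[1500:], y[1500:]

# Underlying model: ordinary least squares fitted on the training part only.
A = np.column_stack([X_train[:, 0], np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(X_in):
    return coef[0] * X_in[:, 0] + coef[1]

lower, upper = split_conformal_interval(predict, X_calib, y_calib, X_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage on the test part: {coverage:.3f}")
```

With alpha=0.1 the intervals target 90% coverage, so the printed empirical coverage should land close to 0.9, up to sampling noise. The interval here has the same width everywhere; adaptive widths require a different nonconformity score.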

A recent paper by Dewolf et al., “Valid prediction intervals for regression problems”, was the first independent study to examine the desired qualities of a probabilistic regressor and to compare how well four classes of approaches (Bayesian methods, ensemble methods, direct interval estimation and conformal prediction) achieve the primary objectives of a good probabilistic predictor.

The authors found that conformal prediction was clearly the best approach for uncertainty quantification in regression tasks. Unlike the other classes of approaches, conformal prediction guarantees the validity (lack of bias) of its probabilistic predictions for any data distribution, any dataset size and any underlying regression model, whether statistical, machine learning or deep learning, provided the data are exchangeable.
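The claim that validity does not depend on the underlying model can be checked with a small simulation. The sketch below is an assumption-laden illustration, not an experiment from the paper: the skewed inputs, heavy-tailed noise, the two fixed "models" and the helper names `conformal_halfwidth` and `coverage_for` are all invented here. Both models are fixed functions, so no training split is needed. A model that matches the signal and one that ignores the inputs entirely should both reach roughly the target coverage on average; the poor model pays for it with much wider intervals.

```python
import numpy as np

def conformal_halfwidth(abs_residuals, alpha):
    """Split-conformal quantile of the calibration residuals."""
    n = len(abs_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: only the trivial interval is valid
    return np.sort(abs_residuals)[k - 1]

def coverage_for(predict, alpha=0.1, n_calib=200, n_test=200, trials=300, seed=0):
    """Average test coverage and interval width over repeated random draws."""
    rng = np.random.default_rng(seed)
    hits, widths = [], []
    for _ in range(trials):
        x = rng.exponential(size=n_calib + n_test)        # skewed inputs
        y = x ** 2 + rng.standard_t(df=3, size=x.size)    # heavy-tailed noise
        res = np.abs(y - predict(x))
        q = conformal_halfwidth(res[:n_calib], alpha)     # calibration part
        hits.append(np.mean(res[n_calib:] <= q))          # test part
        widths.append(2 * q)
    return float(np.mean(hits)), float(np.mean(widths))

def good_model(x):
    return x ** 2            # matches the true signal

def bad_model(x):
    return np.zeros_like(x)  # ignores the inputs entirely

cov_good, width_good = coverage_for(good_model)
cov_bad, width_bad = coverage_for(bad_model)
print(f"good model: coverage {cov_good:.3f}, mean width {width_good:.2f}")
print(f"bad model:  coverage {cov_bad:.3f}, mean width {width_bad:.2f}")
```

The guarantee illustrated here is marginal: it holds on average over calibration and test draws, not for every individual input. Validity comes from the calibration step, while the quality of the model shows up in how narrow the intervals are.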

Validity is arguably the most desirable quality of a probabilistic predictor, especially in critical applications such as healthcare, finance and self-driving cars. Biased predictions can lead to catastrophic outcomes: misdiagnosing a critical disease, failing to stop a self-driving car when a pedestrian is on the road, or entering multi-million trades on the basis of biased predictions.

Even outside critical applications, biased predictions lead to suboptimal decisions that can cost a large company very significant sums. Take a manufacturing or retail company, for example: a biased probabilistic prediction of demand leads to incorrect decisions in procurement, demand planning and operations. Customers are either not served and go elsewhere, or excess and slow-moving inventory builds up, resulting in write-offs and costly damage to the bottom line.

Dewolf et al., “Valid prediction intervals for regression problems”

Compare the…

Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction
