How to predict a full probability distribution using machine learning: Conformal Predictive Distributions

Valeriy Manokhin, PhD, MBA, CQF
7 min read · Oct 9, 2022

In my previous article, “Conformal Prediction forecasting with MAPIE”, we looked at how to create probabilistic forecasts using Ensemble batch prediction intervals (EnbPI), as implemented in MAPIE, one of the open-source libraries for conformal prediction.

Many Conformal Prediction models output set predictions; in the regression case, set predictions are Prediction Intervals that specify, at a given confidence level, a lower and an upper bound on the predicted value of the target variable y.

A recent paper by Dewolf et al., “Valid prediction intervals for regression problems”, was the first independent study to examine the desired qualities of a probabilistic regressor and how all four classes of approaches achieve the primary objectives of a good probabilistic predictor.

The authors found that Conformal Prediction was clearly the best approach for Uncertainty Quantification in regression tasks: unlike the other classes of approaches, Conformal Prediction guarantees the validity (lack of bias) of probabilistic predictions for any problem, any data distribution, any dataset size and any underlying regression model, whether statistical, machine learning or deep learning.

One can agree that the validity of predictions is the most desired quality of a probabilistic predictor, especially in critical applications such as healthcare, finance and self-driving cars. Biased predictions can lead to catastrophic outcomes: misdiagnosing a critical disease, failing to stop a self-driving car when a pedestrian is on the road, or entering multi-million trades on the basis of flawed forecasts.

Even outside critical applications, using biased predictions results in suboptimal decisions that can translate into very significant losses for any large company. Take a manufacturing or retail company, for example: a biased probabilistic prediction of demand will lead to incorrect procurement, demand planning and operations decisions, with customers either left unserved and taking their business elsewhere, or excess and slow-moving inventories resulting in write-offs and costly damage to the bottom line.

Figure from Dewolf et al., “Valid prediction intervals for regression problems”.

Compare the validity of Conformal Prediction, guaranteed mathematically regardless of the data distribution and the underlying regression model (whether statistical, machine learning or deep learning), with the Bayesian approach, which offers no mathematical guarantees whatsoever.

Unless the prior is known (which never holds in practice), the Bayesian approach will output incorrect predictions, with the posterior located in the wrong place and no warning to the user whatsoever. The user will then rely on a posterior distribution that is 1) located in the wrong place and 2) produces incorrect prediction intervals, and will use it for decision-making, resulting in incorrect and costly decisions.

Unlike the Bayesian approach, Conformal Prediction outputs valid (unbiased) predictions, and Inductive Conformal Prediction (the most popular approach in applications) has no issues with scalability.

One can understand why interest in Conformal Prediction is growing exponentially. More and more academic institutions and companies are discovering the best Uncertainty Quantification framework for the 21st century.

You don’t have to take my word for it: just check out some of the biggest tech influencers, working at top unicorn startups like Hugging Face, talking about Conformal Prediction in “Getting prediction intervals with conformal prediction”.

Back to the technical topic. A Bayesian might object: “Wait a minute, Conformal Prediction can only produce Prediction Intervals, not the full CDF like the Bayesian approach can! Right?”

Well, actually … wrong. It turns out that Conformal Prediction can produce the whole CDF (Cumulative Distribution Function) tailored to each test point! And it does so with all the benefits conferred by using Conformal Prediction — namely theoretical guarantees of validity for any underlying regression model, for any data distribution and for any dataset size.

And all these benefits rest on an assumption even milder than the one required to do machine learning: Conformal Prediction only requires data exchangeability, which is weaker than IID. So if one can do machine learning, one can always use Conformal Prediction. Bingo!

So how does this work exactly?

Conformal predictive systems for regression output Conformal Predictive Distributions (cumulative distribution functions) for each test object, so the result for each test object is a full probabilistic prediction.

And this is the equivalent of the Holy Grail of prediction — full and complete quantification of uncertainty tailored to each test object for optimal decision-making.

Once a complete CDF has been produced for each test point, one can do a lot of things with it.

For a point prediction, one simply reads off the value of y corresponding to Q(y) = 1/2; in the example below, this gives a value of y of around 20.

Need quantiles? No problem either: one simply reads off the value of y corresponding to a particular quantile on the vertical axis, say 0.95, as in the sketch below.
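A minimal read-off sketch, assuming the CPD for a single test object is represented as a sorted array of threshold values; the array below is synthetic, centred around 20 to match the example:

import numpy as np

# Hypothetical CPD for one test object: conformal predictive systems
# represent it as an array of threshold values; sorting gives the
# empirical CDF Q(y).
rng = np.random.default_rng(0)
cpd = np.sort(rng.normal(loc=20, scale=3, size=1000))

def read_off(cpd, q):
    """Smallest threshold value at which the empirical CDF reaches probability q."""
    idx = min(int(np.ceil(q * len(cpd))) - 1, len(cpd) - 1)
    return cpd[idx]

point_prediction = read_off(cpd, 0.5)  # median, Q(y) = 1/2: close to 20 here
quantile_95 = read_off(cpd, 0.95)      # y value at the 0.95 quantile
print(point_prediction, quantile_95)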

Problem solved. If you need more technical details about how this is done under the hood (including rigorous math proofs), I have linked several key papers.

One can also find tutorials about Conformal Predictive Distributions on Awesome Conformal Prediction — the most comprehensive professionally curated resource for all things Conformal Prediction.

But where can I find the code?

Conformal Prediction has been developing rapidly, and many new libraries have recently appeared on the scene, with more added in 2022. There are many great libraries in Python and R, and even in Julia and C++.

Recently a new Python library called crepes: Conformal Regressors and Predictive Systems was released.

Crepes is a Python library for generating conformal regressors. Conformal regressors transform point predictions of any underlying regression model into prediction intervals for specified levels of confidence.

The package also implements Conformal Predictive Systems, which transform point predictions into cumulative distribution functions.

As usual, we start by installing the library with ‘pip install crepes’ and importing the required classes and helper functions:

from crepes import ConformalRegressor
from crepes.fillings import sigma_knn, binning
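The snippets below assume the data has already been split three ways: a proper training set, a calibration set and a test set. Here is a minimal setup sketch; the California Housing data and the split sizes are purely illustrative, not prescribed by crepes:

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hold out a test set, then split the rest into proper training and
# calibration sets (Inductive Conformal Prediction needs both).
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train, test_size=0.25, random_state=42)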

Step 1. One trains the underlying model, in this case the popular Random Forest, which tends to produce good results out of the box without tuning.

learner = RandomForestRegressor()
learner.fit(X_prop_train, y_prop_train)  # fit on the proper training set only
y_hat_test = learner.predict(X_test)     # point predictions for the test set

Step 2. The learned model is applied to the calibration objects and the residuals are calculated:

y_hat_cal = learner.predict(X_cal)
residuals_cal = y_cal - y_hat_cal

Step 3. We can now fit a conformal regressor on the calibration residuals and apply it to get prediction intervals for the test objects, using the point predictions as input; the probability of an interval not covering the correct target is 1 − confidence:

cr_std = ConformalRegressor()
cr_std.fit(residuals=residuals_cal)  # calibrate on the residuals
intervals_std = cr_std.predict(y_hat=y_hat_test, confidence=0.99, y_min=0)

This outputs prediction intervals of constant width. To make the intervals more informative, we can normalize them, tailoring each interval to the difficulty of the individual instance. Several methods can be used for this; in this example we use the helper function sigma_knn, which estimates difficulty by the mean absolute error of the k (default k=5) nearest neighbours of each instance in the calibration set. A small value, beta, is added to the estimates and may be passed as an argument to the function; below we just use the default, beta=0.01.

sigmas_cal = sigma_knn(X=X_cal, residuals=residuals_cal)

Step 4. The difficulty estimates can now be used to fit a normalized conformal regressor. Creating difficulty estimates for the test set as well, we can obtain prediction intervals using the point predictions and the difficulty estimates for the test objects:

cr_norm = ConformalRegressor()
cr_norm.fit(residuals=residuals_cal, sigmas=sigmas_cal)  # normalized conformal regressor
sigmas_test = sigma_knn(X=X_cal, residuals=residuals_cal, X_test=X_test)
intervals_norm = cr_norm.predict(y_hat=y_hat_test, sigmas=sigmas_test)
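A quick illustrative check of what normalization buys, assuming crepes’ default confidence level of 0.95 in predict: every standard interval has the same width, while the normalized widths vary with the estimated difficulty of each instance.

import numpy as np

ivals_std = cr_std.predict(y_hat=y_hat_test)  # default 95% confidence
widths_std = ivals_std[:, 1] - ivals_std[:, 0]
widths_norm = intervals_norm[:, 1] - intervals_norm[:, 0]

print(np.unique(widths_std.round(6)).size)   # 1: one width shared by all test objects
print(widths_norm.min(), widths_norm.max())  # widths tailored per instance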

Steps 1–4 are all it takes to compute prediction intervals tailored to each test point. To fit a Conformal Predictive System that outputs a complete CDF for each test point, a few extra steps are required.

Step 5. Import the Conformal Predictive System, specify the number of bins and fit the model. In this case the state-of-the-art Mondrian approach is used to obtain better-tailored, smaller prediction intervals; it achieves this by splitting the object space into non-overlapping Mondrian categories and forming a standard conformal regressor for each category.

from crepes import ConformalPredictiveSystem

bins_cal, bin_thresholds = binning(values=y_hat_cal, bins=5)

cps_mond_norm = ConformalPredictiveSystem().fit(residuals=residuals_cal,
                                                sigmas=sigmas_cal,
                                                bins=bins_cal)

Step 6. In order to use the Mondrian conformal regressor on the test objects, we need the labels of the Mondrian categories for these objects:

bins_test = binning(values=y_hat_test, bins=bin_thresholds)

Step 7. We now have everything to produce prediction intervals from the conformal predictive distributions for the test objects:

intervals = cps_mond_norm.predict(y_hat=y_hat_test,
                                  sigmas=sigmas_test,
                                  bins=bins_test,
                                  lower_percentiles=2.5,
                                  higher_percentiles=97.5,
                                  y_min=0)
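The same fitted system can also be queried for point predictions: for example, requesting the 50th percentile returns the median of each conformal predictive distribution. A sketch, assuming the percentile interface shown above also accepts a single value:

# Median of each test object's predictive distribution
# (one column is returned per requested percentile).
medians = cps_mond_norm.predict(y_hat=y_hat_test,
                                sigmas=sigmas_test,
                                bins=bins_test,
                                lower_percentiles=50)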

Step 8. We may request that the predict method returns the full conformal predictive distribution (CPD) for each test instance, as defined by the threshold values, by setting return_cpds=True. The format of the distributions varies with the type of conformal predictive system; for a standard and normalized CPS, the output is an array with a row for each test instance and a column for each calibration instance (residual), while for a Mondrian CPS, the default output is a vector containing one CPD per test instance, since the number of values may vary between categories.

cpds = cps_mond_norm.predict(y_hat=y_hat_test,
                             sigmas=sigmas_test,
                             bins=bins_test,
                             return_cpds=True)

The resulting vector of arrays is not displayed here; instead, we can plot the CPD for a random test instance.
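A minimal plotting sketch, assuming matplotlib is available and that, for the Mondrian CPS, each element of cpds is the array of threshold values for one test object:

import matplotlib.pyplot as plt
import numpy as np

idx = np.random.randint(len(y_hat_test))   # pick a random test instance
cpd = np.sort(cpds[idx])                   # threshold values for this instance
p = np.arange(1, len(cpd) + 1) / len(cpd)  # empirical cumulative probabilities

plt.step(cpd, p, where="post")
plt.axhline(0.5, linestyle="--", color="grey")  # median read-off line
plt.xlabel("y")
plt.ylabel("Q(y)")
plt.title(f"Conformal predictive distribution, test instance {idx}")
plt.show()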

See crepes: Conformal Regressors and Predictive Systems for more documentation and examples.

References

  1. Nonparametric predictive distributions based on conformal prediction
  2. Cross-conformal predictive distributions
  3. Computationally efficient versions of conformal predictive distributions
  4. Conformal predictive distributions with kernels
  5. Valid prediction intervals for regression problems by Nicolas Dewolf, Bernard De Baets and Willem Waegeman (2022)
  6. Prediction Intervals with Conformal Inference: An Intuitive Explanation by Rajiv Shah (2022)
  7. Getting prediction intervals with conformal prediction. TikTok video by Rajiv Shah (2022)
  8. crepes: Conformal Regressors and Predictive Systems
  9. Kaggle notebook showcasing Conformal Predictive Distributions on the Playground Series Season 3, Episode 1 (California Housing data) competition: https://www.kaggle.com/code/predaddict/conformal-predictive-distributions-pss3-e1

Written by Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction