# How to predict quantiles in a more intelligent way (or ‘Bye-bye quantile regression, hello Conformalized Quantile Regression’)

The good old quantile regression. Statistical modeling of quantiles dates back to Galton in the 1890s, with “modern” views on quantile regression going back to the 1970s. The method is over 130 years old now; surely we can do better in 2021?

What happens when one trains a machine learning model to predict quantiles using the pinball loss? Pinball loss is not an adequate objective to optimize for calibration: it is the wrong target 🎯 for algorithms to train on, leading to inaccurate and biased results.
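For reference, the pinball (quantile) loss penalizes over- and under-prediction asymmetrically around the target quantile τ. A minimal NumPy sketch (the function name and toy numbers are illustrative):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for target quantile tau in (0, 1).

    Under-prediction is weighted by tau and over-prediction by (1 - tau),
    so minimizing it pushes y_pred toward the tau-th conditional quantile.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

# For a low quantile (tau = 0.1), over-predicting costs far more
# than under-predicting by the same amount:
y = np.array([1.0, 2.0, 3.0])
print(pinball_loss(y, y + 1.0, tau=0.1))  # 0.9 (over-prediction, weight 0.9)
print(pinball_loss(y, y - 1.0, tau=0.1))  # 0.1 (under-prediction, weight 0.1)
```

The asymmetry is what steers a model toward the desired quantile, but minimizing it on training data alone gives no finite-sample coverage guarantee on new data.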

What happens when one uses quantile regression in its classical form (the 1978 edition)? Quite simply, it offers no coverage guarantees out of sample!

Bye-bye 👋 quantile regression, hello 👋 Conformalized Quantile Regression (CQR).

Conformal Prediction is a flexible framework that lets data scientists keep using their favourite machine learning and statistical tools 🧰 for classification and regression while obtaining valid (unbiased) probabilistic predictions. It can also renew and breathe new life into dated but still popular tools such as quantile regression.

Now ANYONE can switch from quantile regression to CQR and obtain valid (unbiased) quantile predictions with narrow intervals tailored to the feature region. CQR adjusts predictive intervals dynamically to account for local uncertainty in forecasts.

How great is that: not only does conformal prediction-based CQR provide rock-solid mathematical guarantees of validity (as every tool in Conformal Prediction does; such guarantees are built in by default), but it also highlights the uncertainty in difficult-to-predict feature regions via wider intervals.
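To make the mechanics concrete, here is a minimal split-CQR sketch on synthetic heteroscedastic data, using scikit-learn's gradient boosting with the quantile loss. The data, models, and split sizes are illustrative choices, not a prescribed setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: noise grows with x.
n = 2000
X = rng.uniform(0, 10, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1 + 0.1 * X[:, 0])

# Split into proper training, calibration, and test sets.
X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:1500], y[1000:1500]
X_test, y_test = X[1500:], y[1500:]

alpha = 0.1  # target 90% coverage
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores: how far the raw quantile band misses calibration points.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))

# Finite-sample-corrected quantile of the scores.
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

# Conformalized interval: widen (or shrink) the raw band by q.
lower = lo.predict(X_test) - q
upper = hi.predict(X_test) + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

Because the quantile band is adjusted by a per-region quantile model plus a single calibrated correction, the intervals stay adaptive (wider where the noise is larger) while the coverage guarantee holds regardless of how good or bad the underlying quantile regressors are.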

CQR is a drop-in replacement for quantile regression. GitHub link:

This ✌️ package 📦 is a gem 💎 already available in Python 🐍.

2022 update: Since this article was written in 2021, Conformalized Quantile Regression has been implemented in the industrial-grade, scikit-learn-compatible MAPIE library. One can find the CQR tutorial in MAPIE here.

CQR works each and every time, and it does so by default thanks to built-in mathematical validity guarantees. When CQR (and Conformal Prediction methods in general) produces a 95% prediction interval, one can be sure the coverage is at least 95% by default (no ifs and no buts, unlike other methods), regardless of the data distribution, the sample size, and the underlying regressor, whether it is a statistical, machine learning, or deep learning model.

Unlike other methods for uncertainty quantification, when it comes to Conformal Prediction, 95% coverage means precisely what it says on the tin — 95%. To learn about evaluating probabilistic prediction models, please read my article How to evaluate Probabilistic Forecasts.
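Checking a claimed coverage level is straightforward: count how often the interval contains the truth, and track the average interval width as a sanity check. A tiny sketch (the function name and toy numbers are illustrative):

```python
import numpy as np

def evaluate_intervals(y_true, lower, upper):
    """Return (empirical coverage, mean width) of prediction intervals."""
    covered = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(covered)), float(np.mean(upper - lower))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 1.8, 3.2, 3.0])
upper = np.array([1.5, 2.5, 3.8, 5.0])
cov, width = evaluate_intervals(y_true, lower, upper)
print(cov, width)  # 0.75 1.075 (third point falls below its interval)
```

Coverage alone is not enough: trivially wide intervals always cover, so width is the natural companion metric when comparing methods at the same nominal level.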

CQR (and conformal prediction in general) means no parameters and no assumptions, unlike other methods. No parameters means no bias. No assumptions (apart from the standard exchangeability assumption, which is weaker than the general IID assumption) means one can deploy Conformal Prediction on top of any statistical or machine learning model.

One can take a favourite algorithm (Random Forest, CatBoost, anything really), produce point predictions, and then deploy Conformal Prediction on top. Voilà: one gets a probabilistic predictor with inbuilt guarantees for FINITE samples (no infinity ♾ required).
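The recipe above is split conformal prediction in its simplest form: fit any point model, compute absolute residuals on a held-out calibration set, and use their corrected quantile as a constant interval half-width. A minimal sketch with a random forest (data and model choices are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(1500, 1))
y = X[:, 0] ** 2 + rng.normal(0, 1, size=1500)

X_train, y_train = X[:800], y[:800]
X_cal, y_cal = X[800:1200], y[800:1200]
X_test, y_test = X[1200:], y[1200:]

alpha = 0.1  # target 90% coverage
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected quantile of the scores.
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

# Point prediction +/- q yields a marginally valid (>= 90%) interval.
preds = model.predict(X_test)
coverage = np.mean((y_test >= preds - q) & (y_test <= preds + q))
print(f"coverage: {coverage:.3f}")
```

Note the design trade-off: this basic recipe gives constant-width intervals, whereas CQR conformalizes a pair of quantile models and so adapts the width to the local difficulty of each feature region.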

References:

- Awesome Conformal Prediction — the most comprehensive professionally curated (by a PhD holder in the field) resource on Conformal Prediction, including the best tutorials, videos, books, papers, articles, courses, websites, conferences, and open-source libraries in Python and R.
- Tutorial for Conformalized Quantile Regression (CQR)