Jackknife+ — the Swiss Army knife of Conformal Prediction for regression
A while back, in my article ‘How to predict quantiles in a more intelligent way (or “Bye-bye quantile regression, hello Conformalized Quantile Regression”)’, I covered Conformalized Quantile Regression, one of the most popular Conformal Prediction methods. Today we will look at another powerful Conformal Prediction method for quantifying uncertainty in regression: the jackknife+.
Jackknife+ is a powerful Conformal Prediction method developed by leading machine learning researchers from the University of Chicago, Stanford, Carnegie Mellon and Berkeley, and published in the paper ‘Predictive inference with the jackknife+’.
The jackknife technique was originally conceived by Maurice Quenouille between 1949 and 1956. In 1958, the renowned statistician John Tukey refined it further and proposed the name ‘jackknife’ as a metaphor for the method’s wide applicability and versatility. Much like a physical jackknife (a compact, foldable knife), this statistical method offers a flexible, adaptable solution to a broad spectrum of problems, even though other tools may solve specific problems more efficiently.
Our objective is to estimate a regression function from the training data, which consists of feature–response pairs (Xi, Yi), i = 1, ..., n. For a new feature vector Xn+1 = x, we want to predict the response Yn+1 and produce a corresponding uncertainty interval, one that contains the true Yn+1 with a predefined coverage probability.
A straightforward approach might be to fit the underlying regression model to the training data, compute the residuals, and use these residuals to estimate a quantile that determines the width of the prediction interval at a new test point. However, this method tends to underestimate the actual uncertainty because of overfitting: residuals computed on the training set are usually smaller than those you would get on unseen test data.
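As a concrete illustration, here is a minimal sketch of this naive approach; the toy dataset, the Ridge regressor and alpha = 0.1 are arbitrary choices made purely for demonstration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Toy data: hold out the last point as the "new" test point.
X, y = make_regression(n_samples=201, n_features=5, noise=10.0, random_state=0)
X_train, y_train, x_new = X[:-1], y[:-1], X[-1:]

model = Ridge().fit(X_train, y_train)
residuals = np.abs(y_train - model.predict(X_train))  # in-sample residuals

alpha = 0.1  # target miscoverage level
q = np.quantile(residuals, 1 - alpha)  # optimistic: training residuals run small

y_hat = model.predict(x_new)[0]
naive_interval = (y_hat - q, y_hat + q)  # tends to undercover on unseen data
```

Because the same data are used both to fit the model and to compute the residuals, the quantile q comes out too small, which is exactly the undercoverage problem described above.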
To mitigate the issue of overfitting, a robust statistical technique named ‘jackknife’ was devised. The original purpose of this technique was to reduce bias and provide an estimate for variance. It operates on the principle of sequentially excluding one observation from the dataset and re-estimating the model. This methodology provides an empirical means to assess the stability of the model and its sensitivity to individual data points.
The jackknife procedure is illustrated in the figure below.
In each jackknife fit, the model is trained on all data points except the pair (Xi, Yi), which yields the leave-one-out residual for that point. Interpreting these leave-one-out residuals as nonconformity scores, we can estimate a quantile and construct prediction intervals, much as in Inductive Conformal Prediction. Because this approach tackles overfitting by computing residuals out of sample, it is theoretically expected to provide adequate coverage.
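In code, the jackknife procedure looks roughly like this (again a sketch reusing the toy Ridge setup from above; the (n + 1) finite-sample quantile correction is omitted for brevity):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=101, n_features=5, noise=10.0, random_state=0)
x_new, X, y = X[-1:], X[:-1], y[:-1]  # last point plays the role of Xn+1
n = len(y)

# Leave-one-out residuals: refit the model n times, each time without (Xi, Yi).
loo_residuals = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    model_i = clone(Ridge()).fit(X[mask], y[mask])
    loo_residuals[i] = abs(y[i] - model_i.predict(X[i:i + 1])[0])

alpha = 0.1
q = np.quantile(loo_residuals, 1 - alpha)

# The jackknife centers the interval on the full-data model's prediction.
y_hat = Ridge().fit(X, y).predict(x_new)[0]
jackknife_interval = (y_hat - q, y_hat + q)
```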
However, in practice the coverage of the jackknife procedure can falter, especially on highly skewed or heavy-tailed data. While there are results showing good coverage in asymptotic settings, or when the underlying regression algorithm is stable, the jackknife can lose its predictive coverage when the estimator is unstable.
In some scenarios the jackknife can even deliver zero coverage, meaning the estimated prediction interval essentially never contains the true value. It is also computationally taxing, since it requires fitting the model n times, once for each left-out point.
These limitations led to the development of a refined method known as jackknife+. This improved version enhances the coverage properties of the traditional jackknife (the same paper also proposes CV+, a cross-validation variant that reduces the computational cost). Jackknife+ belongs to the Conformal Prediction family of methods and thereby inherits their advantageous features: validity guaranteed in finite samples of any size, a distribution-free nature, and applicability to any regression model.
What distinguishes jackknife+ from the jackknife in constructing prediction intervals is its use of the leave-one-out predictions at the test point, supplementing the quantiles of leave-one-out residuals that the jackknife relies on. This accounts for the variability of the fitted regression function and allows jackknife+ to provide rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. The original jackknife, by contrast, offers no such theoretical guarantee without stability assumptions and can have poor coverage in certain situations.
The jackknife+ method shares some similarities with the cross-conformal prediction proposed by Vovk, as both aim to construct predictive intervals that offer robust coverage guarantees, regardless of the distribution of the data points. They both work with any algorithm that treats the training points symmetrically.
However, jackknife+ differentiates itself by also using the leave-one-out predictions at the test point to account for the variability of the fitted regression function, on top of the quantiles of leave-one-out residuals that cross-conformal prediction relies on.
The figure below illustrates the difference between the prediction intervals produced by the jackknife and jackknife+ methods. Whereas the jackknife centers its interval on a single prediction made by a model fitted to the entire dataset, jackknife+ generates n point estimates, one per leave-one-out model evaluated at the test point, and constructs a leave-one-out prediction interval around each. Jackknife+ then applies quantiles to the endpoints of these n individual intervals, rather than to the leave-one-out residuals as the jackknife does.
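Putting this into code, a sketch of the jackknife+ construction might look as follows (same toy setup as before; np.quantile stands in for the paper's exact order statistics, which take the ⌊α(n+1)⌋-th smallest lower endpoint and the ⌈(1−α)(n+1)⌉-th smallest upper endpoint):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=101, n_features=5, noise=10.0, random_state=0)
x_new, X, y = X[-1:], X[:-1], y[:-1]
n = len(y)

lower_ends, upper_ends = np.empty(n), np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    model_i = clone(Ridge()).fit(X[mask], y[mask])    # leave-one-out model
    r_i = abs(y[i] - model_i.predict(X[i:i + 1])[0])  # leave-one-out residual
    mu_i = model_i.predict(x_new)[0]                  # its prediction at the test point
    lower_ends[i], upper_ends[i] = mu_i - r_i, mu_i + r_i

alpha = 0.1
# Quantiles are taken over the n interval endpoints, not over residuals alone.
jackknife_plus_interval = (np.quantile(lower_ends, alpha),
                           np.quantile(upper_ends, 1 - alpha))
```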
Jackknife+ is available in MAPIE, an open-source Conformal Prediction library.
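For example, with the classic (pre-1.0) MAPIE interface, a jackknife+ interval can be obtained roughly as follows; check the documentation of your installed version, since the API has been reorganized in recent releases:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from mapie.regression import MapieRegressor  # classic (pre-1.0) MAPIE interface

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

# method="plus" with cv=-1 (leave-one-out) corresponds to the jackknife+.
mapie = MapieRegressor(estimator=Ridge(), method="plus", cv=-1)
mapie.fit(X_train, y_train)

# y_pis has shape (n_samples, 2, n_alpha): lower and upper interval bounds.
y_pred, y_pis = mapie.predict(X_test, alpha=0.1)
```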
NB: the pictures are from the jackknife+ paper and Aaditya Ramdas's talk.
References:
- Awesome Conformal Prediction: the most comprehensive, professionally curated resource on Conformal Prediction (maintained by a PhD holder in the field), including the best tutorials, videos, books, papers, articles, courses, websites, conferences and open-source libraries in Python and R.
- Predictive inference with the jackknife+
- Assumption-free prediction intervals for black-box regression algorithms (Aaditya Ramdas)