Conformal Prediction forecasting with Nixtla’s statsforecast

Aug 22, 2023 · 5 min read

Conformal Prediction is a framework designed to quantify uncertainty, and it’s quickly becoming a favoured approach in both the corporate world and academic research. Its ease of application sets it apart; with minimal adjustments, it can be integrated into any predictive model.

This allows for the generation of well-calibrated prediction intervals. Unlike many other uncertainty quantification methods, Conformal Prediction produces intervals whose coverage is guaranteed at a level chosen by the user. For instance, 95% prediction intervals generated using this method will, on average, contain at least 95% of out-of-sample data points.

The quest for reliable and interpretable prediction intervals has become increasingly pivotal in predictive modelling. Enter Conformal Prediction, a paradigm that has garnered significant attention for its promise in this realm. Rooted in the theory of Kolmogorov complexity and algorithmic randomness, Conformal Prediction is a machine-learning framework that furnishes predictions with a measure of their trustworthiness.

At its core, Conformal Prediction is about associating each prediction with a confidence level, ensuring that the actual value falls outside the prediction interval only a fraction of the time equal to (1 - confidence level). In other words, if we predict with 95% confidence, we can expect the actual value to lie within the prediction interval at least 95% of the time.
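To make the coverage idea concrete, here is a minimal sketch of split conformal prediction on toy data. It is an illustration under stated assumptions rather than what statsforecast does internally: the function name `split_conformal_interval`, the absolute-residual score, and the toy "model" that always predicts zero are all chosen for this example, and the data are assumed to be exchangeable.

```python
import numpy as np

def split_conformal_interval(cal_true, cal_pred, new_pred, confidence=0.95):
    """Return (lower, upper) bounds around new_pred using absolute-residual scores."""
    # Conformity scores: absolute errors on a held-out calibration set
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = scores.size
    # Finite-sample corrected quantile level, capped at 1.0
    q_level = min(1.0, np.ceil((n + 1) * confidence) / n)
    q = np.quantile(scores, q_level, method="higher")
    new_pred = np.asarray(new_pred)
    return new_pred - q, new_pred + q

rng = np.random.default_rng(0)
# Toy data: the "model" always predicts 0, the truth is standard normal noise
y_cal = rng.normal(size=500)
y_test = rng.normal(size=2000)
lower, upper = split_conformal_interval(
    y_cal, np.zeros(500), np.zeros(2000), confidence=0.95
)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

The interval half-width is simply a quantile of past absolute errors, which is why the approach can wrap any point forecaster without distributional assumptions.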

This is especially crucial in fields like finance, healthcare, and energy, where the consequences of erroneous predictions can be profound.

The significance of Conformal Prediction in forecasting is manifold:

**Reliability**: Traditional prediction methods often provide only point estimates, leaving users uncertain about how reliable those estimates are. Conformal Prediction, by contrast, offers a systematic way to provide prediction intervals, illuminating the range in which the actual value is likely to fall.

**Adaptability**: Conformal Prediction is distribution-free, unlike methods requiring strong assumptions about the data. This makes it versatile and adaptable to a wide range of applications and datasets.

**Transparency**: By quantifying the uncertainty associated with predictions, Conformal Prediction offers a transparent view of the model’s performance, enabling stakeholders to make more informed decisions.

**Enhanced Decision Making**: In sectors like finance or healthcare, where risk assessment is vital, having a clear understanding of prediction uncertainty can guide strategic decisions, optimising outcomes and mitigating potential pitfalls.

In an era where data-driven decision-making is paramount, the ability to quantify the trust we place in our predictions is invaluable. Conformal Prediction, with its blend of robustness, flexibility, and transparency, stands as a beacon in the vast sea of forecasting methodologies, promising a future where we not only predict but also understand the bounds of our predictions.

In my previous articles, we have looked at how one can do Conformal Prediction forecasting with MAPIE, how to do Probabilistic Forecasting with Conformal Prediction and NeuralProphet and how to produce Multi-horizon Probabilistic Forecasting with Conformal Prediction and NeuralProphet.

Conformal Prediction is making waves in the forecasting world, finding its way into numerous open-source tools. Excitingly, Nixtla has now integrated Conformal Prediction into its widely-acclaimed statsforecast library. This means users can harness the power of Conformal Prediction alongside beloved models like ARIMA, ETS, and many more, all within a single platform with a few lines of code!

In this article, we will use data from the M4 forecasting competition. A sample of time series from this dataset is shown below.

A sample of 8 time series from M4 forecasting competition
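The code in this article works on a long-format data frame with the columns `unique_id`, `ds` and `y`, one row per series and timestamp. As a rough sketch of that layout, the snippet below builds a small synthetic hourly panel. The helper `make_toy_hourly_panel` and the frame `toy_train` are hypothetical names for illustration only and are not the M4 data.

```python
import numpy as np
import pandas as pd

def make_toy_hourly_panel(series_ids, n_hours=96, seed=0):
    """Build a synthetic long-format panel with columns unique_id, ds, y."""
    rng = np.random.default_rng(seed)
    frames = []
    for sid in series_ids:
        hours = np.arange(n_hours)
        # Hourly timestamps starting at an arbitrary date
        ds = pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h")
        # Daily seasonality plus noise
        y = 100 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(scale=2, size=n_hours)
        frames.append(pd.DataFrame({"unique_id": sid, "ds": ds, "y": y}))
    return pd.concat(frames, ignore_index=True)

toy_train = make_toy_hourly_panel(["H1", "H2"])
print(toy_train.head())
```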

We import modules from Nixtla’s statsforecast, including ConformalIntervals required to generate conformal prediction intervals for statsforecast models.

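import matplotlib.pyplot as plt  # used for the plot at the end

from statsforecast import StatsForecast
from statsforecast.models import SeasonalExponentialSmoothing, ADIDA, ARIMA
from statsforecast.utils import ConformalIntervals

# Further models that can be used in the same way
from statsforecast.models import (
    AutoETS,
    HistoricAverage,
    Naive,
    RandomWalkWithDrift,
    SeasonalNaive,
)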

We can now create the list of models we would like to use to produce forecasts. Instead of the standard prediction intervals these models produce, we instruct statsforecast to use intervals generated by Conformal Prediction, which target the specified coverage.
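The settings `h` and `n_windows` used in the configuration can be read as "calibrate on this many held-out windows of this length". The sketch below illustrates that idea by hand with a seasonal-naive forecaster. It is a simplified illustration and not statsforecast's internal implementation: the function `windowed_conformal_bounds`, the plain empirical quantile, and the toy series are all assumptions made for this example.

```python
import numpy as np

def windowed_conformal_bounds(y, h, n_windows, season_length, level):
    """Seasonal-naive forecast with per-step conformal bounds from rolling windows."""
    y = np.asarray(y, dtype=float)

    def seasonal_naive(history, horizon):
        # Repeat the last observed season for as many steps as needed
        last_season = history[-season_length:]
        reps = int(np.ceil(horizon / season_length))
        return np.tile(last_season, reps)[:horizon]

    # Absolute errors on the last n_windows held-out blocks of length h
    scores = np.empty((n_windows, h))
    for w in range(n_windows):
        cutoff = len(y) - (n_windows - w) * h
        scores[w] = np.abs(y[cutoff:cutoff + h] - seasonal_naive(y[:cutoff], h))

    # Per-horizon-step quantile of the scores becomes the interval half-width
    half_width = np.quantile(scores, level / 100, axis=0)
    point = seasonal_naive(y, h)
    return point, point - half_width, point + half_width

rng = np.random.default_rng(1)
t = np.arange(24 * 10)
series = 50 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=1.0, size=t.size)
point, lo, hi = windowed_conformal_bounds(
    series, h=24, n_windows=2, season_length=24, level=90
)
print(point.shape, float((hi - lo).mean()))
```

With only two windows, each horizon step is calibrated on two errors, so in this sketch a larger number of windows would give steadier interval widths at the cost of more computation.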

# h is the forecast horizon; n_windows is the number of cross-validation windows used for calibration
intervals = ConformalIntervals(h=24, n_windows=2)

# Create a list of models and instantiation parameters
models = [
    SeasonalExponentialSmoothing(season_length=24, alpha=0.1, prediction_intervals=intervals),
    ADIDA(prediction_intervals=intervals),
    ARIMA(order=(24, 0, 12), season_length=24, prediction_intervals=intervals),
]

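sf = StatsForecast(
    models=models,
    freq='H',
)

We can now specify confidence levels as a list.

levels = [80, 90]  # confidence levels of the prediction intervals

# Pass the training data to forecast(); recent releases deprecate passing df to the constructor
forecasts = sf.forecast(df=train, h=24, level=levels)
# Older versions return unique_id as the index; reset_index() makes it a regular column
forecasts = forecasts.reset_index()
forecasts.head()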

Statsforecast produces a data frame that includes the point predictions produced by our models, together with the lower and upper bounds produced by Conformal Prediction for each of the specified confidence levels.
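Once forecasts are available, a natural sanity check is to measure how often held-out actual values fall inside the bounds. The following is a minimal sketch, assuming a separate frame of actuals with columns `unique_id`, `ds` and `y`. That frame, the helper `empirical_coverage`, and the toy numbers are hypothetical; only the `ARIMA-lo-90` / `ARIMA-hi-90` column naming is taken from the forecasts used in this article.

```python
import pandas as pd

def empirical_coverage(forecasts, actuals, model, level):
    """Share of actual values inside the [lo, hi] band for one model and level."""
    merged = forecasts.merge(actuals, on=["unique_id", "ds"], how="inner")
    lo = merged[f"{model}-lo-{level}"]
    hi = merged[f"{model}-hi-{level}"]
    inside = (merged["y"] >= lo) & (merged["y"] <= hi)
    return float(inside.mean())

# Toy frames standing in for real forecasts and a held-out test set
toy_forecasts = pd.DataFrame({
    "unique_id": ["H1"] * 4,
    "ds": [1, 2, 3, 4],
    "ARIMA": [10.0, 11.0, 12.0, 13.0],
    "ARIMA-lo-90": [8.0, 9.0, 10.0, 11.0],
    "ARIMA-hi-90": [12.0, 13.0, 14.0, 15.0],
})
toy_actuals = pd.DataFrame({
    "unique_id": ["H1"] * 4,
    "ds": [1, 2, 3, 4],
    "y": [9.5, 13.5, 12.0, 10.0],
})
toy_coverage = empirical_coverage(toy_forecasts, toy_actuals, model="ARIMA", level=90)
print(toy_coverage)  # two of the four toy actuals fall inside the band
```

On real data, a coverage value close to the nominal level (here 90%) across many series suggests the intervals are well calibrated.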

We can select forecasts for one time series and plot them, for example, for ARIMA:

series_id = "H105"
temp_train = train.loc[train['unique_id'] == series_id]
temp_forecast = forecasts.loc[forecasts['unique_id'] == series_id]

arima_forecast_df = temp_forecast[['unique_id', 'ds', 'ARIMA', 'ARIMA-lo-90', 'ARIMA-hi-90']]
arima_forecast_df.head()

# Ensure data is sorted by 'ds'
temp_train = temp_train.sort_values(by='ds')
arima_forecast_df = arima_forecast_df.sort_values(by='ds')

# Let's visualise for a sample unique_id for clarity. We'll pick the first unique_id for demonstration.
sample_id = temp_train['unique_id'].iloc[0]

temp_train_sample = temp_train[temp_train['unique_id'] == sample_id]
arima_forecast_sample = arima_forecast_df[arima_forecast_df['unique_id'] == sample_id]

# Plot
plt.figure(figsize=(15, 7))
plt.plot(temp_train_sample['ds'], temp_train_sample['y'], label='Historic Data', color='blue')
plt.plot(arima_forecast_sample['ds'], arima_forecast_sample['ARIMA'], label='ARIMA Forecast', color='red', linestyle='--')
plt.fill_between(arima_forecast_sample['ds'], arima_forecast_sample['ARIMA-lo-90'], arima_forecast_sample['ARIMA-hi-90'], color='red', alpha=0.2, label='90% Prediction Interval')
plt.legend()
plt.title(f"Time Series Forecast for unique_id {sample_id}")
plt.xlabel("Timestamp")
plt.ylabel("Value")
plt.grid(True)
plt.tight_layout()
plt.show()

Prediction intervals produced by Conformal Prediction using ARIMA

To learn more about Conformal Prediction, consider my book “Practical Guide to Applied Conformal Prediction: Learn and apply the best uncertainty frameworks to your industry applications.”

Written by Valeriy Manokhin, PhD, MBA, CQF