What Truly Works in Time Series Forecasting — The Results from Nixtla’s Mega Study

Valeriy Manokhin, PhD, MBA, CQF
3 min read · Sep 12, 2023

Time series is a captivating domain where the quest for a crystal ball never ceases.

Uncovering the best forecasting techniques has always been a pursuit in the field. While many regard understanding effective methods as the Holy Grail, it’s equally vital to identify those that fall short — take Facebook Prophet as a case in point.

For four decades, the mantra in applied forecasting has been ‘simpler methods prevail,’ influenced largely by the results of the M series forecasting competitions.

These competitions saw minimal machine learning participation, even from their organizers. Even after the top two solutions in the M4 forecasting competition (2018) proved to be machine learning-based, the organizers remained stubbornly anti-machine learning, suggesting it was ‘still up in the air’ whether machine learning surpasses traditional techniques like exponential smoothing in time series forecasting.

A few years after the conclusion of the M4 competition, the organizers moved the subsequent M5 competition to Kaggle. This move exposed the M-competitions to the machine learning community for the first time. The outcome? A resounding upheaval of longstanding academic forecasting beliefs, with undeniable proof, as every top solution relied on machine learning, highlighting machine learning as the future of time series forecasting.

Fast-forward to 2023: Nixtla published the results of the first mega study, based on a dataset containing 100 billion time series data points.

The results from Nixtla’s TimeGPT mega study

Nixtla’s study and results are proprietary and cannot be reproduced, so we will take them as they are.

What new insights does the study bring? As a point of comparison, we will use the Monash Time Series Forecasting Repository, which has so far been the best publicly available, reproducible benchmark study: https://forecastingdata.org/
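For readers who want to explore the repository’s datasets themselves, here is a minimal sketch of loading one of its .tsf files, assuming the convert_tsf_to_dataframe helper from the data_loader.py utility published alongside the repository; the file name used below is hypothetical.

```python
# Minimal sketch: loading a Monash .tsf file into a pandas DataFrame.
# Assumes the convert_tsf_to_dataframe helper from the data_loader.py
# utility published alongside the Monash repository; the file name is
# hypothetical.
from data_loader import convert_tsf_to_dataframe

(
    loaded_data,              # one row per series; values stored as arrays
    frequency,                # e.g. "monthly"
    forecast_horizon,         # suggested horizon for the dataset
    contain_missing_values,   # whether any series has gaps
    contain_equal_length,     # whether all series share one length
) = convert_tsf_to_dataframe("m4_monthly_dataset.tsf")

print(frequency, forecast_horizon)
print(loaded_data.head())
```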

Monthly data:

Monash Time Series Forecasting Repository Insights: While the repository ranks ETS and TBATS as the top methods for monthly data, Nixtla’s research adds a fresh perspective by incorporating extensive machine learning and deep learning methods. Their findings reveal that NHITS closely followed TimeGPT, with negligible performance differences on monthly datasets.

Traditional models like Theta (the M3 competition’s victor) and its variant DOTheta exhibited strong results. DeepAR added little value, only slightly outpacing SeasonalNaive, a simple baseline, while ETS’s edge over SeasonalNaive was minimal. Nixtla’s research did not feature the strong benchmark TBATS, despite its known efficacy with monthly data.
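As a rough sketch of how such a monthly baseline comparison could be run, Nixtla’s own open-source statsforecast library implements AutoETS, DOTheta (as DynamicOptimizedTheta) and SeasonalNaive; the file name, horizon and season length below are illustrative assumptions, and the data is expected in statsforecast’s long format with unique_id, ds and y columns.

```python
# Minimal sketch: classical monthly baselines with Nixtla's open-source
# statsforecast library (pip install statsforecast). File name, horizon
# and season length are illustrative assumptions.
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoETS, DynamicOptimizedTheta, SeasonalNaive

# Long-format data: unique_id (series id), ds (timestamp), y (value).
df = pd.read_csv("monthly_series.csv", parse_dates=["ds"])  # hypothetical file

season_length = 12  # monthly seasonality
sf = StatsForecast(
    models=[
        AutoETS(season_length=season_length),
        DynamicOptimizedTheta(season_length=season_length),  # DOTheta
        SeasonalNaive(season_length=season_length),
    ],
    freq="M",   # monthly; newer pandas versions may expect "ME"
    n_jobs=-1,  # fit all series in parallel
)

# Forecast 18 months ahead for every series in the frame.
forecasts = sf.forecast(df=df, h=18)
print(forecasts.head())
```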

Weekly data:

Monash Time Series Forecasting Repository Highlights: NBEATS and TBATS emerge as top contenders. Intriguingly, Nixtla omitted TBATS from their research.

Nixtla’s Findings: Surprisingly, NBEATS is also absent from their study. NHITS nearly matched TimeGPT’s performance on weekly datasets, with minimal differences. DeepAR, once more, showed only minimal improvement over the straightforward seasonal naive benchmark. It is a letdown that the other models lagged on weekly data, and the absence of TBATS (known for its efficacy in monthly and weekly forecasting) from Nixtla’s study is particularly noteworthy.
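For completeness, here is a minimal sketch of fitting NHITS to weekly series with Nixtla’s open-source neuralforecast library; the horizon, input window and training length are illustrative assumptions, and the input again uses the unique_id/ds/y long format.

```python
# Minimal sketch: fitting NHITS to weekly series with Nixtla's
# neuralforecast library (pip install neuralforecast). Horizon, input
# window and training length are illustrative assumptions.
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

df = pd.read_csv("weekly_series.csv", parse_dates=["ds"])  # hypothetical file

horizon = 13  # roughly one quarter of weekly observations
nf = NeuralForecast(
    models=[
        NHITS(
            h=horizon,          # forecast horizon
            input_size=2 * 52,  # two years of weekly history as input
            max_steps=500,      # keep training short for this sketch
        )
    ],
    freq="W",
)

nf.fit(df=df)
forecasts = nf.predict()
print(forecasts.head())
```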

Daily data:

Monash Time Series Forecasting Repository Highlights: TBATS is the top contender. As with weekly data, TBATS is absent from Nixtla’s study.

Nixtla’s Findings:

NHITS nearly matched TimeGPT’s performance on daily datasets as well, with minimal differences. DeepAR did better on daily data but could not outperform DOTheta, a variation of the simple Theta model; once again, Theta and DOTheta performed very well.

Hourly data:

Monash Time Series Forecasting Repository Highlights: There is no clear evidence here, as the repository contains little hourly data, although Pooled Regression is mentioned as the winning solution for one dataset.

As confirmed by Nixtla, the absence of TBATS, ARIMA and NBEATS is not a reflection of their performance; these methods were simply not included in this study.

Nixtla’s Findings: LightGBM (LGBM) significantly outperformed TimeGPT and all other models. NHITS was once again a strong performer.
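To give a flavour of the gradient-boosting approach that won here, below is a minimal sketch using LightGBM via Nixtla’s open-source mlforecast library; the lag features and calendar features are illustrative assumptions, not the configuration used in the study.

```python
# Minimal sketch: LightGBM on hourly data via Nixtla's mlforecast
# library (pip install mlforecast lightgbm). The lags and date features
# are illustrative assumptions, not the study's configuration.
import pandas as pd
import lightgbm as lgb
from mlforecast import MLForecast

df = pd.read_csv("hourly_series.csv", parse_dates=["ds"])  # hypothetical file

fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq="H",  # hourly; newer pandas versions may expect "h"
    lags=[1, 24, 168],  # last hour, same hour yesterday, same hour last week
    date_features=["hour", "dayofweek"],  # simple calendar features
)

fcst.fit(df)
forecasts = fcst.predict(h=48)  # two days ahead
print(forecasts.head())
```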

References:

  1. TimeGPT-1 — https://arxiv.org/abs/2310.03589
  2. Monash Time Series Forecasting Repository — https://forecastingdata.org/
