Deep Learning Is What You Do Not Need

Valeriy Manokhin, PhD, MBA, CQF
Jun 18, 2022

Unless you have tons of clean data and dozens of top PhDs who have worked on forecasting for over a decade, as Amazon and Alibaba do. And even then, would you take claims from the same companies that sell GPU usage at face value?

November 2022 update: a large-scale independent study by Nixtla confirmed that deep learning methods failed to outperform a simple ensemble of econometric and statistical methods, while costing roughly 25,000 times more (about $0.50 for the econometrics/stats ensemble vs about $11,000 in GPU costs for deep learning).

When it comes to time series and forecasting, “Deep Learning Is What You Do Not Need.”

Do not let the deluge of papers about deep learning for time series (including ones produced by top tech companies and at top conferences) overwhelm you.

There is no evidence that deep learning methods, including transformers, outperform statistical and classical machine learning methods in time series forecasting.

Most of the claims about the performance of deep learning come either from conflicted parties (including tech firms like Alibaba interested in selling more GPU hours) or from academic labs that, by design or by omission, misrepresent how deep learning performs compared to other methods.

Such misrepresentation involves recycling the same toy datasets, dataset arbitrage (only showing results on datasets where deep learning works better), omitting non-deep learning benchmarks, not using correct benchmarks and many other tricks.
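One way to guard against these benchmark tricks is to score every candidate model against a seasonal naive forecast. Below is a minimal Python sketch, not taken from any of the studies mentioned in this article: the function names, the toy numbers and the choice of MASE (mean absolute scaled error) as the metric are all illustrative assumptions.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], horizon: int, season: int) -> List[float]:
    """Repeat the last observed season forward for `horizon` steps."""
    if season < 1 or len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    return [last_season[h % season] for h in range(horizon)]


def mase(actual: Sequence[float], forecast: Sequence[float],
         history: Sequence[float], season: int) -> float:
    """Forecast MAE divided by the in-sample MAE of the seasonal naive method."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    in_sample = [abs(history[t] - history[t - season])
                 for t in range(season, len(history))]
    if not in_sample or sum(in_sample) == 0:
        raise ValueError("history too short, or constant at the seasonal lag")
    scale = sum(in_sample) / len(in_sample)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy data (hypothetical): three seasons of length 3 with a mild upward trend.
history = [10, 20, 30, 12, 22, 32, 14, 24, 34]
actual = [16, 26, 36]

naive_fc = seasonal_naive(history, horizon=3, season=3)
candidate_fc = [16.5, 25.5, 36.5]  # stand-in for whatever model is being pitched

print(mase(actual, naive_fc, history, season=3))      # 1.0
print(mase(actual, candidate_fc, history, season=3))  # 0.25
```

A MASE below 1 means the candidate beats seasonal naive on that series, and a value above 1 means it does not. If a paper or a vendor cannot show you this kind of comparison, ask why.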

When starting the forecasting journey, don’t get distracted from fundamentals.

And whoever tells you that your company needs deep learning for forecasting is often either unfamiliar with the subject of time series or has a vested interest in selling an ineffective piece of consulting advice or a forecasting technology that does not work.

The industry is littered with deep learning projects that resulted in ineffective solutions at best and complete disasters at worst, where deep learning systems blew up in production.

And outside of Amazon and Alibaba, one never hears about successfully implemented deep learning solutions, while many top companies such as Walmart and Target have either tried deep learning unsuccessfully and abandoned it, or built effective forecasting solutions without it.

As for your average Joe Bloggs Inc, the multibillion super-duper company listed on the cool Nasdaq or the not-so-cool NYSE, deep learning does not work, no matter what your expensive consultants or your PhD-in-irrelevant-field Data Scientist with not-so-deep forecasting knowledge tell you.

Fire your expensive consultants and PhD-in-irrelevant-field Data Scientists (hello, Zillow) and save yourself a lot of time, trouble and millions in wasted project costs and forgone profits.

DO NOT use deep learning for forecasting unless you have tons of clean data, an expert team and a lot of time to play with these toys. Do not touch deep learning until you have built an effective forecasting system that delivers business value using statistical, econometric and machine learning tools.
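To make "statistical tools first" concrete, here is a minimal sketch of a simple ensemble built from three textbook methods (seasonal naive, simple exponential smoothing and drift), combined with a per-step median. This is an illustration only, not the ensemble used in any study cited in this article: the function names, the smoothing parameter `alpha`, the toy data and the choice of a median combiner are all assumptions.

```python
from statistics import median
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], horizon: int, season: int) -> List[float]:
    """Repeat the last observed season forward."""
    if season < 1 or len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    return [last_season[h % season] for h in range(horizon)]


def simple_exp_smoothing(history: Sequence[float], horizon: int, alpha: float) -> List[float]:
    """Flat forecast at the exponentially smoothed level of the series."""
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def drift(history: Sequence[float], horizon: int) -> List[float]:
    """Extend the straight line through the first and last observations."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def median_ensemble(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Combine member forecasts by taking the median at each horizon step."""
    return [median(step) for step in zip(*forecasts)]


# Toy data (hypothetical): three seasons of length 3 with a mild upward trend.
history = [10, 20, 30, 12, 22, 32, 14, 24, 34]
members = [
    seasonal_naive(history, horizon=3, season=3),
    simple_exp_smoothing(history, horizon=3, alpha=0.5),
    drift(history, horizon=3),
]
ensemble_fc = median_ensemble(members)
print(ensemble_fc)
```

The median is used here because it ignores whichever member is furthest off at a given step. A plain average would be an equally reasonable starting point.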

“FreDo: Frequency Domain-based Long-Term Time Series Forecasting”, a research 🧐 paper from MIT, pitted a super fancy transformer architecture against a simple, almost mechanical benchmark.

TL;DR: the transformer loses grotesquely.

Let me repeat. A simple, almost mechanistic benchmark forecasting model from the Massachusetts Institute of Technology totally decimated a sophisticated transformer-based architecture. And not just any average transformer, but the best and brightest of the time series transformers, the one that is better than another transformer, and so on.

Yes, you heard it right. A simple, almost mechanistic benchmark decimates the meanest time series transformer, like a bunch of kids in a Michael Bay movie beating the meanest Transformer 🍿

Simple mechanistic benchmark vs transformer
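To give a feel for how mechanical such a benchmark can be, here is a minimal sketch of a periodic average-and-tile rule: average the history position by position within the period, then repeat that profile into the future. This article does not describe the paper's benchmark in detail, so treat this as an assumption about the general idea. The paper's exact procedure may differ, and the function name, the `period` argument and the toy data are hypothetical.

```python
from typing import List, Sequence


def average_tile(history: Sequence[float], horizon: int, period: int) -> List[float]:
    """Average values that share the same position in the period, then tile forward."""
    n_full = len(history) // period if period > 0 else 0
    if n_full == 0:
        raise ValueError("need a positive period and at least one full period of history")
    # Keep only whole periods, aligned to the end of the series, so that
    # position 0 of the profile is the first step after the history ends.
    recent = list(history)[-n_full * period:]
    profile = [
        sum(recent[k * period + p] for k in range(n_full)) / n_full
        for p in range(period)
    ]
    return [profile[h % period] for h in range(horizon)]


# Toy data (hypothetical): two periods of length 3.
history = [1, 2, 3, 3, 4, 5]
print(average_tile(history, horizon=5, period=3))  # [2.0, 3.0, 4.0, 2.0, 3.0]
```

No training, no GPUs, no hyperparameters beyond the period. That is the kind of opponent the transformer was up against.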

Like transformers? Stick to NLP. Transformers are excellent for the 😎 NLP domain, just not as awesome for #timeseries. Not at all. Please don’t use them. Multiple research papers say that transformers are fundamentally unsuitable for time series for very good reasons, and one never sees them winning forecasting and Kaggle competitions, also for good reasons.

It is not all about transformers. Using only deep learning for time series is generally not the brightest idea 💡. Even the most brilliant forecasting experts at Amazon Forecasting R&D fell into the trap of thinking that an average forecasting use case is amenable to deep learning. After the Kaggle Walmart M5 forecasting competition, they wrote a paper, “Forecasting with trees”, saying they finally realised deep learning is not the go-to tool for forecasting.

The reason transformers don’t work well for time series is straightforward: errors accumulate in transformer-based architectures, and there is nothing the big transformer can do about it. There are other good reasons too. NLP differs from time series, so just because something works in NLP does not mean it will work for time series. Time series are not the same as word sequences: in sentences, only context matters; in time series, the order of the values is what matters.
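The error accumulation point can be illustrated without any transformer at all. The sketch below rolls a one-step model forward by feeding each prediction back in as the next input, which is how any recursive forecaster produces a multi-step forecast. It is a toy illustration of the compounding effect, not a model of a specific architecture: the 2% true growth rate, the 3% estimated rate and the function names are hypothetical.

```python
from typing import Callable, List


def recursive_forecast(last_value: float, horizon: int,
                       step: Callable[[float], float]) -> List[float]:
    """Roll a one-step model forward, feeding each prediction back in as input."""
    path, current = [], last_value
    for _ in range(horizon):
        current = step(current)
        path.append(current)
    return path


def true_step(y: float) -> float:
    # Hypothetical ground truth: the series grows 2% per step.
    return 1.02 * y


def model_step(y: float) -> float:
    # Hypothetical fitted model: the growth rate is slightly overestimated at 3%.
    return 1.03 * y


truth = recursive_forecast(100.0, horizon=24, step=true_step)
predicted = recursive_forecast(100.0, horizon=24, step=model_step)
abs_errors = [abs(p - t) for p, t in zip(predicted, truth)]

for h in (1, 6, 12, 24):
    print(h, round(abs_errors[h - 1], 2))
```

The one-step error is about 1 unit, which looks harmless. By step 24 the same small mistake has compounded to more than 40 units, because every prediction inherits the errors of all the predictions before it.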

Some consolation for DeepAR and others is that they are now in the good company of deep learning models for time series that do not work so well.

ARIMA, boosted trees, …, a lesser-known secret-sauce approach [hey, it’s a free Medium article, did you expect freebie expert advice? 🔑] > deep learning.

Want to learn more about time series and forecasting? Follow me on Medium, Twitter and LinkedIn.

References:

  1. Statistical vs Deep Learning forecasting methods by Nixtla
  2. Python vs R for forecasting
  3. Forecasting with Trees by Amazon time series R&D lab
  4. Kaggle Walmart M5 forecasting competition

Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction