Deep Learning Is What You Do Not Need
May-2024 update: since the article was written, there has been significant progress in time series forecasting with many deep learning configurations delivering excellent results. In mid-2024, the article might no longer reflect SOTA in deep learning for time series forecasting but the article conclusions still hold for pre-2022 architectures such as DeepAR etc.
Unless you have tons of clean data and tens of top PhDs working on forecasting for over a decade, as Amazon and Alibaba do, and even then, would you just take claims from the same companies that sell GPU usage for granted?
November 2022 update: a large-scale independent study by Nixtla confirmed that deep learning methods failed to outperform a simple ensemble of econometric and statistical methods, while costing roughly 25,000 times more (about $0.50 for the econometrics/stats ensemble vs about $11,000 of GPU costs for deep learning).
When it comes to time series and forecasting, “Deep Learning Is What You Do Not Need.”
Do not let the deluge of papers about deep learning for time series (including ones produced by top tech companies and at top conferences) overwhelm you.
There is no evidence that deep learning methods, including transformers, outperform statistical and machine learning methods.
Most of the claims about the performance of deep learning come either from conflicted parties (including tech firms like Alibaba interested in selling more GPU hours) or academic labs that either by design or omission misrepresent the results of deep learning performance compared to other methods.
Such misrepresentation involves recycling the same toy datasets, dataset arbitrage (only showing results on datasets where deep learning works better), omitting non-deep learning benchmarks, not using correct benchmarks and many other tricks.
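One simple guard against missing or weak benchmarks is to report every model's error relative to a naive baseline. The sketch below is illustrative only: it computes the mean absolute scaled error (MASE) against a seasonal-naive forecast, using made-up data and a hypothetical `candidate_fc` standing in for whatever model is under evaluation.

```python
from statistics import mean


def seasonal_naive(history, horizon, season):
    """Repeat the last observed season to cover the forecast horizon."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mase(actual, forecast, history, season):
    """Forecast MAE divided by the in-sample MAE of the seasonal-naive method.

    Values below 1 mean the forecast beats that baseline on average.
    """
    mae = mean(abs(a - f) for a, f in zip(actual, forecast))
    scale = mean(
        abs(history[t] - history[t - season]) for t in range(season, len(history))
    )
    return mae / scale


# Synthetic series (an assumption for illustration): period-4 pattern plus a trend.
pattern = [10.0, 12.0, 15.0, 11.0]
series = [pattern[t % 4] + 0.5 * t for t in range(24)]
history, actual = series[:20], series[20:]

naive_fc = seasonal_naive(history, horizon=4, season=4)
# Hypothetical output of some other model, used only to show the comparison.
candidate_fc = [f + 1.5 for f in naive_fc]

print("seasonal naive MASE:", mase(actual, naive_fc, history, 4))
print("candidate MASE:     ", mase(actual, candidate_fc, history, 4))
```

A model whose MASE is not clearly below 1 on held-out data has not yet earned its extra complexity, whatever its architecture.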
And whoever tells you that your company needs deep learning for forecasting is often either unfamiliar with the subject of time series or has a vested interest in selling an ineffective piece of consulting advice or a forecasting technology that does not work.
The industry is littered with deep learning projects that ended in ineffective solutions at best and complete disasters at worst, with deep learning systems blowing up in production.
And unless coming from Amazon and Alibaba, one never hears about successfully implemented deep learning solutions, whilst many of the top companies like Walmart, Target, and others have either unsuccessfully tried and then abandoned deep learning or otherwise built effective forecasting solutions without deep learning.
As for your average Joe Bloggs Inc multibillion super-duper company listed on cool Nasdaq or not-so-cool NYSE stock exchange, deep learning does not work no matter what your expensive consultants or not-so-deep in forecasting knowledge PhD-in-irrelevant field Data Scientist tells you.
Fire your expensive consultants and PhD-in-irrelevant-field Data Scientist (hello, Zillow) and save yourself a lot of time, trouble and millions in wasted project costs and foregone profits.
DO NOT use deep learning for forecasting unless you have tons of clean data, an expert team and a lot of time to play with these toys. Do not touch deep learning until you have built an effective forecasting system that delivers business value using statistical, econometrics and machine learning tools.
“FreDo: Frequency Domain-based Long-Term Time Series Forecasting”, a research paper from MIT, pitted a super fancy transformer architecture against a simple, almost mechanical benchmark.
TL;DR: Transformer loses grotesquely
Let me repeat. A simple, almost mechanistic benchmark forecasting model from the Massachusetts Institute of Technology totally decimated a sophisticated transformer-based architecture. And not just an average transformer: the best and brightest of transformers for time series, the one that is better than another transformer, and so on.
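To give a feel for how mechanical such a benchmark can be, here is a minimal sketch of a periodic-average forecast: average the most recent full periods position by position, then tile that averaged period across the horizon. This is an illustration under assumptions, not a reproduction of the paper's baseline; the function name, the `period` and the `n_periods` parameter are all hypothetical choices.

```python
def periodic_average_forecast(history, horizon, period, n_periods=None):
    """Average the most recent full periods, then tile the result.

    history   : list of past observations, oldest first
    horizon   : number of future steps to forecast
    period    : assumed season length (e.g. 24 for hourly data with a daily cycle)
    n_periods : how many recent periods to average (default: all available)
    """
    available = len(history) // period
    if available == 0:
        raise ValueError("history must contain at least one full period")
    k = available if n_periods is None else min(n_periods, available)

    # Keep only the last k full periods so positions line up with the forecast.
    recent = history[-k * period:]
    profile = [
        sum(recent[i * period + p] for i in range(k)) / k for p in range(period)
    ]
    # Step h of the forecast reuses position h modulo the period.
    return [profile[h % period] for h in range(horizon)]


# Two observed periods of length 3, forecasting five steps ahead.
print(periodic_average_forecast([1, 2, 3, 3, 4, 5], horizon=5, period=3))
# [2.0, 3.0, 4.0, 2.0, 3.0]
```

There is nothing to train here and no GPU in sight, which is exactly why a baseline of this kind belongs in every comparison table.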
Yes, you have heard it right. Simple almost mechanistic benchmark decimates the meanest time series transformer. Like a bunch of kids in a Michael Bay movie winning against the meanest Transformer.
Like transformers? Stick to NLP. Transformers are excellent for the NLP domain, just not as awesome for #timeseries. Not at all. Please don't use them. Multiple research papers say that transformers are fundamentally unsuitable for time series for very good reasons, and one never sees them winning forecasting and Kaggle competitions.
It is not all about transformers. Using only deep learning for time series is generally not the brightest idea. Even the most brilliant forecasting experts at Amazon Forecasting R&D fell into the trap of thinking that an average forecasting use case is amenable to deep learning. After the Kaggle Walmart M5 forecasting competition, they wrote a paper, “Forecasting with trees”, saying they finally realised deep learning is not the go-to tool for forecasting.
The reason transformers don't work well for time series is straightforward: errors accumulate in transformer-based architectures, and there is nothing the big transformer can do about it. There are other good reasons, including that NLP differs from time series, so just because something works in NLP does not mean it would work in time series. Time series are not the same as word sequences; in sentences, only context matters; in time series, the order of the values is what matters.
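Error accumulation is easiest to see in a toy setting. The sketch below is a generic illustration of recursive multi-step forecasting, not a transformer and not a claim about any particular architecture: a one-step model with a slightly wrong coefficient is rolled forward by feeding its own predictions back in, and the gap to the true path widens with the horizon. Both coefficients are made-up values chosen for illustration.

```python
def recursive_forecast(last_value, coef, horizon):
    """Roll a one-step model y[t+1] = coef * y[t] forward,
    feeding each prediction back in as the next input."""
    path, y = [], last_value
    for _ in range(horizon):
        y = coef * y
        path.append(y)
    return path


TRUE_COEF = 1.02       # assumed data-generating coefficient (illustrative)
ESTIMATED_COEF = 1.03  # assumed slightly mis-estimated model (illustrative)

truth = recursive_forecast(100.0, TRUE_COEF, 24)
forecast = recursive_forecast(100.0, ESTIMATED_COEF, 24)
abs_errors = [abs(f - t) for f, t in zip(forecast, truth)]

# A small one-step error compounds as the horizon grows.
for h in (1, 6, 12, 24):
    print(f"horizon {h:2d}: absolute error {abs_errors[h - 1]:.2f}")
```

The one-step error is about 1 unit, yet every later step inherits and amplifies it, so long-horizon forecasts drift much further from the truth than the one-step accuracy would suggest.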
Some consolation for DeepAR and others is that they are now in the good company of deep learning models for time series that do not work so well.
ARIMA, boosted trees, …, secret sauce less known approach [hey, it's a free Medium article, did you expect freebie expert advice?] > deep learning.
References: