Deep Learning Is What You Do Not Need
Unless, that is, you have tons of clean data and tens of top PhDs who have been working on forecasting for over a decade, as Amazon and Alibaba do.
When it comes to time series and forecasting, "Deep Learning Is What You Do Not Need."

And whoever tells you that your company needs deep learning for forecasting is either unfamiliar with the subject or has a vested interest in selling you ineffective consulting advice or a forecasting technology that does not work.
And for your average Joe Bloggs Inc, a multibillion super-duper company listed on the cool Nasdaq or the not-so-cool NYSE, it does not work, no matter what your expensive consultants or your not-so-deep-in-forecasting PhD-in-an-irrelevant-field Data Scientist tells you.
Fire your expensive consultants and your PhD-in-an-irrelevant-field Data Scientist, and save yourself a lot of time, trouble, and millions in wasted costs and foregone profits. DO NOT use deep learning for forecasting unless you have tons of clean data.
"FreDo: Frequency Domain-based Long-Term Time Series Forecasting", a new research paper from MIT, pits a super-fancy transformer architecture against a simple, almost mechanical benchmark.
TL;DR: the transformer loses, grotesquely.
A simple, almost mechanistic benchmark forecasting model from the Massachusetts Institute of Technology totally decimates a sophisticated transformer-based architecture. And not just any average transformer: the best and brightest time series transformer of them all, the one that beats another transformer that in turn beats another transformer, and so on.
Yes, you have heard it right. A simple, almost mechanistic benchmark dominates the meanest time series transformer, like a bunch of kids in a Michael Bay movie beating the meanest Transformer.
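To give a feel for what "almost mechanistic" means here, the sketch below is my rough illustration of this kind of seasonal baseline, not the paper's exact method (the function name and the toy series are mine): average each phase of the seasonal cycle over the history, then tile that averaged cycle forward over the forecast horizon.

```python
import numpy as np

def seasonal_mean_forecast(y: np.ndarray, season: int, horizon: int) -> np.ndarray:
    """Average each phase of the seasonal cycle over history,
    then tile the averaged cycle forward to cover the horizon."""
    n = len(y) - len(y) % season                     # drop the ragged tail
    cycle = y[:n].reshape(-1, season).mean(axis=0)   # per-phase means
    reps = -(-horizon // season)                     # ceiling division
    return np.tile(cycle, reps)[:horizon]

# A noisy sine with period 12: the baseline recovers the underlying cycle.
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12) + 0.1 * np.random.default_rng(0).normal(size=120)
fcst = seasonal_mean_forecast(y, season=12, horizon=24)
```

No training, no gradients, no hyperparameters beyond the season length, which is exactly why it makes such an uncomfortable opponent for a heavy architecture.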

Like transformers? Stick to NLP. Transformers are excellent in the NLP domain, just not that awesome for #timeseries. Not at all. Please don't use them.
It is not all about transformers, either: using deep learning for time series is not the brightest idea 💡 in general. Even the most brilliant forecasting experts at Amazon Forecasting R&D fell into the trap of thinking that the average forecasting use case is amenable to deep learning after the Kaggle Walmart forecasting competition.
The reason transformers don't work well for time series is straightforward: errors accumulate in transformer-based architectures, and there is nothing the big transformer can do about it. Not yet.
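The compounding effect is not specific to transformers; it shows up in any recursive forecaster that feeds its own predictions back as inputs. A toy AR(1) Monte Carlo sketch (my illustration, not from the paper): even when the true coefficient is known exactly, the h-step-ahead error grows with the horizon, because every unobserved shock accumulates into the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma = 0.9, 1.0          # true AR(1): y[t] = phi * y[t-1] + noise
n_paths, horizon = 2000, 20

# Start every path from the stationary distribution of the process.
y = rng.normal(scale=sigma / np.sqrt(1 - phi**2), size=n_paths)
pred = y.copy()

rmse = []
for h in range(1, horizon + 1):
    y = phi * y + sigma * rng.normal(size=n_paths)  # the true process moves on
    pred = phi * pred                               # recursive forecast: no new data arrives
    rmse.append(np.sqrt(np.mean((y - pred) ** 2)))

# rmse grows with the horizon: each step's unobserved shock compounds,
# approaching sigma * sqrt((1 - phi**(2h)) / (1 - phi**2)).
```

The same mechanism bites a transformer decoding a long horizon autoregressively, only with many more parameters along for the ride.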
A bit of consolation for DeepAR and others: they are now in the good company of #deeplearning models for time series that do not work so well.
ARIMA, boosted trees, …, a less-known secret-sauce approach [hey, it's a free Medium article, did you expect freebie expert advice?] > deep learning.
Want fast ARIMA? Use Nixtla's super-fast ARIMA. Save your computing time and costs [free, unaffiliated plug], because the Nixtla team know what they are doing and are fantastic. You can't get one million models fitted in 20 minutes with any other ARIMA.
Want to learn more about time series and forecasting? Follow me on Medium, Twitter and LinkedIn.
References: