Transformers Are What You Do Not Need

In my previous article, “Deep Learning Is What You Do Not Need”, I explained that deep learning for time series is not a solution most companies even need to consider. In this article, we look more closely at why transformers, in particular, are not effective for time series forecasting.


That article was published in June 2022 and proved to be a rather prescient call. Since then, a growing number of scientific papers have shown that transformers are not the answer for time series forecasting.

In August 2022, a compelling research paper, “Are Transformers Effective for Time Series Forecasting?”, was published by researchers from the Chinese University of Hong Kong and the International Digital Economy Academy (IDEA). The paper became a significant contribution to the ongoing debate in time series forecasting, particularly on the use of Transformer-based models.

The study responded to a surge in proposed solutions that used Transformers for long-term time series forecasting (LTSF). This trend had gained substantial momentum, with more and more researchers turning to Transformer-based models in an attempt to improve forecasting accuracy.

However, the researchers from the Chinese University of Hong Kong and IDEA questioned the validity of this growing research direction. They raised concerns about the efficacy of Transformers for LTSF specifically and challenged the assumption that these models are inherently suited to the task. Their work served as both a critique and a call for reevaluation, urging the research community to scrutinise the evidence behind the widespread adoption of Transformer models in time series forecasting.

Let’s consider the limitations of Transformers in time series forecasting.

Temporal Information Loss

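Transformers have been a breakthrough in many areas of machine learning, notably natural language processing. When applied to time series forecasting, however, one significant issue is the loss of temporal information caused by the permutation-invariant (strictly speaking, permutation-equivariant) self-attention mechanism at the heart of the Transformer. Without added positional information, self-attention treats its inputs as an unordered set, yet in a time series the order of the observations is precisely what a forecast depends on.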
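A minimal sketch makes this concrete. It assumes a single attention head with random, untrained projection matrices (`wq`, `wk`, `wv`, all hypothetical) and no positional encoding, and uses plain Python so it runs without any deep learning library. Reversing the order of the time steps merely reverses the order of the outputs: nothing in the computation depends on which observation came first. Averaging the outputs over time then produces a summary that is identical for both orderings.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with no positional encoding."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


random.seed(0)
d_model = 4
# Hypothetical, untrained projection matrices (illustration only).
wq, wk, wv = (rand_matrix(d_model, d_model) for _ in range(3))

# A toy series: 6 time steps, each already embedded as a d_model-dimensional vector.
series = rand_matrix(6, d_model)

# Reverse the time order; any other permutation behaves the same way.
perm = list(range(len(series)))[::-1]
reordered = [series[i] for i in perm]

out = self_attention(series, wq, wk, wv)
out_reordered = self_attention(reordered, wq, wk, wv)

# Equivariance: row i of the new output equals row perm[i] of the original output.
equivariant = all(
    close(a, b)
    for i, j in enumerate(perm)
    for a, b in zip(out_reordered[i], out[j])
)

# Averaging over time removes even that reordering, so the summary is invariant.
pooled = [sum(col) / len(out) for col in zip(*out)]
pooled_reordered = [sum(col) / len(out_reordered) for col in zip(*out_reordered)]
invariant = all(close(a, b) for a, b in zip(pooled, pooled_reordered))

print("permutation-equivariant:", equivariant)  # prints: permutation-equivariant: True
print("pooled output unchanged:", invariant)  # prints: pooled output unchanged: True
```

Positional encodings are the usual remedy, but they add order information to the inputs rather than making the attention computation itself order-aware. This is why the CUHK/IDEA paper argues that some temporal information is inevitably lost even when such encodings are used.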

Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction