# Time Series Forecasting Using Cyclic Boosting

*Generate accurate forecasts to understand how each prediction has been made.*

Machine learning for time series forecasting has achieved significant successes in the last few years; machine learning methods dominated the leaderboard of the M5 Walmart forecasting competition on Kaggle.

And when I say machine learning, I mean precisely that: machine learning, not deep learning. As someone who has not only witnessed deep learning forecasting systems built by less savvy data science teams blow up in production but has also had to fix them, I will never tire of saying that **Deep Learning Is What You Do Not Need**.

If you want to learn why deep learning is not the answer to time series forecasting, please read my Medium article **Deep Learning Is What You Do Not Need**. You can thank me later for steering you off the wrong path and saving your company a lot of effort and money.

Many powerful machine learning methods can deliver superior forecasting performance (follow me on LinkedIn and Twitter as I often highlight new SOTA developments in time series forecasting), with both ensembles and boosted trees often surpassing other methods. However, using complex machine learning and ensemble methods results in black box models, where it is difficult to understand the path leading to individual predictions.

To address this issue, Blue Yonder research has published a novel "Cyclic Boosting" machine learning algorithm. Cyclic Boosting is a generic supervised machine learning model that performs accurate regression and classification efficiently. At the same time, Cyclic Boosting (CB) allows for a detailed understanding of how each prediction was made.

Such understanding is invaluable in situations where stakeholders want to know how individual predictions were made and which factors contributed to them. It can also be a regulatory requirement in industries such as health, finance, and insurance, and is desirable in others, such as manufacturing and retail.

In addition, many machine learning algorithms struggle to learn rare events, as such events are not representative of most of the data. From a business perspective, however, accurately forecasting such events is very valuable. Imagine a retailer selling goods that experience demand spikes during promotions and events such as Amazon Prime Day; underestimating the volume of goods sold during such events can result in missed sales, customers turning elsewhere, and damaged brand loyalty.

Cyclic Boosting is a supervised machine learning algorithm. Its main idea is that each feature X_j contributes in a specific way to the prediction of the target Y. All such contributions can be computed at a granular level, and each prediction Y_hat for a given observation can be transparently interpreted by analysing how much each feature X_j contributed to it. To achieve the required granularity of forecasts, Cyclic Boosting performs binning of continuous features.
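As a small illustration of the binning step, a continuous temperature feature could be discretised like this (the bin edges here are purely illustrative, not values from the algorithm):

```python
import numpy as np

temps = np.array([12.0, 18.5, 24.0, 31.5, 27.0, 15.0])  # temperature readings

# Illustrative edges: < 20 → Low (0), 20–27.5 → Medium (1), >= 27.5 → High (2)
bins = np.digitize(temps, bins=[20.0, 27.5])

# `bins` is now one integer bin index per observation,
# which is the granularity at which the factors f_j,k are learned
```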

During training, Cyclic Boosting considers each feature in turn and adjusts the prediction accordingly via factors f_j,k, where j is the index of the feature and k is the index of the bin. This process continues until a stopping criterion is met, e.g. a maximum number of iterations or no further improvement in an error metric such as the mean absolute deviation (MAD) or mean squared error (MSE).

The training proceeds as follows:

- Calculate the global average μ from all observed y across all bins k and features j.
- Initialise the factors f_j,k ← 1.
- Cyclically iterate through the features j = 1, …, p and, for each bin k in turn, calculate the partial factors g and the corresponding aggregated factors f_j,k, where the indices t (current iteration) and τ (current or preceding iteration) refer to full feature cycles as the training of the algorithm progresses.
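The steps above can be sketched in a few lines. This is a minimal, hypothetical implementation of the multiplicative update idea only; the real Blue Yonder library additionally handles smoothing of the factors, link functions, and proper stopping criteria:

```python
import numpy as np

def cyclic_boosting_fit(X_binned, y, n_iterations=10):
    """Minimal sketch of a multiplicative Cyclic Boosting fit.

    X_binned: integer array of shape (n_samples, n_features), each column
    already binned to 0..K_j-1. y: non-negative target values.
    """
    n_samples, n_features = X_binned.shape
    mu = y.mean()  # global average of the target
    factors = [np.ones(X_binned[:, j].max() + 1) for j in range(n_features)]

    for _ in range(n_iterations):          # full feature cycles
        for j in range(n_features):        # cycle through the features
            # prediction from all factors except feature j's own
            pred_without_j = np.full(n_samples, mu)
            for i in range(n_features):
                if i != j:
                    pred_without_j *= factors[i][X_binned[:, i]]
            # partial factor per bin: observed vs. predicted totals in the bin
            for k in range(len(factors[j])):
                mask = X_binned[:, j] == k
                if mask.any():
                    factors[j][k] = y[mask].sum() / pred_without_j[mask].sum()
    return mu, factors
```

A prediction is then the global mean multiplied by each feature's factor for the bin the observation falls into.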

Confused by all the math formulas? Let's illustrate what exactly is happening here with a simple example. Imagine it is a hot day 🔥🔥🔥🔥🔥 and you would like to predict sales of ice cream given the temperature.

Suppose we have a dataset with one feature, `Temperature`, and a target variable, `Ice Cream Sales`. The `Temperature` feature has three possible values (Low, Medium, and High).

The first step in the Cyclic Boosting algorithm is to calculate the global mean of the target variable `Ice Cream Sales`. This is done by taking the average of all the observed values of `Ice Cream Sales` in the training dataset.

Next, the algorithm estimates the weights for each bin of the `Temperature` feature. This is done by dividing the data into bins based on the values of the feature and calculating the average target value for each bin. For example, suppose we have the following data:

The global mean of ice cream sales would be calculated as (10 + 12 + 14 + 11 + 13 + 15) / 6 = 12.5.

Next, the algorithm would estimate the weights for each bin of the `Temperature` feature. For example, it might estimate a weight of 0.8 for the bin Low, a weight of 1.0 for the bin Medium, and a weight of 1.2 for the bin High. These weights are calculated based on how much the average target value in each bin differs from the global mean, which is essentially what the long math formula means.

To predict a new data point with `Temperature` = High, the algorithm would multiply the global mean of `Ice Cream Sales` by the weight estimated for the bin High of the feature `Temperature` (remember, it is a hot 🔥🔥🔥🔥🔥 day), i.e. 1.2. The resulting prediction would be 12.5 * 1.2 = 15.

This is just a simple example to illustrate how Cyclic Boosting could be used to predict ice cream sales based on temperature alone. In practice, more features could be added to improve the accuracy of the predictions.
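The arithmetic of this toy example can be verified in a few lines (the bin weight of 1.2 is the illustrative value from above, not a fitted one):

```python
# Observed ice cream sales from the example above
sales = [10, 12, 14, 11, 13, 15]

global_mean = sum(sales) / len(sales)   # (10+12+14+11+13+15) / 6 = 12.5

weight_high = 1.2                       # illustrative weight for bin "High"
prediction = global_mean * weight_high  # 12.5 * 1.2 = 15.0
```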

After doing the usual preprocessing and creating features to turn the time series problem into a supervised machine learning problem (remember CB is a generic supervised machine learning model, not a time series model as such), we can fit CB with the usual scikit-learn-based functionality.
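A sketch of that feature-engineering step, using lag and calendar features to reframe the series as a supervised learning table (column names, windows, and the toy data are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series
idx = pd.date_range("2023-01-01", periods=28, freq="D")
df = pd.DataFrame({"sales": np.arange(28.0)}, index=idx)

# Lag features turn the series into a supervised learning problem
df["lag_7"] = df["sales"].shift(7)    # sales one week earlier
df["lag_14"] = df["sales"].shift(14)  # sales two weeks earlier

# Calendar features capture seasonality
df["dayofweek"] = df.index.dayofweek  # 0 = Monday ... 6 = Sunday
df["month"] = df.index.month

df = df.dropna()                      # drop rows without full lag history
```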

```python
import cbm

# model training
model = cbm.CBM()
model.fit(x_train_df, y_train)
```

You can then make predictions and evaluate the error:

```python
from sklearn.metrics import mean_squared_error

# error on the training set
y_pred_train = model.predict(x_train_df).flatten()
print('RMSE', mean_squared_error(y_train, y_pred_train, squared=False))
```

Finally, and this is where the key benefit of Cyclic Boosting comes in, you can produce plots showing how each factor and its levels contributed to the predictions.

Additional materials: