# Earth Mover distance for time series


Working with forecasting and time series often requires evaluating time series similarities and differences. Whether one needs to cluster time series or compare forecasts with actual realisations, having robust similarity measures is useful when comparing time series data with different shapes and patterns.

Whilst many metrics such as MAPE, MAE and RMSE exist for evaluating forecasting performance, such metrics *have significant limitations as they only compare forecast values with actual values for the same points in time.*

For many applications, such as demand planning, financial and energy forecasting, and trading, this is insufficient: an accurate evaluation of forecasting performance also requires assessing how accurate the forecast is in terms of **the timing of the predicted values**.
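To make this limitation concrete, here is a minimal sketch in plain Python. The series values and the names `forecast_near` and `forecast_far` are hypothetical and chosen purely for illustration; they are not taken from the example discussed later in this article. Both forecasts predict the right order size but at the wrong time, one a single period late and the other seven periods late.

```python
def mae(actual, forecast):
    """Mean absolute error: compares values at the same time index only."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: also a purely point-by-point comparison."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return (sum(squared_errors) / len(actual)) ** 0.5


# Hypothetical intermittent demand: a single order of 5 units at t=2.
actual = [0, 0, 5, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical forecast with the right quantity, one period late (t=3).
forecast_near = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0]

# Hypothetical forecast with the right quantity, seven periods late (t=9).
forecast_far = [0, 0, 0, 0, 0, 0, 0, 0, 0, 5]

mae_near, mae_far = mae(actual, forecast_near), mae(actual, forecast_far)
rmse_near, rmse_far = rmse(actual, forecast_near), rmse(actual, forecast_far)

print(f"MAE  near: {mae_near:.3f}  far: {mae_far:.3f}")
print(f"RMSE near: {rmse_near:.3f}  far: {rmse_far:.3f}")
```

Under these assumptions both forecasts receive identical MAE and RMSE scores, because each metric only sees two mismatched time steps of five units each. Neither metric can tell that one forecast is almost on time while the other is far too late.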

Evaluating temporal mismatch is essential, especially in applications such as energy price prediction, where energy providers must submit bids with energy production forecasts for specific points in time. Predicting the correct values of future energy prices *is only useful if these prices are predicted for the correct points in time.*

In demand forecasting, even if the values are predicted correctly in terms of magnitudes, incorrect allocation across time leads to lost revenue and high costs as customer orders are not served on time, and excessive inventory storage and financing costs are unnecessarily incurred.

In the example below, we have two time series, and we are interested in understanding how close they are to each other. We will use an intermittent demand time series as an example, purely to simplify the explanation of the concepts. The time series could equally be of any other type, for example energy consumption, energy prices or prices of financial assets.

In this example, the two time series match on most of the timeline, but there is a mismatch at two points in time, t=8 and t=9. The first time series could be the actual demand, and the second could be a forecast of that demand.

How can we measure the performance of such a forecast to reflect how well it forecasts not only the values of the demand but also the correct timings of the demand values?

As we can see from the picture below, the consequences of the mismatch are not only in the forecast (time series 2) under-forecasting the value of demand at time 9 (four units vs units required to satisfy demand)…