Earth Mover distance for time series

Valeriy Manokhin, PhD, MBA, CQF
5 min readFeb 26, 2023

Working with forecasting and time series often requires evaluating time series similarities and differences. Whether one needs to cluster time series or compare forecasts with actual realisations, having robust similarity measures is useful when comparing time series data with different shapes and patterns.

Whilst many metrics such as MAPE, MAE and RMSE exist for evaluating forecasting performance, such metrics have significant limitations as they only compare forecast values with actual values for the same points in time.

For many applications such as demand planning, financial and energy forecasting and trading, this is insufficient as accurate evaluation of forecasting performance also requires evaluating how accurate the forecast is in terms of the timing of the predicted values.

Evaluating temporal mismatch is essential, especially in such applications as energy price prediction; energy providers must submit bids with energy production forecasts for specific points in time. Predicting correct values of future energy prices is only valid if these prices are predicted correctly for specific points in time.

In demand forecasting, even if the values are predicted correctly in terms of magnitudes, incorrect allocation across time leads to lost revenue and high costs as customer orders are not served on time, and excessive inventory storage and financing costs are unnecessarily incurred.

In the example below, we have two time series, and we are interested in understanding how close these two-time series are to each other. We will use intermittent demand time series as an example only to simplify concept explanations. Still, time series could be any other type, for example, energy consumption, energy prices or prices of some financial assets.

In the example below, the two time series match on most of the timeline, but there is a mismatch at two points in times t=8 and t=9. The first time series could be an actual demand, and time series two could be forecast for this time series.

How can we measure the performance of such a forecast to reflect how well it forecasts not only the values of the demand but also the correct timings of the demand values?

As we can see from the picture below, the consequences of mismatch are not only in the forecast (time series 2) under forecasting the value of demand at time 9 (four units vs units required to satisfy demand) but also in forecasting the four units one unit of time too early.

The consequences of these two mismatch components are very different — whilst a shortfall of four units will manifest itself in unsatisfied demand (and some unhappy customers), the consequence of predicting demand too early will be direct financial costs, including warehousing and financial costs required to fund inventory that was purchased too early.

Two series

How do we measure this mismatch more intelligently to reflect both types of mismatch?

There is a metric for this: Earth Mover’s Distance (EMD) (also known as Wasserstein distance). Earth Mover’s Distance is a way to compare two sets of data based on their underlying structures rather than simply their raw values.

Imagine each time series represents a pile of soil and that the EMD represents the minimum amount of work required to move the soil from one pile to the other. The “work” is defined as the amount of soil moved multiplied by the distance moved. The EMD is calculated by finding the optimal alignment between the two distributions, where the amount of soil moved is minimised. This alignment is found by solving a linear programming problem. Once the optimal alignment is found, the EMD is the sum of the product of the amount of soil moved and the distance it is moved for all pairs of points on the timeline.

In our example, mismatch happens from time = 8 onwards; as the first step, we calculate the distance matrix D that represents the "cost" of moving a unit of mass from one time series to another. In our example, we can use the absolute difference between the values as the measure of cost; we consider two time series from the moment they start to mismatch:

ts1 = np.array([0,8,0,0,6,0]) 
ts2 = np.array([4,0,0,0,6,0])

We first look for non-negative values and move values from one series to another to match two series as closely as possible.

In the first step, the first positive value for ts1 is 8, whilst the first positive value for ts2 is 4. We can move four units from ts1 at position 0 to position one at ts1. The flow value is the minimum of the two values, and the distance is 1. The four matched units are removed from both time series. After the first step, the values of the two time series become. In the second step, positive values are 4 (in position 1) and 6 (at position 4). We, therefore, move four units x 3 positions.

ts1: [0 4 0 0 6 0], ts2: [0 0 0 0 6 0]

(step)  from   to     flow     dist   total work
1. [0] [1] 4 1 4
2. [1] [4] 4 3 12
3. [4] [4] 2 0 0
----- ------
10 16

Wasserstein distance = total work / total flow
= 16/ 10
= 1.6

If you are interested in details, you can use the output below and work it independently to compare your results with the table above.

EMD program results step by step

EMD has applications in various fields, including computer vision, image and signal processing, machine learning, and bioinformatics. It is beneficial when comparing distributions that are not easily characterised by summary statistics such as means and variances but instead have complex structures that may be difficult to compare using other methods.

Overall, EMD is a powerful tool for analysing time series data. It can be used in various applications, such as signal processing, financial analysis, and environmental monitoring.

--

--

Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction 👍Tip: hold down the Clap icon for up x50