Benchmarking Facebook Prophet

Nov 28, 2021

Over the last year, I have written many posts explaining that Facebook #prophet is a non-performing forecasting algorithm that not only fails to work across any reasonable set of #timeseries datasets, but also underperforms most other forecasting algorithms. More importantly, as explained in several posts, developers cannot rectify these issues, because Facebook Prophet contains pathological flaws inherent in its design.

Whilst academic research papers have highlighted performance issues with Prophet since 2017, the package's popularity in the data science community has been fueled both by excessive claims from the original development team and, more importantly, by marketing of the non-performing package via articles on Medium and social media.

Pip install prophet — the solution to any forecasting accuracy issue

Therefore, it was interesting to test the inflated claims made by some of these articles. In the Medium article https://towardsdatascience.com/predicting-the-future-with-facebook-s-prophet-bdfe11af10ff, the author went so far as to claim that #facebookprophet can ‘predict the future’ and used the package to predict Medium stats.

With several quality #forecasting packages released over the past year, it is relatively easy for anyone to test such claims by running simple algorithms from #PyCaret’s time series module on the same simple dataset of the Medium writer’s stats.

In the first test above, false prophet failed to outperform many simple algorithms on ALL point benchmarks such as MAE, RMSE, MAPE and SMAPE. Elementary methods such as linear regression, Lasso and KNN easily beat prophet.
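For readers who want to check such comparisons themselves, here is a minimal sketch of the four point metrics named above, written with plain NumPy. The numbers are made up purely for illustration and are not the Medium stats; SMAPE in particular has several definitions in the literature, and the variant below (mean of the two absolute values in the denominator) is an assumption rather than necessarily the one behind the results discussed here.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (undefined if y_true contains zeros)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def smape(y_true, y_pred):
    """Symmetric MAPE, as a fraction; one common variant (an assumption, definitions differ)."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_true - y_pred) / denom))


# Illustrative, made-up values: NOT the data from the article.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "SMAPE": smape(y_true, y_pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Computing the same functions on the same hold-out window for every candidate model is what makes a table of scores comparable across algorithms.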

More importantly, in terms of probabilistic forecasting, Prophet is even worse, as it provides incorrect and grossly misleading ‘prediction intervals’. A well-calibrated probabilistic predictor asked for a 90%–95% prediction interval will output intervals that cover roughly 90%–95% of the actual points. In this plot, produced on a relatively simple and reasonably stable dataset, Facebook Prophet left 30%–40% of the points NOT covered by what are supposed to be ‘good prediction intervals’. One can draw only one conclusion: Facebook Prophet is a grossly inaccurate point predictor. It also produces wildly inaccurate PIs (prediction intervals), leaving users at risk of making wrong decisions, especially in critical, high-stakes applications.
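The coverage check described here is easy to reproduce. Below is a minimal sketch, assuming you already have the actual values and the lower and upper interval bounds as arrays (in a Prophet forecast frame these would be the `yhat_lower` and `yhat_upper` columns). The helper name `empirical_coverage` and the numbers are hypothetical and made up for illustration; they are not the data behind the plot.

```python
import numpy as np


def empirical_coverage(y_true, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())


# Illustrative, made-up values: NOT the data from the article.
y_true = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
lower = np.array([8.0, 9.0, 10.0, 10.0, 9.0])
upper = np.array([12.0, 13.0, 13.0, 14.0, 13.0])

coverage = empirical_coverage(y_true, lower, upper)
print(f"covered: {coverage:.0%}, missed: {1 - coverage:.0%}")
```

The resulting number should be compared against the nominal level the intervals were built for: intervals requested at 90% that only cover 60% of the hold-out points are badly miscalibrated.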

One can argue that the Medium views data in March-19 were somewhat unstable, so prophet was given another test drive on Feb-19 data that is more stable.

As expected, it failed to outperform other relatively simple forecasting algorithms. And again, it produced terrible probabilistic predictions.

One has to question the utility of an algorithm that produces wildly inaccurate point and grossly inaccurate probabilistic predictions, making it extremely risky to use in any decision-making.


Written by Valeriy Manokhin, PhD, MBA, CQF
