Benchmarking Facebook Prophet
Over the last year, I have written many posts explaining that facebook #prophet is a non-performing forecasting algorithm that not only does not work across any reasonable set of #timeseries datasets, but also underperforms most other forecasting algorithms. More importantly, as explained in several posts, developers cannot rectify such issues because Facebook prophet contains pathological flaws inherent in prophet's design itself.
Whilst academic research papers have highlighted performance issues with prophet since 2017, the package's popularity in the data science community has been fueled both by excessive claims from the original development team and, more importantly, by marketing of the non-performing package via articles on Medium and social media.
Therefore, it was interesting to test the inflated claims made by some of these articles. In the Medium article https://towardsdatascience.com/predicting-the-future-with-facebook-s-prophet-bdfe11af10ff, the author went so far as to claim that #facebookprophet can "predict the future" and used the package to predict Medium stats.
With several quality #forecasting packages released over the past year, it is relatively easy for anyone to test such claims by running simple algorithms in #PyCaret's time series module over the same simple dataset of the Medium writer's stats.
In the first test above, false prophet failed to outperform many simple algorithms on ALL point benchmarks such as MAE, RMSE, MAPE and SMAPE. Elementary methods such as linear regression, Lasso and KNN easily beat prophet.
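For readers who want to check such comparisons themselves, here is a minimal sketch of the four point metrics named above, in plain Python. The numbers are made up for illustration and are not the Medium stats from the test; SMAPE in particular has several variants, and the one shown (absolute error divided by the mean of the absolute actual and forecast values) is only one common choice, so a given package may report slightly different values.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (undefined if any actual is 0)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, as a fraction; assumes |a| + |f| is never 0."""
    return sum(2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical values, for illustration only.
observed_views = [100.0, 200.0, 400.0]
forecast_views = [110.0, 180.0, 400.0]

print("MAE  ", mae(observed_views, forecast_views))    # 10.0
print("RMSE ", rmse(observed_views, forecast_views))
print("MAPE ", mape(observed_views, forecast_views))
print("SMAPE", smape(observed_views, forecast_views))
```

Lower is better on all four, so a model "beats" another on point accuracy when it scores lower on the same hold-out period.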
More importantly, in terms of probabilistic forecasting, prophet is even worse as it provides incorrect and grossly misleading "prediction intervals". A well-calibrated probabilistic predictor run at a 90% to 95% confidence level will output prediction intervals covering roughly 90% to 95% of the points. In this plot produced on a relatively simple and reasonably stable dataset, facebook prophet had 30% to 40% of points NOT covered by what is supposed to be "good prediction intervals." One can make only one conclusion: facebook prophet is a grossly inaccurate point predictor. It also produces wildly inaccurate PIs (prediction intervals), leaving users at risk of making wrong decisions, especially in critical, high-stakes applications.
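The coverage check described above is easy to reproduce: count how many observed points fall inside their prediction interval and compare that share with the nominal level. The sketch below uses made-up numbers (not the Medium stats) and a hypothetical helper name, `empirical_coverage`; it assumes you already have lower and upper interval bounds for each test point from whatever forecaster you are evaluating.

```python
def empirical_coverage(actual, lower, upper):
    """Share of observed points that fall inside [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)

# Hypothetical observations and interval bounds, for illustration only.
observed = [10, 12, 15, 11, 20, 9, 14, 13, 30, 12]
pi_lower = [8, 10, 11, 9, 12, 8, 11, 10, 12, 10]
pi_upper = [13, 14, 16, 13, 17, 12, 16, 15, 18, 14]

nominal_level = 0.90  # the coverage the intervals claim to deliver
coverage = empirical_coverage(observed, pi_lower, pi_upper)

print("empirical coverage:", coverage)  # 0.8 (2 of 10 points fall outside)
print("under-covers the nominal level:", coverage < nominal_level)
```

On a short test window the empirical share is noisy, so a small gap from the nominal level is not alarming by itself; 30% to 40% of points falling outside intervals that claim 90% to 95% coverage is a different matter.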
One can argue that the Medium views data in March-19 were somewhat unstable, so prophet was given another test drive on Feb-19 data that is more stable.
As expected, it again failed to outperform other relatively simple forecasting algorithms. And again, it produced terrible probabilistic predictions.
One has to question the utility of an algorithm that produces wildly inaccurate point and grossly inaccurate probabilistic predictions, making it extremely risky to use in any decision-making.