25 June 2020

A not-so-serious SARIMA forecast for running in 2020

Just for fun: what happens when you fit a SARIMA model (using statsmodels) to past Strava running activities in order to predict the remaining running distance in 2020? Not much, beyond feeding my interest in time series forecasting.

I once took an online course, Practical Time Series Analysis, which taught how to forecast a time series when only its past values are at hand. One of the models presented in the course was SARIMA (short for the rather bulky name “seasonal autoregressive integrated moving average model”).

I had played with SARIMA models before, when forecasting ozone levels in London. This time, I applied a SARIMA model to forecast future running distance and, independently, future elevation gain in 2020.

In Python, the statsmodels library provides the SARIMAX class for fitting SARIMA models, written ARIMA(p,d,q)(P,D,Q)s. For lack of time for a deeper analysis (or for validation against held-out data), I simply chose ARIMA(1,1,0)(1,1,0)s, which has one autoregressive term and one order of differencing in both the non-seasonal and the seasonal part of the model, and no moving-average terms.

In conclusion, the statsmodels library made a good first impression. Yet it is still up to me how much I want to run in 2020.

File: strava_activity_analysis_20200625.ipynb [32.39 kB]