Prerequisites

Experience with the specific topic: Novice

Professional experience: Some industry experience

Some familiarity with the basic concepts of time series forecasting will allow the reader to better follow this tutorial, although advanced knowledge is not required. For a good introduction to the basic concepts of forecasting, see this tutorial and this tutorial.

To follow the example, the reader should also be familiar with basic R syntax. R packages needed: forecast, prophet, bsts, zoo, ggplot2, and repr.

Introduction

In this overview, we introduce decomposition-based approaches to time series forecasting. Decomposition-based methods are a simple but robust approach to modeling and forecasting time series data. The basic idea behind these methods is to explicitly model the data as a combination of trend, seasonal, and remainder components, rather than trying to capture the temporal dependencies and autocorrelations in the data the way ARIMA or GARCH models do.

Decomposing a time series involves splitting it into 3 or 4 components, in the form of:

$$\hat{Y}(t) = T(t) + S(t) + R(t) + \varepsilon(t)$$

(Note that this is an additive decomposition—we will deal with the multiplicative case later).

With:
Ŷ(t) : The modeled/forecast value at time t
T(t) : The trend component at time t
S(t) : The seasonal component at time t
R(t) : The remainder at time t
ε(t) : The error term

Time series decomposition is usually presented as an analysis step to be performed before generating predictions, but it can also be used as a forecasting method in and of itself if you know beforehand what the structure of your time series will look like. Such scenarios frequently occur in business contexts like retail demand forecasting, where it is safe to assume for some products that the sales data will have a yearly seasonal pattern and a year-over-year trend.

To forecast a time series using a decomposition model, you calculate the future values for each separate component and then add them back together to obtain a prediction. The challenge then simply becomes finding the best model for each of the components.
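To make this concrete, here is a minimal hand-rolled sketch of the recipe (an illustration only, not one of the three methods presented below), using the AirPassengers series introduced later in this tutorial: a linear trend fit for T(t), a fixed monthly pattern estimated from the detrended data for S(t), and a remainder R(t) assumed to average out to zero.

#A minimal hand-rolled decomposition forecast (illustration only)
y <- AirPassengers                          #monthly series, frequency = 12
time_idx <- seq_along(y)
trend_fit <- lm(as.numeric(y) ~ time_idx)   #T(t): a simple linear trend
#S(t): average detrended value for each calendar month
seasonal <- tapply(residuals(trend_fit), as.numeric(cycle(y)), mean)
h <- 12                                     #forecast horizon
future_idx <- length(y) + 1:h
last_month <- as.numeric(cycle(y))[length(y)]
future_month <- (last_month + 1:h - 1) %% 12 + 1
#Sum the component forecasts; R(t) is taken to be zero on average
fc <- predict(trend_fit, newdata = data.frame(time_idx = future_idx)) + seasonal[future_month]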

In the following overview, we will present three approaches to forecasting using decomposition with R: Seasonal and Trend decomposition using LOESS, Bayesian structural time series, and Facebook Prophet.

First, we will decompose the time series and forecast it using each of the three methods. Then, we will work on improving the accuracy of the forecasts.

Load The Required Libraries And Settings

# Call the required libraries 
library(ggplot2)
library(forecast)
library(bsts)
library(prophet)
library(zoo)   #provides as.yearmon() and scale_x_yearmon(), used in the plots below
library(repr)

#Choose a theme for clear and consistent data plots 
theme_set(theme_bw())
 

Load The Data

We will use the air passengers data set, a classic data set for benchmarking time series models first introduced by Box and Jenkins in 1976 (it is to time series forecasting what the Iris data set is to classification and regression algorithms). In particular, the air passenger time series has a very clear trend and seasonal pattern, which makes it perfect for testing decomposition methods. This data set is available as part of the R base package, but it can also be downloaded here (if downloading, you'll have to use something like read.csv() to load the data and format it as a time series object).
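If you do download the data, a sketch of the load step might look like this (the file name and the column name Passengers are assumptions about the downloaded file):

#Hypothetical CSV load - the file and column names depend on the download
df <- read.csv("AirPassengers.csv")
AirPassengers <- ts(df$Passengers, start = c(1949, 1), frequency = 12)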

#load the airpassengers data 
data("AirPassengers") 
plot(AirPassengers)

[Figure: the AirPassengers monthly time series, showing a clear upward trend and yearly seasonal pattern]

#Split the data into test and train 
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1), end = c(1960,12))

Decomposition and Forecasting

Seasonal And Trend Decomposition Using LOESS (STL)

STL was proposed by Cleveland et al. in 1990. It uses the LOESS (LOcal regrESSion) method to model the trend and seasonal components via local polynomial regression. The algorithm works by running two loops:

  • An outer loop where robustness coefficients are calculated to minimize the effect of outliers.
  • An inner loop where the trend component and seasonal component are iteratively updated using LOESS smoothing.

The algorithm can be set to run for a fixed number of iterations for each loop, or to run until a specific convergence criterion is met (in the R versions of STL presented here, we will use the default numbers of iterations). In addition, we can specify whether we want the seasonal component to remain constant over time or to change (in R, this is controlled by the s.window parameter).
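For example (the window width of 13 is just an illustration; s.window must be odd):

#Seasonal pattern held fixed across years
stl(AirPassengers, s.window = "periodic")
#Seasonal pattern allowed to evolve, smoothed over a 13-observation loess window
stl(AirPassengers, s.window = 13)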

To forecast with STL, we first use STL to decompose the time series into three components:

$$Y(t) = T(t) + S(t) + R(t)$$

We then apply a standard forecasting algorithm, such as ARIMA or exponential smoothing, to the remainder R(t) and generate an h-step-ahead forecast of the remainder component R(t + h).

Lastly, we calculate the h-step-ahead trend component T(t + h) and seasonal component S(t + h) and sum all three components to obtain the final forecast.
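As a sketch of these three steps done by hand (roughly what the stlf() function introduced below does for us, with the trend and remainder forecast together as the seasonally adjusted series; seasadj() and seasonal() are helpers from the forecast package):

fit <- stl(train, s.window = "periodic", robust = TRUE)
#Steps 1-2: forecast the seasonally adjusted series (trend + remainder) with ARIMA
sa_fc <- forecast(auto.arima(seasadj(fit)), h = 24)
#Step 3: extend the seasonal component (a seasonal naive forecast repeats the last cycle)
s_fc <- snaive(seasonal(fit), h = 24)
#Sum the pieces to obtain the final forecast
manual_fc <- sa_fc$mean + s_fc$mean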

If we want to use STL for analysis only, then the stl() function that comes with the base R installation is sufficient. To use STL for forecasting, however, it is easier to use stlf() from the forecast package, which uses the original stl() function for decomposition but also lets us specify which algorithm to use for forecasting the remainder.

To perform an STL decomposition, you can run:

#Decompose the time series 
AirPassengersSTL <- stl(AirPassengers, s.window="periodic", robust=TRUE)
plot(AirPassengersSTL)

[Figure: STL decomposition of AirPassengers into data, seasonal, trend, and remainder panels]

Note that you don't need to pass the number of seasons to the stl() function; it picks it up from the frequency defined in the original time series object AirPassengers.
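You can check this directly:

frequency(AirPassengers)   #12: monthly observations, so one seasonal cycle per year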

To generate forecasts, you can use stlf(), which allows you to specify the algorithm used for forecasting the remainder. In this case, we will use ARIMA, but we don't need to specify the ARIMA orders (p, d, q); the forecast method will find the optimal orders on its own.

#Train the STL model, using ARIMA to forecast the remainder 
AirPassengersSTL_ARIMA <- stlf(train, method="arima")
#Plot the results
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + autolayer(test, series="Test Data") +
  autolayer(ts(AirPassengersSTL_ARIMA$mean,frequency=12, start=c(1959,1)), series="STL + ARIMA Forecasts")

[Figure: training data with the test data and the STL + ARIMA forecasts overlaid]

Bayesian Structural Time Series

Proposed by Scott and Varian in 2013, Bayesian structural time series (BSTS) is a powerful family of methods that covers a large class of time series models using the state space representation of time series and Bayesian statistics.

In the state space approach, a time series is written as a set of two equations: a state equation describing the overall evolution of the system in terms of unobserved states, and an observation (or measurement) equation describing the relationship between the observed values and the hidden states.

$$\alpha(t+1) = F(t)\,\alpha(t) + \eta(t)$$ : the state equation, with α(t) the state of the system at time t and F(t) the transition matrix.

$$Y(t) = Z(t)\,\alpha(t) + \varepsilon(t)$$ : the observation equation, relating the values of the time series to the hidden states.

It is straightforward to rewrite the trend and seasonal decomposition of a time series:

$$Y(t) = T(t) + S(t) + \varepsilon(t)$$

(A BSTS model can also include a set of external regressors βX(t), although we don't do so here)

In state space form:

$$Y(t) = Z(t)\,\alpha(t) + \varepsilon(t)$$

With:

$$\alpha(t) = \begin{bmatrix} T(t) \\ S(t) \end{bmatrix}$$

and $Z(t) = \begin{bmatrix} 1 & 1 \end{bmatrix}$

And the state equations are:

$$T(t) = T(t-1) + \delta(t-1) + u(t)$$
$$\delta(t) = \delta(t-1) + v(t)$$
$$S(t) = -\sum_{i=1}^{s-1} S(t-i) + w(t)$$

with δ(t) the slope of the trend, the lagged terms S(t - i) acting as a set of seasonal dummy variables used to model seasonality, and η(t) = (u(t), v(t), w(t)).

Kalman filtering and an MCMC algorithm are used to fit the model. Forecasts are then calculated from the posterior predictive distribution.

Let's fit a BSTS model on our data and then plot the components:


#Train the BSTS model 
Trend_Seasonal_states <- AddSemilocalLinearTrend(list(),train)
Trend_Seasonal_states <- AddSeasonal(Trend_Seasonal_states,train, nseasons = 12)
AirPassengersBSTS <- bsts(train,state.specification = Trend_Seasonal_states, niter = 1000)
=-=-=-=-= Iteration 0 Mon Jul 09 12:06:01 2018 =-=-=-=-=
...
=-=-=-=-= Iteration 900 Mon Jul 09 12:06:02 2018 =-=-=-=-=
options(repr.plot.width=6, repr.plot.height=4)
plot(AirPassengersBSTS)
options(repr.plot.width=8, repr.plot.height=4)
plot(AirPassengersBSTS, "components")

[Figure: the distribution of the BSTS model fit over the training data]
[Figure: the contributions of the trend and seasonal state components]

To generate forecasts and plot the results:


BSTSForecasts <- predict(AirPassengersBSTS, horizon = 24)
BSTSForecastsFormatted <- ts(BSTSForecasts$mean,frequency=12, start=c(1959,1))
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + autolayer(test, series="Test Data") +
  autolayer(BSTSForecastsFormatted, series="BSTS Forecasts")

[Figure: training data with the test data and the BSTS forecasts overlaid]

Facebook Prophet

Prophet is an open-source forecasting method developed at Facebook. It is essentially a curve-fitting approach, very similar in spirit to how BSTS models trend and seasonality, except that it uses generalized additive models instead of a state space representation to describe each component.

Similar to the other two approaches, a basic Prophet model is written as:

$$Y(t) = T(t) + S(t) + H(t) + \varepsilon(t)$$

with T(t) the trend, S(t) the seasonality, and H(t) an additional component representing holiday effects.

The trend is modeled either as a logistic growth model for time series with saturated growth:

$$T(t) = \frac{C}{1 + \exp(-k(t - m))}$$

with C the capacity of the model (i.e., the maximum level of the time series), k the growth rate, and m an offset parameter.

Or as a piecewise linear growth model for unbounded growth:

$$T(t) = k\,t + m$$

Changes in trend are modeled using change points in the growth rate k.

The seasonal component is modeled using a Fourier series:

$$S(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P}\right)$$

with P the period of the time series (P = 365.25 for yearly seasonality, P = 7 for weekly seasonality, etc.) and a_n, b_n coefficients to be estimated.
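As an aside, the same Fourier terms can be generated in R: the fourier() helper in the forecast package builds them for a given series (its K argument is the number of sine/cosine pairs, i.e. N above):

#Fourier seasonal regressors for our monthly series (P = 12), with N = 3 pairs
X <- fourier(train, K = 3)
head(X)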

The holiday component H(t) is modeled as a regression on the specific holiday dates, which have to be provided separately (we won't use holidays in this example, but a sketch of the expected format follows).
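For reference, here is a sketch of the holidays data frame Prophet expects (the holiday name and dates below are purely illustrative):

#Hypothetical holidays data frame, passed to prophet() via the holidays argument
holidays <- data.frame(
  holiday = "thanksgiving",
  ds = as.Date(c("1956-11-22", "1957-11-28", "1958-11-27")),
  lower_window = 0,   #the effect starts on the day itself
  upper_window = 1    #and extends one day after
)
#m <- prophet(FBTrain, holidays = holidays)   #FBTrain is defined below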

The L-BFGS algorithm is used to fit the model.

Facebook Prophet works great out of the box and is very intuitive, especially for non-specialists without time series or data science training, but it has rigid requirements on how the data must be formatted: the time series needs to be passed as a data frame with a column 'ds' for the dates and a column 'y' for the values.

So the first step in training a Prophet model will be to format the data properly:


#Create a data frame in the right format for Prophet
FBTrain <- data.frame(ds = as.Date(as.yearmon(time(train))), y=as.matrix(train))

To fit a Prophet model to the data:


#Generate a model and then generate forecasts 
m <- prophet(FBTrain,yearly.seasonality=TRUE)
Initial log joint probability = -2.56207
Optimization terminated normally: 
  Convergence detected: absolute parameter change was below tolerance

Prophet also requires that a future time data frame be defined before generating forecasts:

future <- make_future_dataframe(m, periods = 24, freq = 'month')
forecast <- predict(m, future)
#Plot the components 
options(repr.plot.width=4, repr.plot.height=4)
prophet_plot_components(m,forecast)

[Figure: Prophet component plots showing the fitted trend and yearly seasonality]

Finally, you can plot the forecasts. Note that Prophet comes with its own plotting function, but we will be using autoplot from the forecast package to keep our plots consistent:

#Plot the results 
ProphetForecast <- ts(forecast$yhat,frequency=12, start=c(1949,1))
ProphetForecast <- window(ProphetForecast, start = c(1959, 1), end = c(1960,12))
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + autolayer(test, series="Test Data") +
  autolayer(ProphetForecast, series="Prophet Forecasts")

[Figure: training data with the test data and the Prophet forecasts overlaid]

Improving Accuracy

The forecasts we have generated so far using these three approaches are "OK": they follow the general trend of the time series and replicate the seasonal pattern in the data, but they fail to capture the increase in the magnitude of the seasonal variation over time. This is because the air passengers time series has a multiplicative rather than an additive seasonal pattern. In the next section, we will try to overcome this difficulty by applying a log transform to the data, so that the multiplicative seasonality can be represented by an additive model. We will rerun each of the forecasting methods on the log-transformed data, then reverse the transform on the resulting forecasts and compare them to the original data.
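As an aside, for the STL case stlf() can apply a Box-Cox transform for us (lambda = 0 corresponds to a log transform, and the forecasts are automatically back-transformed); we apply the transform explicitly below so that the same approach carries over to BSTS and Prophet:

#Equivalent shortcut for STL only: Box-Cox with lambda = 0 is a log transform
stlf(train, method = "arima", lambda = 0)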

#Plot the log transformed data - you can see that the seasonality is more additive now. 
options(repr.plot.width=6, repr.plot.height=4)
plot(log10(train))

[Figure: the log-transformed training data, with a roughly constant seasonal amplitude]

STL With Log Transformed Data

AirPassengersSTL_ARIMA_log <- stlf(log10(train),method="arima")
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + autolayer(test, series="Test Data") +
  autolayer(ts(10^as.numeric(AirPassengersSTL_ARIMA_log$mean),frequency=12, start=c(1959,1)), series="STL + log Forecasts")

[Figure: training data with the test data and the back-transformed STL + log forecasts overlaid]

BSTS With Log Transformed Data

#Train the BSTS model 
Trend_Seasonal_states_log <- AddSemilocalLinearTrend(list(), log10(train))
Trend_Seasonal_states_log <- AddSeasonal(Trend_Seasonal_states_log, log10(train), nseasons = 12)
AirPassengersBSTS_log <- bsts(log10(train),state.specification = Trend_Seasonal_states_log, niter = 1000)
=-=-=-=-= Iteration 0 Mon Jul 09 12:06:46 2018 =-=-=-=-=
...
=-=-=-=-= Iteration 900 Mon Jul 09 12:06:47 2018 =-=-=-=-=
BSTSForecasts_log <- predict(AirPassengersBSTS_log, horizon = 24, quantiles = c(0.0000001, 0.000001))
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + autolayer(test, series="Test Data") +
  autolayer(ts(10^as.numeric(BSTSForecasts_log$mean),frequency=12, start=c(1959,1)), series="BSTS + log Forecasts")

[Figure: training data with the test data and the back-transformed BSTS + log forecasts overlaid]

Prophet With Log Transformed Data

#Create a data frame in the right format for Prophet
FBTrain_log <- data.frame(ds = as.Date(as.yearmon(time(train))), y=as.matrix(log10(train)))

#Generate a model and then generate forecasts 
m <- prophet(FBTrain_log,yearly.seasonality=TRUE , seasonality.prior.scale = 2) 
future <- make_future_dataframe(m, periods = 24, freq = 'month')
forecast <- predict(m, future)
Initial log joint probability = -2.08326
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

ProphetForecast <- ts(10^as.numeric(forecast$yhat),frequency=12, start=c(1949,1))
ProphetForecast <- window(ProphetForecast, start = c(1959, 1), end = c(1960,12))
options(repr.plot.width=8, repr.plot.height=4)
autoplot(train , ylab = 'Passengers') + scale_x_yearmon() + 
  autolayer(test, series="Test Data") + autolayer(ProphetForecast, 
  series="Prophet Forecasts")

[Figure: training data with the test data and the back-transformed Prophet + log forecasts overlaid]

We can see that the log transform improved the accuracy of the forecasts for all three methods, with BSTS appearing to perform better than the other two.

Some Notes

  • We didn't mention the question of stationarity, which usually comes up when dealing with ARMA and ARIMA models. Stationarity is rarely discussed for decomposition-based methods because it is handled implicitly: the trend and seasonal components are modeled separately from the remainder series.

  • We didn't mention forecast intervals in this overview—we only generated point forecasts. In many business scenarios, such as demand forecasting and supply chain planning, the forecast intervals are just as important as the point forecasts. Each of the R functions used allows you to generate forecast intervals as well.

  • We informally evaluated the performance of the forecasting methods based on plots of forecasts vs. actuals for held-out test data. A more rigorous analysis would require calculating forecast error metrics such as MAPE and RMSE (see the sketch after this list).

  • In the air passengers data set, there is only one yearly seasonal component, but many business time series have multiple seasonalities. For example, a restaurant customer time series will have daily seasonality (with peaks at lunch time and dinner time) and weekly seasonality (fewer customers on weekends). STL cannot handle more than one seasonal component. Prophet can handle multiple seasonalities although, as mentioned above, it requires specific date formats. BSTS can handle multiple seasonalities by using a Fourier series to represent the seasonal component, the same way Prophet does (call AddTrig instead of AddSeasonal).

  • STL in its basic form doesn't allow for holiday effects, although the version of STL available in the R forecast package can take holidays as exogenous variables passed to the forecasting method used for modeling the remainder.

  • Prophet and BSTS can both handle holiday effects. In Prophet, this is explicit and all you need to do is provide a data frame with the holiday dates. In BSTS, you need to model the holidays as a set of external regressors.

  • BSTS' full potential is realized when we add data beyond the time series and holiday dates. For the air passenger data, for example, we could add factors such as economic conditions, airline marketing data, or the number of internet queries for airplane trips, and use Bayesian modeling to improve the accuracy of the forecasts. In particular, BSTS is very useful for "nowcasting": predicting the values of a time series in the present. In the air passenger example, it would take several months for the airlines to compile and release the data, so you can't know the value for the current month until several months into the future. If you needed to forecast the number of air passengers for the current month and couldn't afford to wait, you could use the above-mentioned external data along with historical values of air passenger traffic to "nowcast" the number of air passengers for the current month.
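For instance, here is a sketch of a more rigorous evaluation of the STL + ARIMA model trained earlier, which also shows the forecast intervals mentioned above:

#accuracy() reports RMSE, MAPE, and other error metrics on the training and test sets
accuracy(AirPassengersSTL_ARIMA, test)
#The stlf() forecast object already contains 80% and 95% forecast intervals
AirPassengersSTL_ARIMA$lower   #lower bounds of the 80% and 95% intervals
AirPassengersSTL_ARIMA$upper   #upper bounds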

Conclusion

In this overview, we introduced three decomposition-based approaches to forecasting and showed how they can be used as simple, easily interpretable, but also robust methods for univariate time series forecasting.

References
 
  • Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). "STL: A Seasonal-Trend Decomposition Procedure Based on Loess." Journal of Official Statistics, 6(1), 3-73.
  • Scott, S. L., & Varian, H. R. (2013). "Predicting the Present with Bayesian Structural Time Series."
  • Taylor, S. J., & Letham, B. (2017). "Forecasting at Scale."
Author

Skander Hannachi

Skander Hannachi is a data scientist working for Nordstrom's demand forecasting and replenishment team. His current interests are time series models and optimization algorithms applied to demand forecasting and inventory management. Prior to that, he worked in consulting roles, implementing retail forecasting and planning solutions for various retailers. He holds a Ph.D. in computational intelligence and has done academic research on neural networks and fuzzy logic, both from a theoretical point of view and as applied to fisheries and environmental sciences.