AutoTheta Model
Step-by-step guide on using the AutoTheta Model with StatsForecast.
Table of Contents
- Introduction
- Theta Models
- Loading libraries and data
- Explore data with the plot method
- Split the data into training and testing
- Implementation of AutoTheta with StatsForecast
- Cross-validation
- Model evaluation
- References
Introduction
The development of accurate, robust and reliable forecasting methods for univariate time series is very important when large numbers of time series are involved in the modelling and forecasting process. In industrial settings, it is very common to work with large lines of products; thus, efficient sales and operational planning (S&OP) depend heavily on accurate forecasting methods.
The Theta method (Assimakopoulos & Nikolopoulos, 2000, hereafter A&N) is applied to non-seasonal or deseasonalised time series, where the deseasonalisation is usually performed via the multiplicative classical decomposition. The method decomposes the original time series into two new lines through the so-called theta coefficients, denoted by $\theta_1$ and $\theta_2$ with $\theta_1 < 1 < \theta_2$, which are applied to the second differences of the data. The second differences are reduced when $\theta < 1$, resulting in a better approximation of the long-term behaviour of the series (Assimakopoulos, 1995). If $\theta$ is equal to zero, the new line is a straight line. When $\theta > 1$, the local curvatures are increased, magnifying the short-term movements of the time series (A&N). The new lines produced are called theta lines, denoted here by $Z_t(\theta_1)$ and $Z_t(\theta_2)$. These lines have the same mean value and slope as the original data, but the local curvatures are either filtered out or enhanced, depending on the value of the $\theta$ coefficient.
In other words, the decomposition process has the advantage of exploiting information in the data that usually cannot be captured and modelled completely through the extrapolation of the original time series. The theta lines can be regarded as new time series and are extrapolated separately using an appropriate forecasting method. Once the extrapolation of each theta line has been completed, recomposition takes place through a combination scheme in order to calculate the point forecasts of the original time series. Combining has long been considered as a useful practice in the forecasting literature (for example, Clemen, 1989, Makridakis and Winkler, 1983, Petropoulos et al., 2014), and therefore its application to the Theta method is expected to result in more accurate and robust forecasts.
The Theta method is quite versatile in terms of choosing the number of theta lines, the theta coefficients and the extrapolation methods, and combining these to obtain robust forecasts. However, A&N proposed a simplified version involving the use of only two theta lines with prefixed coefficients that are extrapolated over time using a linear regression (LR) model for the theta line with $\theta = 0$ and simple exponential smoothing (SES) for the theta line with $\theta = 2$. The final forecasts are produced by combining the forecasts of the two theta lines with equal weights.
The performance of the Theta method has also been confirmed by other empirical studies (for example Nikolopoulos et al., 2012, Petropoulos and Nikolopoulos, 2013). Moreover, Hyndman and Billah (2003), hereafter H&B, showed that the simple exponential smoothing with drift model (SES-d) is a statistical model for the simplified version of the Theta method. More recently, Thomakos and Nikolopoulos (2014) provided additional theoretical insights, while Thomakos and Nikolopoulos (2015) derived new theoretical formulations for the application of the method to multivariate time series, and investigated the conditions under which the bivariate Theta method is expected to forecast better than the univariate one. Despite these advances, we believe that the Theta method deserves more attention from the forecasting community, given its simplicity and superior forecasting performance.
One key aspect of the Theta method is that, by definition, it is dynamic. One can choose different theta lines and combine the produced forecasts using either equal or unequal weights. However, A&N limit this important property by fixing the theta coefficients to have predefined values.
The contributions of this work are fourfold. First, we extend the A&N method by optimally selecting the theta line that best describes the short-term movements of the series, while maintaining the long-term component. The forecasts derived from the two theta lines are combined using appropriate weights, which ensure the recomposition of the original time series. Second, we provide theoretical and practical links between the newly proposed model, the original Theta method and the SES-d model. Third, we perform a further extension of the model that allows the regression line (the long-term component) to be revised at every time period.
Theta method
The original Theta method
Originally, A&N proposed the theta line $Z_t(\theta)$ as the solution of the equation

$$\nabla^2 Z_t(\theta) = \theta \nabla^2 Y_t, \qquad t = 3, \dots, n,$$

where $Y_1, \dots, Y_n$ is the original time series (non-seasonal or deseasonalised) and $\nabla$ is the difference operator (i.e., $\nabla Y_t = Y_t - Y_{t-1}$). The initial values of $Z_1(\theta)$ and $Z_2(\theta)$ are obtained by minimising $\sum_{t=1}^{n}\left[Y_t - Z_t(\theta)\right]^2$. However, an analytical solution to compute the $Z_t(\theta)$ was obtained by H&B, which is given by

$$Z_t(\theta) = \theta Y_t + (1-\theta)(A_n + B_n t), \qquad t = 1, \dots, n, \tag{1}$$

where $A_n$ and $B_n$ are the minimum square coefficients of a simple linear regression of $Y_1, \dots, Y_n$ against $1, \dots, n$, given by

$$A_n = \frac{1}{n}\sum_{t=1}^{n} Y_t - \frac{n+1}{2} B_n \tag{2}$$

and

$$B_n = \frac{6}{n(n^2-1)} \sum_{t=1}^{n} (2t - n - 1)\, Y_t. \tag{3}$$

From this point of view, the theta lines can be interpreted as functions of the linear regression model applied to the data directly. However, note that $A_n$ and $B_n$ are only functions of the original data, not parameters of the Theta method.

Finally, the forecasts produced by the Theta method for $h$ steps ahead of $Y_n$ are an ad-hoc combination (50%-50%) of the extrapolations of $Z_t(0)$ and $Z_t(2)$ by the linear regression model and the simple exponential smoothing model, respectively. We will refer to the above setup as the standard Theta method (STheta).
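To make Eqs. (1)–(3) concrete, here is a minimal NumPy sketch (not part of StatsForecast; the function name `theta_line` is purely illustrative) that computes theta lines for a toy series and checks that the equally weighted combination of $Z_t(0)$ and $Z_t(2)$ recovers the original data:

```python
import numpy as np

def theta_line(y: np.ndarray, theta: float) -> np.ndarray:
    """Compute Z_t(theta) = theta*Y_t + (1 - theta)*(A_n + B_n*t), Eqs. (1)-(3)."""
    n = len(y)
    t = np.arange(1, n + 1)
    b_n = 6.0 / (n * (n**2 - 1)) * np.sum((2 * t - n - 1) * y)  # OLS slope, Eq. (3)
    a_n = y.mean() - (n + 1) / 2.0 * b_n                        # OLS intercept, Eq. (2)
    return theta * y + (1 - theta) * (a_n + b_n * t)            # Eq. (1)

y = np.array([3.0, 4.5, 4.0, 6.0, 7.5, 7.0, 9.0])
z0 = theta_line(y, 0.0)  # theta = 0: the linear regression line itself
z2 = theta_line(y, 2.0)  # theta = 2: local curvatures are doubled
print(np.allclose(0.5 * z0 + 0.5 * z2, y))  # True: the 50%-50% recomposition gives back y
```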
The steps for building the STheta method of A&N are as follows:

1. Deseasonalisation: The time series is tested for statistically significant seasonal behaviour. A time series is identified as seasonal if

   $$|\hat{\rho}_m| > q_{1-\alpha/2}\sqrt{\frac{1 + 2\sum_{i=1}^{m-1}\hat{\rho}_i^2}{n}},$$

   where $\hat{\rho}_k$ denotes the lag-$k$ autocorrelation function, $m$ is the number of periods within a seasonal cycle (for example, 12 for monthly data), $n$ is the sample size, $q$ is the quantile function of the standard normal distribution, and $(1-\alpha)\%$ is the confidence level. A&N opted for a 90% confidence level. If the time series is identified as seasonal, then it is deseasonalised via the classical decomposition method, assuming the seasonal component to have a multiplicative relationship. (A small sketch of this test is given after the list.)

2. Decomposition: The seasonally adjusted time series is decomposed into two theta lines, the linear regression line $Z_t(0)$ and the theta line $Z_t(2)$.

3. Extrapolation: $Z_t(0)$ is extrapolated as a normal linear regression line, while $Z_t(2)$ is extrapolated using SES.

4. Combination: The final forecast is a combination of the forecasts of the two theta lines using equal weights.

5. Reseasonalisation: If the series was identified as seasonal in step 1, then the final forecasts are multiplied by the respective seasonal indices.
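As a concrete illustration of the seasonality test in step 1 (a minimal sketch, not the StatsForecast internals), the following snippet assumes monthly data ($m = 12$) and the 90% confidence level used by A&N:

```python
import numpy as np
from scipy.stats import norm

def is_seasonal(y: np.ndarray, m: int = 12, conf: float = 0.90) -> bool:
    """ACF-based test from step 1 of the STheta algorithm."""
    n = len(y)
    y_c = y - y.mean()
    denom = np.sum(y_c**2)
    # sample autocorrelations at lags 1..m
    acf = np.array([np.sum(y_c[k:] * y_c[:-k]) / denom for k in range(1, m + 1)])
    # the threshold uses lags 1..m-1; the test compares |acf at lag m| against it
    limit = norm.ppf(1 - (1 - conf) / 2) * np.sqrt((1 + 2 * np.sum(acf[:-1] ** 2)) / n)
    return abs(acf[-1]) > limit
```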
Models for optimising the Theta method
Assume that either the time series is non-seasonal or it has been seasonally adjusted using the multiplicative classical decomposition approach.
Let $Y_t$ be the linear combination of two theta lines,

$$Y_t = \omega Z_t(\theta_1) + (1-\omega) Z_t(\theta_2), \tag{4}$$

where $\omega \in [0, 1]$ is the weight parameter. Assuming that $\theta_1 < 1$ and $\theta_2 \geq 1$, the weight $\omega$ can be derived as

$$\omega := \frac{\theta_2 - 1}{\theta_2 - \theta_1}. \tag{5}$$

It is straightforward to see from Eqs. (4), (5) that $\omega Z_t(\theta_1) + (1-\omega) Z_t(\theta_2) = Y_t$, i.e., the weights are calculated in such a way that Eq. (4) reproduces the original series (a short verification is given below). In Theorem 1 of Appendix A, we prove that the solution is unique and that the error from not choosing the optimal weights ($\omega$ and $1-\omega$) is proportional to the error of a linear regression model. As a consequence, the STheta method is given simply by setting $\theta_1 = 0$ and $\theta_2 = 2$, while from Eq. (5) we get $\omega = 1/2$. Thus, Eqs. (4), (5) allow us to construct a generalisation of the Theta model that maintains the recomposition property of the original time series for any theta lines $Z_t(\theta_1)$ and $Z_t(\theta_2)$.
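As a quick verification (not part of the original text), substituting Eq. (1) and Eq. (5) into Eq. (4) and writing $L_t = A_n + B_n t$ for the regression line gives

$$
\begin{aligned}
\omega Z_t(\theta_1) + (1-\omega) Z_t(\theta_2)
  &= \frac{\theta_2-1}{\theta_2-\theta_1}\bigl[\theta_1 Y_t + (1-\theta_1)L_t\bigr]
   + \frac{1-\theta_1}{\theta_2-\theta_1}\bigl[\theta_2 Y_t + (1-\theta_2)L_t\bigr] \\
  &= \frac{\theta_1\theta_2 - \theta_1 + \theta_2 - \theta_1\theta_2}{\theta_2-\theta_1}\,Y_t
   + \frac{(1-\theta_1)\bigl[(\theta_2-1) + (1-\theta_2)\bigr]}{\theta_2-\theta_1}\,L_t
   = Y_t,
\end{aligned}
$$

so the recomposition is exact for any choice of $\theta_1 \neq \theta_2$.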
In order to maintain the modelling of the long-term component and retain a fair comparison with the STheta method, in this work we fix $\theta_1 = 0$ and focus on the optimisation of the short-term component, $Z_t(\theta_2)$, with $\theta_2 \geq 1$. Thus, $\theta_2$ is the only parameter that requires estimation so far; to simplify the notation we write $\theta$ for $\theta_2$ hereafter. The theta decomposition is now given by

$$Y_t = \left(1 - \frac{1}{\theta}\right)(A_n + B_n t) + \frac{1}{\theta} Z_t(\theta), \qquad t = 1, \dots, n.$$

The $h$-step-ahead forecasts calculated at origin $n$ are given by

$$\hat{Y}_{n+h|n} = \left(1 - \frac{1}{\theta}\right)\left[A_n + B_n (n+h)\right] + \frac{1}{\theta}\,\tilde{Z}_{n+h|n}(\theta), \tag{6}$$

where $\tilde{Z}_{n+h|n}(\theta)$ is the extrapolation of $Z_t(\theta)$ by an SES model with $\ell_0^*$ as the initial level parameter and $\alpha$ as the smoothing parameter. Note that for $\theta = 2$, Eq. (6) corresponds to Step 4 of the STheta algorithm. After some algebra, we can write

$$\hat{Y}_{n+h|n} = \ell_n + \left(1 - \frac{1}{\theta}\right) B_n \left[(h - 1) + \frac{1 - (1-\alpha)^n}{\alpha}\right], \tag{7}$$

where $\ell_t = \alpha Y_t + (1-\alpha)\ell_{t-1}$ for $t = 1, \dots, n$ and $\ell_0 = \ell_0^*$.

In the light of Eqs. (6), (7), we suggest four stochastic approaches. These approaches differ due to the parameter $\theta$, which may be either fixed at two or optimised, and the coefficients $A_n$ and $B_n$, which can be either fixed or dynamic functions. To formulate the state space models, it is helpful to adopt $\mu_t = \hat{Y}_{t|t-1}$ as the one-step-ahead forecast at origin $t-1$ and $\varepsilon_t = Y_t - \mu_t$ as the respective additive error, i.e., $Y_t = \mu_t + \varepsilon_t$. We assume $\{\varepsilon_t\}$ to be a Gaussian white noise process with mean zero and variance $\sigma^2$.
Optimised and standard Theta models
Let $A_t = A_n$ and $B_t = B_n$ be fixed coefficients for all $t = 1, \dots, n$, so that Eqs. (6), (7) configure the state space model given by

$$
\begin{aligned}
Y_t &= \mu_t + \varepsilon_t, \\
\mu_t &= \ell_{t-1} + \left(1 - \frac{1}{\theta}\right) B_n\,\frac{1 - (1-\alpha)^{t-1}}{\alpha}, \\
\ell_t &= \alpha Y_t + (1-\alpha)\ell_{t-1},
\end{aligned}
$$

with parameters $\ell_0^*$, $\alpha \in [0, 1]$ and $\theta \geq 1$. The parameter $\theta$ is to be estimated along with $\ell_0^*$ and $\alpha$. We call this the optimised Theta model (OTM).
The $h$-step-ahead forecast at origin $n$ is given by

$$\hat{Y}_{n+h|n} = \mathrm{E}\left[Y_{n+h} \mid Y_1, \dots, Y_n\right] = \ell_n + \left(1 - \frac{1}{\theta}\right) B_n \left[(h-1) + \frac{1 - (1-\alpha)^n}{\alpha}\right],$$

which is equivalent to Eq. (6). The conditional variance $\mathrm{Var}\left[Y_{n+h} \mid Y_1, \dots, Y_n\right]$ can be computed easily from the state space model. Thus, the prediction interval for $Y_{n+h}$ at significance level $\alpha_0$ (written with a subscript to avoid confusion with the smoothing parameter $\alpha$) is given by

$$\hat{Y}_{n+h|n} \pm q_{1-\alpha_0/2}\sqrt{\mathrm{Var}\left[Y_{n+h} \mid Y_1, \dots, Y_n\right]}.$$
For $\theta = 2$, OTM reproduces the forecasts of the STheta method; hereafter, we will refer to this particular case as the standard Theta model (STM). In Theorem 2 of Appendix A, we show that OTM is mathematically equivalent to the SES-d model. As a corollary of Theorem 2, STM is mathematically equivalent to SES-d with drift $B_n/2$. Therefore, the corollary also re-confirms the H&B result on the relationship between STheta and the SES-d model.
Loading libraries and data
Tip
StatsForecast will be needed. To install, see instructions.
Next, we import plotting libraries and configure the plotting style.
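A minimal setup might look like this (the style choices below are only a suggestion):

```python
# If needed: pip install statsforecast pandas matplotlib
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use("ggplot")                   # any matplotlib style works here
plt.rcParams["figure.figsize"] = (12, 5)  # wider figures for time series
```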
Read Data
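The tutorial uses the monthly industrial production index for sugar and confectionery products (FRED series `IPG3113N`). A minimal sketch of the loading step, assuming the series has been saved locally as `candy_production.csv` (the file name is illustrative):

```python
# Load the raw FRED export; it has the columns observation_date and IPG3113N
df = pd.read_csv("candy_production.csv")
df.head()
```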
| | observation_date | IPG3113N |
|---|---|---|
0 | 1972-01-01 | 85.6945 |
1 | 1972-02-01 | 71.8200 |
2 | 1972-03-01 | 66.0229 |
3 | 1972-04-01 | 64.5645 |
4 | 1972-05-01 | 65.0100 |
The input to StatsForecast is always a data frame in long format with three columns: unique_id, ds and y:
- The `unique_id` (string, int or category) represents an identifier for the series.
- The `ds` (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- The `y` (numeric) represents the measurement we wish to forecast.
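A small sketch of the renaming step (the column names follow the table above; `unique_id` is set to 1 because there is a single series):

```python
# Rename the FRED columns to the long format expected by StatsForecast
df = df.rename(columns={"observation_date": "ds", "IPG3113N": "y"})
df["unique_id"] = 1
df.head()
```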
| | ds | y | unique_id |
|---|---|---|---|
0 | 1972-01-01 | 85.6945 | 1 |
1 | 1972-02-01 | 71.8200 | 1 |
2 | 1972-03-01 | 66.0229 | 1 |
3 | 1972-04-01 | 64.5645 | 1 |
4 | 1972-05-01 | 65.0100 | 1 |
We can see that our time variable (`ds`) is stored as an object; we need to convert it to a date format.
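For example:

```python
# Convert the ds column from object (string) to datetime
df["ds"] = pd.to_datetime(df["ds"])
```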
Explore Data with the plot method
Plot some series using the plot method from the StatsForecast class. This method plots a random series from the dataset and is useful for basic EDA.
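For example (a minimal call; the plotting backend and figure size can be configured):

```python
from statsforecast import StatsForecast

StatsForecast.plot(df)
```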
Autocorrelation plots
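The notebook also inspects the autocorrelation structure of the series. A sketch using statsmodels (an assumption; any ACF/PACF implementation works):

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
plot_acf(df["y"], lags=30, ax=axs[0])   # autocorrelation function
plot_pacf(df["y"], lags=30, ax=axs[1])  # partial autocorrelation function
plt.show()
```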
Split the data into training and testing
Let's divide our data into sets:

1. Data to train our AutoTheta model.
2. Data to test our model.

For the test data we will use the last 12 months to test and evaluate the performance of our model.
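A minimal sketch of the split (the last 12 rows become the test set):

```python
train = df.iloc[:-12]  # everything except the last 12 months
test = df.iloc[-12:]   # the last 12 months
print(train.shape, test.shape)
```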
Now let’s plot the training data and the test data.
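For example:

```python
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(train["ds"], train["y"], label="Train")
ax.plot(test["ds"], test["y"], label="Test")
ax.set_xlabel("Date")
ax.set_ylabel("IPG3113N")
ax.legend()
plt.show()
```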
Implementation of AutoTheta with StatsForecast
Load libraries
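For this section we only need the StatsForecast class and the AutoTheta model:

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoTheta
```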
Instantiate Model
Import and instantiate the models. Setting the `season_length` argument is sometimes tricky. This article on Seasonal periods by the master, Rob Hyndman, can be useful.
AutoTheta automatically selects the best Theta model (Standard Theta Model (STM), Optimized Theta Model (OTM), Dynamic Standard Theta Model (DSTM), or Dynamic Optimized Theta Model (DOTM)) using the MSE.
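A minimal instantiation for this monthly series (assuming the train/test split defined above):

```python
season_length = 12   # monthly data
horizon = len(test)  # 12 months ahead

models = [AutoTheta(season_length=season_length)]
```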
We fit the models by instantiating a new StatsForecast object with the following parameters:
- `models`: a list of models. Select the models you want from models and import them.
- `freq`: a string indicating the frequency of the data. (See pandas' available frequencies.)
- `n_jobs`: int, number of jobs used in the parallel processing, use -1 for all cores.
- `fallback_model`: a model to be used if a model fails.
Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.
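For example (`MS` is the pandas month-start frequency, which matches the timestamps in this dataset):

```python
sf = StatsForecast(models=models, freq="MS", n_jobs=-1)
```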
Fit Model
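For example:

```python
sf.fit(df=train)
```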
Let’s see the results of our Theta model. We can observe it with the following instruction:
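A sketch, assuming a single series and a single model (after `fit`, the fitted models are stored in the `fitted_` attribute, and AutoTheta exposes its results through `model_`):

```python
result = sf.fitted_[0, 0].model_
print(result.keys())
```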
Let us now visualize the residuals of our model.

As we can see, the result obtained above is a dictionary. To extract each element from the dictionary we are going to use the `.get()` function, and then we are going to save it in a `pd.DataFrame()`.
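For example (this assumes the results dictionary contains a `residuals` entry, as the table below suggests):

```python
residual = pd.DataFrame(
    sf.fitted_[0, 0].model_.get("residuals"), columns=["residual Model"]
)
residual.plot(figsize=(12, 4), title="Residuals of the AutoTheta model")
plt.show()
```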
| | residual Model |
|---|---|
0 | 2.108105 |
1 | -11.389412 |
2 | -9.491397 |
… | … |
533 | -12.076382 |
534 | -10.073893 |
535 | -9.392075 |
Forecast Method
If you want to gain speed in productive settings where you have multiple series or models, we recommend using the `StatsForecast.forecast` method instead of `.fit` and `.predict`.

The main difference is that `.forecast` does not store the fitted values and is highly scalable in distributed environments.
The forecast method takes two arguments: forecasts the next `h` (horizon) and `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 12 months ahead.
- `level` (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, `level=[90]` means that the model expects the real value to be inside that interval 90% of the time.
The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals. Depending on your computer, this step should take around 1 min. (If you have added heavier AutoModels such as AutoARIMA to the list and want to speed things up to a couple of seconds, remove them.)
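A minimal call producing the forecasts shown below (passing `fitted=True` also stores the in-sample fitted values, which we retrieve afterwards):

```python
Y_hat_df = sf.forecast(df=train, h=horizon, fitted=True)
Y_hat_df.head()
```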
| | unique_id | ds | AutoTheta |
|---|---|---|---|
0 | 1 | 2016-09-01 | 111.075916 |
1 | 1 | 2016-10-01 | 129.111289 |
2 | 1 | 2016-11-01 | 131.296087 |
… | … | … | … |
9 | 1 | 2017-06-01 | 101.125752 |
10 | 1 | 2017-07-01 | 99.870514 |
11 | 1 | 2017-08-01 | 106.021681 |
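The in-sample fitted values shown in the next table can be retrieved with `forecast_fitted_values` (available because `fitted=True` was passed above):

```python
values = sf.forecast_fitted_values()
values.head()
```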
| | unique_id | ds | y | AutoTheta |
|---|---|---|---|---|
0 | 1 | 1972-01-01 | 85.6945 | 83.586395 |
1 | 1 | 1972-02-01 | 71.8200 | 83.209412 |
2 | 1 | 1972-03-01 | 66.0229 | 75.514297 |
3 | 1 | 1972-04-01 | 64.5645 | 74.384911 |
4 | 1 | 1972-05-01 | 65.0100 | 76.336319 |
Adding 95% confidence interval with the forecast method
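For example:

```python
Y_hat_95 = sf.forecast(df=train, h=horizon, level=[95])
Y_hat_95.head()
```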
| | unique_id | ds | AutoTheta | AutoTheta-lo-95 | AutoTheta-hi-95 |
|---|---|---|---|---|---|
0 | 1 | 2016-09-01 | 111.075916 | 90.148822 | 135.999675 |
1 | 1 | 2016-10-01 | 129.111289 | 94.811133 | 160.372802 |
2 | 1 | 2016-11-01 | 131.296087 | 90.598455 | 168.251600 |
… | … | … | … | … | … |
9 | 1 | 2017-06-01 | 101.125752 | 41.213688 | 159.133373 |
10 | 1 | 2017-07-01 | 99.870514 | 35.173966 | 152.843041 |
11 | 1 | 2017-08-01 | 106.021681 | 38.784242 | 166.021093 |
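The next table joins the test set with the point forecasts. A sketch of the merge (depending on your statsforecast version you may need to call `reset_index()` on the forecast data frame first so that `unique_id` is a column):

```python
test_with_forecast = test.merge(Y_hat_df, how="left", on=["unique_id", "ds"])
test_with_forecast.head()
```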
| | ds | y | unique_id | AutoTheta |
|---|---|---|---|---|
0 | 2016-09-01 | 109.3191 | 1 | 111.075916 |
1 | 2016-10-01 | 119.0502 | 1 | 129.111289 |
2 | 2016-11-01 | 116.8431 | 1 | 131.296087 |
… | … | … | … | … |
9 | 2017-06-01 | 104.2022 | 1 | 101.125752 |
10 | 2017-07-01 | 102.5861 | 1 | 99.870514 |
11 | 2017-08-01 | 114.0613 | 1 | 106.021681 |
Predict method with confidence interval
To generate forecasts use the predict method.

The predict method takes two arguments: forecasts the next `h` (for horizon) and `level`.
- `h` (int): represents the forecast h steps into the future. In this case, 12 months ahead.
- `level` (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, `level=[95]` means that the model expects the real value to be inside that interval 95% of the time.
The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals.
This step should take less than 1 second.
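For example:

```python
forecast_df = sf.predict(h=horizon)
forecast_df.head()
```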
| | unique_id | ds | AutoTheta |
|---|---|---|---|
0 | 1 | 2016-09-01 | 111.075916 |
1 | 1 | 2016-10-01 | 129.111289 |
2 | 1 | 2016-11-01 | 131.296087 |
… | … | … | … |
9 | 1 | 2017-06-01 | 101.125752 |
10 | 1 | 2017-07-01 | 99.870514 |
11 | 1 | 2017-08-01 | 106.021681 |
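Adding the 95% prediction interval:

```python
forecast_df = sf.predict(h=horizon, level=[95])
forecast_df.head()
```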
| | unique_id | ds | AutoTheta | AutoTheta-lo-95 | AutoTheta-hi-95 |
|---|---|---|---|---|---|
0 | 1 | 2016-09-01 | 111.075916 | 90.148822 | 135.999675 |
1 | 1 | 2016-10-01 | 129.111289 | 94.811133 | 160.372802 |
2 | 1 | 2016-11-01 | 131.296087 | 90.598455 | 168.251600 |
… | … | … | … | … | … |
9 | 1 | 2017-06-01 | 101.125752 | 41.213688 | 159.133373 |
10 | 1 | 2017-07-01 | 99.870514 | 35.173966 | 152.843041 |
11 | 1 | 2017-08-01 | 106.021681 | 38.784242 | 166.021093 |
Cross-validation
In previous steps, we've taken our historical data to predict the future. However, to assess its accuracy, we would also like to know how the model would have performed in the past. To assess the accuracy and robustness of your model on your data, perform Cross-Validation.
With time series data, Cross Validation is done by defining a sliding window across the historical data and predicting the period following it. This form of cross-validation allows us to arrive at a better estimation of our model’s predictive abilities across a wider range of temporal instances while also keeping the data in the training set contiguous as is required by our models.
The following graph depicts such a Cross Validation Strategy:
Perform time series cross-validation
Cross-validation of time series models is considered a best practice but most implementations are very slow. The statsforecast library implements cross-validation as a distributed operation, making the process less time-consuming to perform. If you have big datasets you can also perform Cross Validation in a distributed cluster using Ray, Dask or Spark.
In this case, we want to evaluate the performance of the model over the last 5 years, using 5 windows (`n_windows=5`) of 12 months each and moving the cutoff forward 12 months at a time (`step_size=12`). Depending on your computer, this step should take around 1 min.
The cross_validation method from the StatsForecast class takes the following arguments.
- `df`: training data frame.
- `h` (int): represents h steps into the future that are being forecasted. In this case, 12 months ahead.
- `step_size` (int): step size between each window. In other words: how often do you want to run the forecasting process.
- `n_windows` (int): number of windows used for cross validation. In other words: how many forecasting processes in the past do you want to evaluate.
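For example:

```python
crossvalidation_df = sf.cross_validation(df=train, h=horizon, step_size=12, n_windows=5)
crossvalidation_df.head()
```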
The `crossvalidation_df` object is a new data frame that includes the following columns:

- `unique_id`: series identifier.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for each of the `n_windows`.
- `y`: true value.
- `"model"`: columns with the model's name and fitted value.
| | unique_id | ds | cutoff | y | AutoTheta |
|---|---|---|---|---|---|
0 | 1 | 2011-09-01 | 2011-08-01 | 93.9062 | 98.167467 |
1 | 1 | 2011-10-01 | 2011-08-01 | 116.7634 | 116.969931 |
2 | 1 | 2011-11-01 | 2011-08-01 | 116.8258 | 119.135143 |
… | … | … | … | … | … |
57 | 1 | 2016-06-01 | 2015-08-01 | 102.4044 | 109.600458 |
58 | 1 | 2016-07-01 | 2015-08-01 | 102.9512 | 108.260148 |
59 | 1 | 2016-08-01 | 2015-08-01 | 104.6977 | 114.248257 |
Model Evaluation
Now we are going to evaluate our model with the results of the predictions. We will use several metrics (MAE, MAPE, MASE, RMSE, SMAPE) to evaluate the accuracy.
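A sketch using the `utilsforecast` package (an assumption on our side; any implementation of these metrics works) applied to the cross-validation results, which yields output shaped like the table below:

```python
from functools import partial

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mape, mase, rmse, smape

evaluation_df = evaluate(
    crossvalidation_df.drop(columns="cutoff"),  # keep unique_id, ds, y and the model column
    metrics=[mae, mape, partial(mase, seasonality=12), rmse, smape],
    train_df=train,  # mase needs the training data to compute its scaling factor
)
evaluation_df
```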
| | unique_id | metric | AutoTheta |
|---|---|---|---|
0 | 1 | mae | 6.281526 |
1 | 1 | mape | 0.055684 |
2 | 1 | mase | 1.212475 |
3 | 1 | rmse | 7.683674 |
4 | 1 | smape | 0.027399 |
Acknowledgements
We would like to thank Naren Castellon for writing this tutorial.
References
- Jose A. Fiorucci, Tiago R. Pellegrini, Francisco Louzada, Fotios Petropoulos, Anne B. Koehler (2016). “Models for optimising the theta method and their relationship to state space models”. International Journal of Forecasting.
- Nixtla Parameters.
- Pandas available frequencies.
- Rob J. Hyndman and George Athanasopoulos (2018). "Forecasting: Principles and Practice, Time series cross-validation".
- Seasonal periods - Rob J. Hyndman.