Motivation
The AutoARIMA model is widely used to forecast time series in production and as a benchmark. However, the Python implementation (pmdarima) is so slow that it prevents data science practitioners from quickly iterating and deploying AutoARIMA in production for a large number of time series. In this notebook we present Nixtla's AutoARIMA, which is based on the R implementation (developed by Rob Hyndman) and optimized using numba.
Example
Libraries
Useful functions
Data
For testing purposes, we will use the Hourly dataset from the M4
competition.
In this example we will use a subset of the data to avoid waiting too
long. You can modify the number of series if you want.
Would an autoregressive model be the right choice for our data? The series clearly show seasonal periods. The autocorrelation function (acf) can help us answer the question. Intuitively, we need to observe a decaying correlation across lags to opt for an AR model.
In the M4 Hourly data, we observe a high autocorrelation for the most recent lags and also for the seasonal lags. Therefore, we will let auto_arima handle our data.
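The autocorrelation check described above can be sketched in a few lines. This is a minimal illustration, not the notebook's original code: the `acf` helper is written here for the example, and the series is synthetic (a 24-step cycle plus a mild trend) rather than the M4 Hourly data.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelation of y for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


# Synthetic stand-in for an hourly series (illustrative only, not M4 data).
y = [10 * math.sin(2 * math.pi * t / 24) + 0.01 * t for t in range(24 * 20)]

rho = acf(y, max_lag=48)
for lag in (1, 12, 24, 48):
    print(f"lag {lag:>2}: {rho[lag]: .3f}")
```

On a series like this one, the correlation is strong at lag 1 and peaks again at multiples of 24, which is the pattern that motivates a seasonal ARIMA model.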
Training and forecasting
StatsForecast receives a list of models to fit to each time series. Since we are dealing with hourly data, it is beneficial to use a season length of 24.
Init signature:
AutoARIMA(
d: Optional[int] = None,
D: Optional[int] = None,
max_p: int = 5,
max_q: int = 5,
max_P: int = 2,
max_Q: int = 2,
max_order: int = 5,
max_d: int = 2,
max_D: int = 1,
start_p: int = 2,
start_q: int = 2,
start_P: int = 1,
start_Q: int = 1,
stationary: bool = False,
seasonal: bool = True,
ic: str = 'aicc',
stepwise: bool = True,
nmodels: int = 94,
trace: bool = False,
approximation: Optional[bool] = False,
method: Optional[str] = None,
truncate: Optional[bool] = None,
test: str = 'kpss',
test_kwargs: Optional[str] = None,
seasonal_test: str = 'seas',
seasonal_test_kwargs: Optional[Dict] = None,
allowdrift: bool = False,
allowmean: bool = False,
blambda: Optional[float] = None,
biasadj: bool = False,
season_length: int = 1,
alias: str = 'AutoARIMA',
prediction_intervals: Optional[statsforecast.utils.ConformalIntervals] = None,
)
Docstring:
AutoARIMA model.
Automatically selects the best ARIMA (AutoRegressive Integrated Moving Average)
model using an information criterion. Default is Akaike Information Criterion (AICc).
**Note:**<br/>
This implementation is a mirror of Hyndman's [forecast::auto.arima](https://github.com/robjhyndman/forecast).
**References:**<br/>
[Rob J. Hyndman, Yeasmin Khandakar (2008). "Automatic Time Series Forecasting: The forecast package for R"](https://www.jstatsoft.org/article/view/v027i03).
Parameters
----------
d : Optional[int]
Order of first-differencing.
D : Optional[int]
Order of seasonal-differencing.
max_p : int
Max autorregresives p.
max_q : int
Max moving averages q.
max_P : int
Max seasonal autorregresives P.
max_Q : int
Max seasonal moving averages Q.
max_order : int
Max p+q+P+Q value if not stepwise selection.
max_d : int
Max non-seasonal differences.
max_D : int
Max seasonal differences.
start_p : int
Starting value of p in stepwise procedure.
start_q : int
Starting value of q in stepwise procedure.
start_P : int
Starting value of P in stepwise procedure.
start_Q : int
Starting value of Q in stepwise procedure.
stationary : bool
If True, restricts search to stationary models.
seasonal : bool
If False, restricts search to non-seasonal models.
ic : str
Information criterion to be used in model selection.
stepwise : bool
If True, will do stepwise selection (faster).
nmodels : int
Number of models considered in stepwise search.
trace : bool
If True, the searched ARIMA models is reported.
approximation : Optional[bool]
If True, conditional sums-of-squares estimation, final MLE.
method : Optional[str]
Fitting method between maximum likelihood or sums-of-squares.
truncate : Optional[int]
Observations truncated series used in model selection.
test : str
Unit root test to use. See `ndiffs` for details.
test_kwargs : Optional[str]
Unit root test additional arguments.
seasonal_test : str
Selection method for seasonal differences.
seasonal_test_kwargs : Optional[dict]
Seasonal unit root test arguments.
allowdrift : bool (default True)
If True, drift models terms considered.
allowmean : bool (default True)
If True, non-zero mean models considered.
blambda : Optional[float]
Box-Cox transformation parameter.
biasadj : bool
Use adjusted back-transformed mean Box-Cox.
season_length : int
Number of observations per unit of time. Ex: 24 Hourly data.
alias : str
Custom name of the model.
prediction_intervals : Optional[ConformalIntervals]
Information to compute conformal prediction intervals.
By default, the model will compute the native prediction
intervals.
File: /hdd/github/statsforecast/statsforecast/models.py
Type: type
Subclasses:
As the signature shows, we can pass season_length to AutoARIMA, so we define our model with season_length=24 and obtain the following forecasts:
| unique_id | ds | AutoARIMA |
|---|---|---|
| H1 | 701 | 616.084167 |
| H1 | 702 | 544.432129 |
| H1 | 703 | 510.414490 |
| H1 | 704 | 481.046539 |
| H1 | 705 | 460.893066 |
Alternatives
pmdarima
You can use the StatsForecast class to parallelize your own models. In this section we will use it to run the auto_arima model from pmdarima.
| unique_id | ds | pmdarima |
|---|---|---|
| H1 | 701 | 628.310547 |
| H1 | 702 | 571.659851 |
| H1 | 703 | 543.504700 |
| H1 | 704 | 517.539062 |
| H1 | 705 | 502.829559 |
Prophet
Prophet is designed to receive a pandas dataframe, so we cannot use StatsForecast. Therefore, we need to parallelize from scratch.
|  | unique_id | ds | prophet |
|---|---|---|---|
| 0 | H1 | 701 | 635.914254 |
| 1 | H1 | 702 | 565.976464 |
| 2 | H1 | 703 | 505.095507 |
| 3 | H1 | 704 | 462.559539 |
| 4 | H1 | 705 | 438.766801 |
| … | … | … | … |
| 43 | H112 | 744 | 6184.686240 |
| 44 | H112 | 745 | 6188.851888 |
| 45 | H112 | 746 | 6129.306256 |
| 46 | H112 | 747 | 6058.040672 |
| 47 | H112 | 748 | 5991.982370 |
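Parallelizing from scratch, as mentioned above for Prophet, usually means fitting each series independently and collecting the results. The sketch below shows that pattern with the standard library only. It is not the notebook's code: `seasonal_naive` is a placeholder standing in for a per-series Prophet fit, and `forecast_all` and its `executor_cls` parameter are names introduced for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def seasonal_naive(y, h, season_length):
    """Placeholder forecaster: repeat the last observed season h steps ahead."""
    season = y[-season_length:]
    return [season[i % len(season)] for i in range(h)]


def _forecast_one(task):
    """Forecast a single series; a module-level function so it can be pickled."""
    uid, y, h, season_length = task
    return uid, seasonal_naive(y, h, season_length)


def forecast_all(series, h, season_length,
                 executor_cls=ProcessPoolExecutor, max_workers=None):
    """Forecast every series independently and return {unique_id: forecast}.

    series maps each unique_id to its list of observations.
    """
    tasks = [(uid, y, h, season_length) for uid, y in series.items()]
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(_forecast_one, tasks))
```

With the default process pool, call `forecast_all` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. To adapt the sketch, replace the body of `_forecast_one` with the model fit and prediction for one series.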
Evaluation
Time
Since AutoARIMA relies on numba, which compiles functions the first time they are called, it is useful to measure the time for just one time series.
| n_series | pmdarima | prophet | AutoARIMA_nixtla |
|---|---|---|---|
| 410 | 181686.758059 | 3093.636144 | 573.128222 |
| 411 | 182129.896494 | 3101.181598 | 574.480358 |
| 412 | 182573.034928 | 3108.727052 | 575.832494 |
| 413 | 183016.173362 | 3116.272506 | 577.184630 |
| 414 | 183459.311796 | 3123.817960 | 578.536766 |
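When a library compiles code on first use, as numba does, the first call is slower than the following ones, so it helps to time them separately. The sketch below illustrates the measurement pattern with the standard library. It does not reproduce the notebook's timings: `fit_one_series` is a hypothetical stand-in whose 0.05 s delay simulates a one-time compilation cost.

```python
import time

_state = {"warmed_up": False}


def fit_one_series(n):
    """Stand-in for a model fit with a one-time setup cost on the first call."""
    if not _state["warmed_up"]:
        time.sleep(0.05)  # hypothetical compilation delay, not a measured value
        _state["warmed_up"] = True
    return sum(range(n))


def time_call(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def first_and_steady_time(fn, *args, repeats=3):
    """Time the first call separately from the best of the following calls."""
    first = time_call(fn, *args)
    steady = min(time_call(fn, *args) for _ in range(repeats))
    return first, steady


first, steady = first_and_steady_time(fit_one_series, 10_000)
print(f"first call: {first:.4f}s, later calls: {steady:.6f}s")
```

Reporting both numbers keeps the one-time cost from being mistaken for the per-series cost when estimating run time for many series.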
pmdarima (only two time series)
|  | model | mae |
|---|---|---|
| 0 | AutoARIMA | 20.289669 |
| 1 | pmdarima | 24.676279 |
| 2 | prophet | 39.201933 |
Prophet
|  | model | mae |
|---|---|---|
| 0 | AutoARIMA | 680.202965 |
| 1 | prophet | 1058.578963 |
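The mae column in the tables above is the mean absolute error, the average of |actual - forecast| over the forecast horizon. A minimal sketch of that computation follows. The numbers and the names `model_a` and `model_b` are made up for illustration and are not taken from the experiment.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not results from the experiment).
actual = [600.0, 550.0, 500.0]
forecasts = {
    "model_a": [610.0, 540.0, 505.0],
    "model_b": [650.0, 500.0, 520.0],
}

scores = {name: mae(actual, pred) for name, pred in forecasts.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```

Lower values are better. Because MAE is expressed in the units of the series, scores are only comparable across models evaluated on the same set of series.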
For a complete comparison, see the full experiment.