Step-by-step guide on using the GARCH Model with Statsforecast
GARCH models. Besides, note the condition that the order $p \ge 1$. The GARCH model in Definition 1 has the following properties.

Proposition 1. If $\{X_t\}$ is a GARCH($p$, $q$) process defined in (1) and $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$, then the following propositions hold.
Advantages | Disadvantages |
---|---|
1. Flexible model: The GARCH model is flexible and can fit time series with different volatility patterns. | 1. Requires a large amount of data: The GARCH model needs a long time series to estimate its parameters accurately. |
2. Ability to model volatility: The GARCH model captures the volatility and heteroscedasticity of a time series, which can improve the accuracy of forecasts. | 2. Sensitive to the model specification: The GARCH model is sensitive to the chosen specification (for example, the orders p and q) and can be difficult to estimate if it is incorrectly specified. |
3. It incorporates past information: The GARCH model uses past squared values and past variances of the series, which makes it useful for predicting future volatility. | 3. It can be computationally expensive: Fitting a GARCH model can be computationally expensive, especially for higher-order or more complex variants. |
4. Allows the inclusion of exogenous variables: The GARCH model can be extended to include exogenous variables, which can improve the accuracy of the predictions. | 4. It does not consider extreme events: The GARCH model does not explicitly account for extreme or unexpected events, which can reduce the accuracy of the predictions in periods of high volatility. |
5. Models conditional heteroscedasticity: The GARCH model lets the variance of a time series change over time as a function of past squared values of the series and past conditional variances. | 5. Assumes normal errors: The standard GARCH model assumes that the errors are normally distributed, which may not hold in practice. If the errors are not normally distributed, the model may produce inaccurate estimates of volatility. |
6. Risk measures: The GARCH model can be used to estimate the value at risk (VaR) and the conditional value at risk (CVaR) of an investment portfolio. | |
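To make the variance equation concrete, here is a minimal sketch of the simplest case, GARCH(1,1), assuming the standard form $\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2$ with $\omega > 0$ and $\alpha, \beta \ge 0$. It also computes the long-run (unconditional) variance $\omega / (1 - \alpha - \beta)$, which is finite only when $\alpha + \beta < 1$. The function names and the parameter values are hypothetical and chosen only to keep the arithmetic easy to follow; this is not the StatsForecast implementation.

```python
def garch11_variances(returns, omega, alpha, beta, sigma2_0):
    """Conditional variances of a GARCH(1,1) process (illustrative sketch).

    Implements sigma2_t = omega + alpha * x_{t-1}**2 + beta * sigma2_{t-1}.
    Element t of the result is the conditional variance of returns[t];
    the extra last element is the one-step-ahead variance forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0 and alpha, beta >= 0")
    sigma2 = [sigma2_0]
    for x in returns:
        # Yesterday's squared shock and yesterday's variance drive today's variance.
        sigma2.append(omega + alpha * x * x + beta * sigma2[-1])
    return sigma2


def unconditional_variance(omega, alpha, beta):
    """Long-run variance omega / (1 - alpha - beta), finite only if alpha + beta < 1."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    return omega / (1.0 - alpha - beta)


# Hypothetical parameters and returns.
print(garch11_variances([1.0, -2.0], omega=0.1, alpha=0.2, beta=0.7, sigma2_0=1.0))
print(unconditional_variance(0.1, 0.2, 0.7))
```

A large shock (here $-2$) raises the next conditional variance, and with $\alpha + \beta$ close to 1 that effect decays slowly, which is the volatility clustering GARCH is designed to capture.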
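Tip: Statsforecast will be needed. To install it, see the installation instructions.

Next, we import the plotting libraries and configure the plotting style. The data used in this guide are the daily prices of the S&P 500 index (ticker ^GSPC), starting on 2015-01-02.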
Price | Date | Adj Close | Close | High | Low | Open | Volume |
---|---|---|---|---|---|---|---|
Ticker | | ^GSPC | ^GSPC | ^GSPC | ^GSPC | ^GSPC | ^GSPC |
0 | 2015-01-02 00:00:00+00:00 | 2058.199951 | 2058.199951 | 2072.360107 | 2046.040039 | 2058.899902 | 2708700000 |
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 2020.579956 | 2054.439941 | 2017.339966 | 2054.439941 | 3799120000 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 2002.609985 | 2030.250000 | 1992.439941 | 2022.150024 | 4460110000 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 2025.900024 | 2029.609985 | 2005.550049 | 2005.550049 | 3805480000 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 2062.139893 | 2064.080078 | 2030.609985 | 2030.609985 | 3934010000 |
StatsForecast expects the input data in long format, with three columns:

- unique_id (string, int or category) represents an identifier for the series.
- ds (datestamp) column should be in a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- y (numeric) represents the measurement we wish to forecast.
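The three-column layout can be sketched with plain Python records. Using the Close column as y and the constant identifier "1" as unique_id are assumptions for illustration (the tables in this section show y equal to the closing price and unique_id equal to 1), and `to_long_format` is a hypothetical helper. With pandas, the same step is usually a column rename plus a constant unique_id column.

```python
# Hypothetical input: Date and Close values from the first rows of the
# ^GSPC table above, deliberately out of order.
raw_rows = [
    {"Date": "2015-01-05", "Close": 2020.579956},
    {"Date": "2015-01-02", "Close": 2058.199951},
    {"Date": "2015-01-06", "Close": 2002.609985},
]


def to_long_format(rows, series_id="1", date_col="Date", value_col="Close"):
    """Build unique_id / ds / y records, sorted by series and date."""
    records = [
        {"unique_id": series_id, "ds": row[date_col], "y": float(row[value_col])}
        for row in rows
    ]
    # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    records.sort(key=lambda r: (r["unique_id"], r["ds"]))
    return records


for record in to_long_format(raw_rows):
    print(record)
```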
| | ds | y | unique_id |
---|---|---|---|
0 | 2015-01-02 00:00:00+00:00 | 2058.199951 | 1 |
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 |
Dickey-Fuller test

The Augmented Dickey-Fuller test gives a p-value of 0.864700, so the null hypothesis (the series has a unit root) cannot be rejected. In other words, there is no evidence that the series is stationary, and we treat it as non-stationary.
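We therefore need to difference the series to make it stationary. Here we do so in relative terms, computing the percentage returns of the price with the DataFrame.pct_change() function; the returns in the tables below are multiplied by 100, so they are expressed in percent. The pct_change() function has a periods parameter whose default value is 1, which gives the change from one row (trading day) to the next. To calculate a 30-period return (30 trading days here), set periods=30.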
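The return calculation can be sketched without pandas. The helper `pct_returns` is hypothetical; it is meant to mirror `DataFrame.pct_change(periods=...)` followed by a multiplication by 100, and the prices are the ^GSPC closes from 2015-01-02 to 2015-01-09 shown in the tables of this section. The last line also builds the squared returns used in the sq_return column.

```python
def pct_returns(prices, periods=1, scale=100.0):
    """Percentage change versus the value `periods` rows earlier.

    Mirrors DataFrame.pct_change(periods=periods) * scale. The first
    `periods` entries have no earlier value and are None (pandas gives NaN).
    """
    returns = [None] * min(periods, len(prices))
    for t in range(periods, len(prices)):
        returns.append(scale * (prices[t] / prices[t - periods] - 1.0))
    return returns


# ^GSPC closing prices, 2015-01-02 to 2015-01-09.
closes = [2058.199951, 2020.579956, 2002.609985, 2025.900024, 2062.139893, 2044.810059]
daily = pct_returns(closes)
returns = daily[1:]  # drop the undefined first value, as the return table does
sq_returns = [r * r for r in returns]  # squared returns, as in the sq_return column
print(returns)
```

The first return is undefined, which is why the return tables start at index 1 rather than 0.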
| | ds | y | unique_id | return |
---|---|---|---|---|
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 | -1.827811 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 | -0.889347 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 | 1.162984 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 | 1.788828 |
5 | 2015-01-09 00:00:00+00:00 | 2044.810059 | 1 | -0.840381 |
| | ds | y | unique_id | return | sq_return |
---|---|---|---|---|---|
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 | -1.827811 | 3.340891 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 | -0.889347 | 0.790938 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 | 1.162984 | 1.352532 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 | 1.788828 | 3.199906 |
5 | 2015-01-09 00:00:00+00:00 | 2044.810059 | 1 | -0.840381 | 0.706240 |
lag | lb_stat | lb_pvalue | bp_stat | bp_pvalue |
---|---|---|---|---|
1 | 49.222273 | 2.285409e-12 | 49.155183 | 2.364927e-12 |
2 | 62.991348 | 2.097020e-14 | 62.899234 | 2.195861e-14 |
3 | 63.944944 | 8.433622e-14 | 63.850663 | 8.834380e-14 |
4 | 74.343652 | 2.742989e-15 | 74.221024 | 2.911751e-15 |
5 | 80.234862 | 7.494100e-16 | 80.093498 | 8.022242e-16 |
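The table reports, for lags 1 to 5, the Ljung-Box (lb) and Box-Pierce (bp) statistics and their p-values. Every p-value is far below 0.05, so the hypothesis of no autocorrelation up to lag 5 is rejected; when the test is run on squared returns, this is the usual sign of volatility clustering that motivates a GARCH model. The sketch below, written with the standard library only and not taken from statsmodels, shows how both statistics are built from sample autocorrelations, and computes chi-square p-values with degrees of freedom equal to the lag. The check against the first four rows of the table is done in the test only on the printed statistics, since the full series is not reproduced here.

```python
import math


def autocorr(x, k):
    """Sample autocorrelation at lag k (deviations from the mean, full-sample denominator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / denom


def ljung_box_box_pierce(x, lags):
    """Return (Ljung-Box Q, Box-Pierce Q) using autocorrelations 1..lags."""
    n = len(x)
    rho2 = [autocorr(x, k) ** 2 for k in range(1, lags + 1)]
    lb = n * (n + 2) * sum(r / (n - k) for k, r in enumerate(rho2, start=1))
    bp = n * sum(rho2)
    return lb, bp


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_k x**(k-1) / (1*3*...*(2k-1))
    total = math.erfc(math.sqrt(half))
    if df > 1:
        term, series = 1.0, 1.0
        for k in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * k - 1)
            series += term
        total += math.sqrt(2.0 * x / math.pi) * math.exp(-half) * series
    return total


# Toy series with strong negative lag-1 autocorrelation.
lb, bp = ljung_box_box_pierce([1.0, -1.0, 1.0, -1.0], lags=1)
print(lb, bp, chi2_sf(lb, df=1))
```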
Let's divide our data into two sets:

1. Data to train our GARCH model.
2. Data to test our model.

For the test data, we use the last 30 observations (the last 30 trading days) to test and evaluate the performance of our model.
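For time series the split has to be chronological: the model is trained only on observations that precede the test period, and nothing is shuffled. A minimal sketch on a plain list, with a hypothetical helper `split_last`; on a time-sorted pandas data frame holding a single series, the test set would correspondingly be `df.tail(30)`.

```python
def split_last(rows, test_size=30):
    """Chronological split: the last `test_size` rows form the test set."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]


# Hypothetical series of 100 time-ordered observations (positions 0..99).
train, test = split_last(list(range(100)), test_size=30)
print(len(train), len(test), train[-1], test[0])
```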
To instantiate StatsForecast, we use the following parameters:

- freq: a string indicating the frequency of the data (see pandas' available frequencies).
- n_jobs (int): number of jobs used in the parallel processing; use -1 for all cores.
- fallback_model: a model to be used if a model fails.
The cross_validation method takes the following arguments:

- df: training data frame.
- h (int): represents the h steps into the future that are being forecasted. Since the data are daily, each step is one trading day.
- step_size (int): step size between each window; in other words, how often you want to run the forecasting process.
- n_windows (int): number of windows used for cross-validation; in other words, how many forecasting processes in the past you want to evaluate.
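How h, step_size and n_windows place the windows can be sketched on integer positions. The layout here is an assumption about the usual scheme (the last window ends at the final observation and each earlier window is shifted back by step_size), and `cv_windows` is a hypothetical helper, not StatsForecast's code. Each window contributes h rows, so the cross-validation output has n_windows × h rows per series.

```python
def cv_windows(n_obs, h, step_size, n_windows):
    """Place cross-validation windows on positions 0..n_obs-1.

    Assumed layout: the last window ends at the final observation and each
    earlier window is shifted back by step_size. Each entry is
    (cutoff, test_positions), where cutoff is the last training position.
    """
    windows = []
    for i in range(n_windows):
        shift = (n_windows - 1 - i) * step_size
        cutoff = n_obs - h - shift - 1
        if cutoff < 0:
            raise ValueError("not enough observations for these settings")
        windows.append((cutoff, list(range(cutoff + 1, cutoff + 1 + h))))
    return windows


for cutoff, test_positions in cv_windows(n_obs=20, h=3, step_size=2, n_windows=3):
    print("train on 0..%d, forecast %s" % (cutoff, test_positions))
```

With step_size smaller than h, consecutive windows overlap; setting step_size equal to h gives non-overlapping test windows.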
The output of cross_validation is a data frame with the following columns:

- unique_id: series identifier.
- ds: datestamp or temporal index.
- cutoff: the last datestamp or temporal index used for training in each of the n_windows windows.
- y: true value.
- "model": one column per model, named after the model, holding its forecast.

| | unique_id | ds | cutoff | y | GARCH(1,1) | GARCH(1,2) | GARCH(2,2) | GARCH(2,1) | GARCH(3,1) | GARCH(3,2) | GARCH(3,3) | GARCH(1,3) | GARCH(2,3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2023-01-04 00:00:00+00:00 | 2023-01-03 00:00:00+00:00 | 0.753897 | 1.678755 | 1.678412 | 1.680475 | 1.686649 | 1.719494 | 2.210902 | 1.702743 | 1.647114 | 1.637795 |
1 | 1 | 2023-01-05 00:00:00+00:00 | 2023-01-03 00:00:00+00:00 | -1.164553 | -0.728069 | -0.745487 | -0.730648 | -0.722156 | -0.738119 | -0.824748 | -0.755277 | -0.740976 | -0.744150 |
2 | 1 | 2023-01-06 00:00:00+00:00 | 2023-01-03 00:00:00+00:00 | 2.284078 | -0.589733 | -0.582982 | -0.590078 | -0.598076 | -0.587109 | -0.866347 | -0.571160 | -0.587807 | -0.584692 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
387 | 1 | 2023-05-26 00:00:00+00:00 | 2023-02-07 00:00:00+00:00 | 1.304909 | -1.697814 | -1.694747 | -1.702537 | -1.735631 | -1.729903 | -1.712997 | -1.663399 | -1.702160 | -1.687723 |
388 | 1 | 2023-05-30 00:00:00+00:00 | 2023-02-07 00:00:00+00:00 | 0.001660 | -0.326945 | -0.337504 | -0.329686 | -0.330120 | -0.334717 | -0.327583 | -0.330260 | -0.338245 | -0.332412 |
389 | 1 | 2023-05-31 00:00:00+00:00 | 2023-02-07 00:00:00+00:00 | -0.610862 | 0.807625 | 0.787054 | 0.807819 | 0.841536 | 0.811702 | 0.836159 | 0.772193 | 0.801933 | 0.804526 |
| | metric | GARCH(1,1) | GARCH(1,2) | GARCH(2,2) | GARCH(2,1) | GARCH(3,1) | GARCH(3,2) | GARCH(3,3) | GARCH(1,3) | GARCH(2,3) |
---|---|---|---|---|---|---|---|---|---|---|
0 | rmse | 1.383143 | 1.526258 | 1.481056 | 1.389969 | 1.453538 | 1.539906 | 1.392352 | 1.515796 | 1.389061 |
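Picking the best configuration from this table amounts to taking the smallest RMSE. A minimal sketch with the values copied from the table, reading each value under its model name (the first value after rmse belongs to GARCH(1,1)); the variable names are illustrative.

```python
# RMSE per model, copied from the cross-validation evaluation table above.
rmse_by_model = {
    "GARCH(1,1)": 1.383143,
    "GARCH(1,2)": 1.526258,
    "GARCH(2,2)": 1.481056,
    "GARCH(2,1)": 1.389969,
    "GARCH(3,1)": 1.453538,
    "GARCH(3,2)": 1.539906,
    "GARCH(3,3)": 1.392352,
    "GARCH(1,3)": 1.515796,
    "GARCH(2,3)": 1.389061,
}

# Rank models from lowest (best) to highest RMSE.
ranking = sorted(rmse_by_model, key=rmse_by_model.get)
best_model = ranking[0]
print(best_model, rmse_by_model[best_model])
```

Several configurations are within about 0.01 of the best RMSE, so the simplest model, GARCH(1,1), is also a natural choice on parsimony grounds.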
StatsForecast, and in particular the GARCH model and the parameters used in Cross Validation to determine the best model for this example.
The previous result shows that the best model is GARCH(1,1), which has the lowest RMSE (1.383143).
Having identified the best model with cross-validation, we now train it and then use it to make predictions.
.get() function to extract the element, and then we save it in a pd.DataFrame().
| | residual Model |
---|---|
0 | NaN |
1 | -3.035736 |
2 | 1.927247 |
… | … |
2113 | 1.502385 |
2114 | -0.768274 |
2115 | -0.742694 |
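Assuming the residuals are defined as observed minus fitted values, the first rows of this table can be reproduced from the in-sample fitted values of GARCH(1,1) that appear in the fitted-values table later in this section (for example, -0.889347 - 2.146389 = -3.035736). The first residual is NaN because there is no fitted value for the first observation. The helper `residuals` is hypothetical.

```python
import math


def residuals(y, fitted):
    """Residual = observed value minus fitted value.

    Where no fitted value exists (NaN, e.g. the first observation),
    the subtraction propagates NaN, matching the first row of the table.
    """
    return [obs - fit for obs, fit in zip(y, fitted)]


# First three observations and GARCH(1,1) fitted values,
# copied from the fitted-values table later in this section.
y = [-1.827811, -0.889347, 1.162984]
fitted = [math.nan, 2.146389, -0.764263]
print(residuals(y, fitted))
```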
StatsForecast.forecast method instead of .fit and .predict.

The main difference is that .forecast does not store the fitted values by default and is highly scalable in distributed environments. The forecast method takes two arguments: h (the horizon, that is, how many steps ahead to forecast) and level.
- h (int): represents the forecast h steps into the future. Since the data are daily, each step is one trading day.
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the time.
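For a normal forecast distribution, a level maps to a central interval: level=90 uses the 5% and 95% quantiles, so the interval is the point forecast plus or minus z × sigma with z ≈ 1.645. The sketch below assumes normality and uses a hypothetical helper `normal_interval`; it is not StatsForecast's internal code, although the intervals in the tables below are symmetric around the point forecast, as this form implies.

```python
from statistics import NormalDist


def normal_interval(mean, sigma, level):
    """Central prediction interval for a normal forecast distribution.

    level is a percentage: level=90 uses the 5% and 95% quantiles, so the
    interval should contain the true value 90% of the time if the
    normal assumption holds.
    """
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    return mean - z * sigma, mean + z * sigma


# Hypothetical point forecast 0.0 with forecast standard deviation 1.0.
print(normal_interval(0.0, 1.0, 90))
```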
| | unique_id | ds | GARCH(1,1) |
---|---|---|---|
0 | 1 | 2023-06-01 00:00:00+00:00 | 1.366914 |
1 | 1 | 2023-06-02 00:00:00+00:00 | -0.593121 |
2 | 1 | 2023-06-05 00:00:00+00:00 | -0.485200 |
3 | 1 | 2023-06-06 00:00:00+00:00 | -0.927145 |
4 | 1 | 2023-06-07 00:00:00+00:00 | 0.766640 |
| | unique_id | ds | GARCH(1,1) | GARCH(1,1)-lo-95 | GARCH(1,1)-hi-95 |
---|---|---|---|---|---|
0 | 1 | 2023-06-01 00:00:00+00:00 | 1.366914 | -0.021035 | 2.754863 |
1 | 1 | 2023-06-02 00:00:00+00:00 | -0.593121 | -2.435497 | 1.249254 |
2 | 1 | 2023-06-05 00:00:00+00:00 | -0.485200 | -2.139216 | 1.168815 |
3 | 1 | 2023-06-06 00:00:00+00:00 | -0.927145 | -2.390566 | 0.536276 |
4 | 1 | 2023-06-07 00:00:00+00:00 | 0.766640 | -0.771479 | 2.304759 |
| | unique_id | ds | y | GARCH(1,1) | GARCH(1,1)-lo-95 | GARCH(1,1)-hi-95 |
---|---|---|---|---|---|---|
0 | 1 | 2015-01-05 00:00:00+00:00 | -1.827811 | NaN | NaN | NaN |
1 | 1 | 2015-01-06 00:00:00+00:00 | -0.889347 | 2.146389 | -0.972874 | 5.265652 |
2 | 1 | 2015-01-07 00:00:00+00:00 | 1.162984 | -0.764263 | -3.883526 | 2.355000 |
3 | 1 | 2015-01-08 00:00:00+00:00 | 1.788828 | -0.650707 | -3.769970 | 2.468556 |
4 | 1 | 2015-01-09 00:00:00+00:00 | -0.840381 | -1.449049 | -4.568312 | 1.670214 |
| | unique_id | ds | GARCH(1,1) | GARCH(1,1)-lo-95 | GARCH(1,1)-hi-95 |
---|---|---|---|---|---|
0 | 1 | 2023-06-01 00:00:00+00:00 | 1.366914 | -0.021035 | 2.754863 |
1 | 1 | 2023-06-02 00:00:00+00:00 | -0.593121 | -2.435497 | 1.249254 |
2 | 1 | 2023-06-05 00:00:00+00:00 | -0.485200 | -2.139216 | 1.168815 |
… | … | … | … | … | … |
75 | 1 | 2023-09-14 00:00:00+00:00 | -1.686546 | -3.049859 | -0.323233 |
76 | 1 | 2023-09-15 00:00:00+00:00 | -0.322556 | -2.497448 | 1.852335 |
77 | 1 | 2023-09-18 00:00:00+00:00 | 0.799407 | -1.027642 | 2.626457 |
h (for horizon) and level.

- h (int): represents the forecast h steps into the future. For example, h=30 forecasts 30 trading days ahead, since the data are daily.
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[95] means that the model expects the real value to be inside that interval 95% of the time.
| | unique_id | ds | GARCH(1,1) |
---|---|---|---|
0 | 1 | 2023-06-01 00:00:00+00:00 | 1.366914 |
1 | 1 | 2023-06-02 00:00:00+00:00 | -0.593121 |
2 | 1 | 2023-06-05 00:00:00+00:00 | -0.485200 |
… | … | … | … |
75 | 1 | 2023-09-14 00:00:00+00:00 | -1.686546 |
76 | 1 | 2023-09-15 00:00:00+00:00 | -0.322556 |
77 | 1 | 2023-09-18 00:00:00+00:00 | 0.799407 |
| | unique_id | ds | GARCH(1,1) | GARCH(1,1)-lo-95 | GARCH(1,1)-lo-80 | GARCH(1,1)-hi-80 | GARCH(1,1)-hi-95 |
---|---|---|---|---|---|---|---|
0 | 1 | 2023-06-01 00:00:00+00:00 | 1.366914 | -0.021035 | 0.459383 | 2.274445 | 2.754863 |
1 | 1 | 2023-06-02 00:00:00+00:00 | -0.593121 | -2.435497 | -1.797786 | 0.611543 | 1.249254 |
2 | 1 | 2023-06-05 00:00:00+00:00 | -0.485200 | -2.139216 | -1.566703 | 0.596303 | 1.168815 |
… | … | … | … | … | … | … | … |
7 | 1 | 2023-06-12 00:00:00+00:00 | -1.051435 | -4.790880 | -3.496526 | 1.393657 | 2.688010 |
8 | 1 | 2023-06-13 00:00:00+00:00 | 0.421605 | -3.001123 | -1.816396 | 2.659607 | 3.844333 |
9 | 1 | 2023-06-14 00:00:00+00:00 | -0.300086 | -3.138338 | -2.155920 | 1.555747 | 2.538166 |
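The 80% and 95% bounds in this table are consistent with a normal forecast distribution centred on the point forecast: the ratio of the interval half-widths equals the ratio of the normal quantiles, about 1.2816 / 1.9600. The sketch below checks this for the first two rows by recovering the implied standard deviation from the 95% bounds and rebuilding the 80% bounds. It is a consistency check on the printed numbers under a normality assumption, not a description of how StatsForecast computes its intervals; `rescale_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def rescale_interval(lo, hi, from_level, to_level):
    """Turn a symmetric normal interval at one level into the interval at another level."""
    dist = NormalDist()
    z_from = dist.inv_cdf(0.5 + from_level / 200.0)
    z_to = dist.inv_cdf(0.5 + to_level / 200.0)
    mean = (lo + hi) / 2.0  # the point forecast sits in the middle
    sigma = (hi - lo) / (2.0 * z_from)  # implied forecast standard deviation
    return mean - z_to * sigma, mean + z_to * sigma


# First row of the table above: 95% bounds for 2023-06-01.
lo80, hi80 = rescale_interval(-0.021035, 2.754863, from_level=95, to_level=80)
print(round(lo80, 6), round(hi80, 6))
```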
| | unique_id | metric | GARCH(1,1) |
---|---|---|---|
0 | 1 | mae | 0.843296 |
1 | 1 | mape | 3.703305 |
2 | 1 | mase | 0.794905 |
3 | 1 | rmse | 1.048076 |
4 | 1 | smape | 0.709150 |
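A minimal sketch of the five metrics, under conventions that are our assumption but fit the values in the table: MAPE and sMAPE as fractions rather than percentages, sMAPE in its bounded 0-to-1 form |y - ŷ| / (|y| + |ŷ|) (the reported 0.709150 is below 1), and MASE scaled by the in-sample mean absolute error of a naive one-step forecast. MAPE is large here (3.70) because daily returns are often close to zero, which inflates |y - ŷ| / |y|. The inputs below are hypothetical toy values.

```python
import math


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y, yhat):
    # Fractional, not multiplied by 100; undefined if any y == 0.
    return sum(abs(a - b) / abs(a) for a, b in zip(y, yhat)) / len(y)


def smape(y, yhat):
    # Bounded 0..1 variant; undefined if both y and yhat are 0 at some point.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, yhat)) / len(y)


def mase(y, yhat, y_train, season=1):
    # Scale by the in-sample MAE of the (seasonal) naive forecast.
    diffs = [abs(y_train[t] - y_train[t - season]) for t in range(season, len(y_train))]
    return mae(y, yhat) / (sum(diffs) / len(diffs))


y, yhat, y_train = [1.0, 2.0, 4.0], [1.5, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0]
for name, value in [("mae", mae(y, yhat)), ("rmse", rmse(y, yhat)), ("mape", mape(y, yhat)),
                    ("smape", smape(y, yhat)), ("mase", mase(y, yhat, y_train))]:
    print(name, round(value, 6))
```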