Step-by-step guide on using the ARCH Model with Statsforecast.
As is well known, many time series exhibit the ARCH effect: although
the (modeling residual) series is white noise, its squared series may be
autocorrelated. In practice, a large number of financial time series
have this property, so the ARCH effect has become one of the stylized
facts of financial time series.
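To make the ARCH effect concrete, here is a minimal simulation sketch. It assumes an ARCH(1) process with illustrative parameters (`omega=0.5`, `alpha=0.4`) that are not taken from this guide, and the helper names are hypothetical. It compares the lag-1 autocorrelation of the series with that of its squares.

```python
import numpy as np


def simulate_arch1(n, omega=0.5, alpha=0.4, seed=0):
    """Simulate a_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * a_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    a = np.zeros(n)
    sigma2 = omega / (1.0 - alpha)  # start at the unconditional variance
    for t in range(n):
        a[t] = np.sqrt(sigma2) * eps[t]
        sigma2 = omega + alpha * a[t] ** 2  # variance used for the next step
    return a


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


a = simulate_arch1(20_000)
acf_raw = lag1_autocorr(a)      # close to zero: the series looks like white noise
acf_sq = lag1_autocorr(a ** 2)  # clearly positive: the squares are autocorrelated

print(f"lag-1 autocorrelation of a_t:   {acf_raw:.3f}")
print(f"lag-1 autocorrelation of a_t^2: {acf_sq:.3f}")
```

The series itself shows almost no linear dependence, while its squares do, which is the pattern described above.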
Advantages | Disadvantages |
---|---|
- The ARCH model is useful for modeling volatility in financial time series, which is important for investment decision making and risk management. | - The ARCH model assumes that the forecast errors are independent and identically distributed, which may not be realistic in some cases. |
- The ARCH model takes heteroscedasticity into account, which means that it can model time series with variances that change over time. | - The ARCH model can be difficult to fit to data with many parameters, which may require large amounts of data or advanced estimation techniques. |
- The ARCH model is relatively easy to use and can be implemented with standard econometrics software. | - The ARCH model does not take into account the possible relationship between the mean and the variance of the time series, which may be important in some cases. |
Tip: Statsforecast will be needed. To install, see instructions.
Next, we import plotting libraries and configure the plotting style.
Price | Date | Adj Close | Close | High | Low | Open | Volume |
---|---|---|---|---|---|---|---|
Ticker | ^GSPC | ^GSPC | ^GSPC | ^GSPC | ^GSPC | ^GSPC | |
0 | 2015-01-02 00:00:00+00:00 | 2058.199951 | 2058.199951 | 2072.360107 | 2046.040039 | 2058.899902 | 2708700000 |
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 2020.579956 | 2054.439941 | 2017.339966 | 2054.439941 | 3799120000 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 2002.609985 | 2030.250000 | 1992.439941 | 2022.150024 | 4460110000 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 2025.900024 | 2029.609985 | 2005.550049 | 2005.550049 | 3805480000 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 2062.139893 | 2064.080078 | 2030.609985 | 2030.609985 | 3934010000 |
- unique_id (string, int or category): represents an identifier for the series.
- ds (datestamp): column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- y (numeric): represents the measurement we wish to forecast.
| | ds | y | unique_id |
---|---|---|---|
0 | 2015-01-02 00:00:00+00:00 | 2058.199951 | 1 |
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 |
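The reshaping into the three columns shown above can be sketched with pandas as follows. The small `prices` frame is a stand-in built from the first rows of the price table, and the choice of the Adj Close column as `y` is an assumption (its values match the `y` column shown).

```python
import pandas as pd

# Stand-in for the downloaded price table (first three rows only).
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"], utc=True),
        "Adj Close": [2058.199951, 2020.579956, 2002.609985],
    }
)

# Rename to the expected column names and add a constant identifier,
# since there is only one series.
df = (
    prices.rename(columns={"Date": "ds", "Adj Close": "y"})[["ds", "y"]]
    .assign(unique_id="1")
)

print(df)
```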
Dickey Fuller test
The Augmented_Dickey_Fuller test gives us a p-value of 0.864700, which
tells us that the null hypothesis cannot be rejected; in other words,
our series is not stationary.
We need to difference our time series in order to make the data
stationary.
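As a rough illustration of the idea behind this test, the sketch below runs a plain (non-augmented) Dickey-Fuller regression with NumPy on simulated data. It is not the Augmented_Dickey_Fuller helper used in this guide, the function name is hypothetical, and it compares the t-statistic against the asymptotic 5% critical value of -2.86 (regression with a constant, no trend) instead of computing a p-value.

```python
import numpy as np


def dickey_fuller_tstat(y):
    """t-statistic of gamma in diff(y)_t = c + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov[1, 1]))


CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant and no trend

rng = np.random.default_rng(42)
noise = rng.standard_normal(1000)
walk = np.cumsum(noise)  # random walk: non-stationary by construction

t_walk = dickey_fuller_tstat(walk)
t_noise = dickey_fuller_tstat(noise)
t_diff = dickey_fuller_tstat(np.diff(walk))  # differencing the walk

for name, t in [("random walk", t_walk), ("white noise", t_noise), ("differenced walk", t_diff)]:
    verdict = "reject unit root" if t < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name:>16}: t = {t:8.2f} -> {verdict}")
```

A t-statistic below the critical value rejects the unit-root null. Differencing the random walk gives a series for which the null is rejected, which is the motivation for transforming the data before modeling.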
Returns are computed with the DataFrame.pct_change() function. The
pct_change() function has a periods parameter whose default value is 1.
If you want to calculate a 30-day return, you must change the value to
30.
| | ds | y | unique_id | return |
---|---|---|---|---|
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 | -1.827811 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 | -0.889347 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 | 1.162984 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 | 1.788828 |
5 | 2015-01-09 00:00:00+00:00 | 2044.810059 | 1 | -0.840381 |
| | ds | y | unique_id | return | sq_return |
---|---|---|---|---|---|
1 | 2015-01-05 00:00:00+00:00 | 2020.579956 | 1 | -1.827811 | 3.340891 |
2 | 2015-01-06 00:00:00+00:00 | 2002.609985 | 1 | -0.889347 | 0.790938 |
3 | 2015-01-07 00:00:00+00:00 | 2025.900024 | 1 | 1.162984 | 1.352532 |
4 | 2015-01-08 00:00:00+00:00 | 2062.139893 | 1 | 1.788828 | 3.199906 |
5 | 2015-01-09 00:00:00+00:00 | 2044.810059 | 1 | -0.840381 | 0.706240 |
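The two tables above can be reproduced from the first five prices with a short pandas sketch. It assumes the `return` column is the percentage change multiplied by 100, which is consistent with the values shown, and that `sq_return` is its square.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07", "2015-01-08"],
            utc=True,
        ),
        "y": [2058.199951, 2020.579956, 2002.609985, 2025.900024, 2062.139893],
        "unique_id": "1",
    }
)

# Daily return in percent; periods=1 is the default (use periods=30 for a 30-day return).
df["return"] = 100 * df["y"].pct_change(periods=1)

# The first row has no previous price, so its return is NaN and is dropped.
df = df.dropna(subset=["return"]).copy()

# Squared returns serve as a simple proxy for volatility.
df["sq_return"] = df["return"] ** 2

print(df)
```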
| | lb_stat | lb_pvalue | bp_stat | bp_pvalue |
---|---|---|---|---|
1 | 49.222273 | 2.285409e-12 | 49.155183 | 2.364927e-12 |
2 | 62.991348 | 2.097020e-14 | 62.899234 | 2.195861e-14 |
3 | 63.944944 | 8.433622e-14 | 63.850663 | 8.834380e-14 |
4 | 74.343652 | 2.742989e-15 | 74.221024 | 2.911751e-15 |
5 | 80.234862 | 7.494100e-16 | 80.093498 | 8.022242e-16 |
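The `lb_stat` column above is a Ljung-Box statistic. As a sketch of what that statistic measures, the NumPy function below computes Q = n(n+2) Σ ρ_k² / (n−k) over the first few lags and compares it with the 5% chi-square critical value for 5 degrees of freedom (about 11.07). The data are simulated with alternating calm and turbulent regimes. This is an illustrative assumption, not the S&P 500 returns, and the function name is hypothetical.

```python
import numpy as np


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.dot(xc[:-k], xc[k:]) / denom  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return float(n * (n + 2) * total)


CHI2_95_DF5 = 11.07  # 5% critical value of a chi-square with 5 degrees of freedom

rng = np.random.default_rng(1)
n = 5000
eps = rng.standard_normal(n)

# Volatility clustering: standard deviation alternates between 1 and 3 every 250 steps.
sigma = np.where((np.arange(n) // 250) % 2 == 0, 1.0, 3.0)
x = sigma * eps

q_clustered = ljung_box_q(x ** 2, max_lag=5)
q_iid = ljung_box_q(eps ** 2, max_lag=5)

print(f"Q on squared clustered series: {q_clustered:.1f}")
print(f"Q on squared i.i.d. series:    {q_iid:.1f}")
print(f"5% critical value (5 lags):    {CHI2_95_DF5}")
```

A statistic far above the critical value, like the values in the table above, points to autocorrelation in the squared series and therefore to an ARCH effect.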
ARCH model
The parameters used when fitting the ARCH Model are listed below. For
more information, visit the documentation.
- freq: a string indicating the frequency of the data. (See pandas’ available frequencies.)
- n_jobs: int, number of jobs used in the parallel processing, use -1 for all cores.
- fallback_model: a model to be used if a model fails.
We use the .get() function to extract the element and then save it in a
pd.DataFrame().
| | residual Model |
---|---|
0 | NaN |
1 | NaN |
2 | -1.071764 |
… | … |
2109 | 1.495836 |
2110 | -2.222393 |
2111 | 0.248642 |
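The extraction step can be sketched as follows. The `result` dictionary is a stand-in for the fitted model's output, and the key name `"residuals"` is an assumption made for this illustration; the numbers are placeholder values in the same shape as the table above.

```python
import numpy as np
import pandas as pd

# Stand-in for the dictionary returned by a fitted model (hypothetical key name).
result = {"residuals": np.array([np.nan, np.nan, -1.071764, 1.495836])}

# .get() returns the stored array, which is then wrapped in a one-column frame.
residual = pd.DataFrame(result.get("residuals"), columns=["residual Model"])

# For a missing key, .get() returns None instead of raising KeyError.
missing = result.get("not_a_key")

print(residual)
```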
You can use the StatsForecast.forecast method instead of .fit and
.predict. The main difference is that .forecast does not store the
fitted values and is highly scalable in distributed environments.
The forecast method takes two arguments: h (horizon) and level.
- h (int): the number of steps to forecast into the future. In this case, the forecast output contains 82 rows (0 to 81).
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the time.
| | unique_id | ds | ARCH(2) |
---|---|---|---|
0 | 1 | 2023-05-25 00:00:00+00:00 | 1.681839 |
1 | 1 | 2023-05-26 00:00:00+00:00 | -0.777029 |
2 | 1 | 2023-05-29 00:00:00+00:00 | -0.677962 |
… | … | … | … |
79 | 1 | 2023-09-13 00:00:00+00:00 | 0.695591 |
80 | 1 | 2023-09-14 00:00:00+00:00 | -0.176075 |
81 | 1 | 2023-09-15 00:00:00+00:00 | -0.158605 |
| | unique_id | ds | y | ARCH(2) |
---|---|---|---|---|
0 | 1 | 2015-01-05 00:00:00+00:00 | -1.827811 | NaN |
1 | 1 | 2015-01-06 00:00:00+00:00 | -0.889347 | NaN |
2 | 1 | 2015-01-07 00:00:00+00:00 | 1.162984 | 2.234748 |
3 | 1 | 2015-01-08 00:00:00+00:00 | 1.788828 | -0.667577 |
4 | 1 | 2015-01-09 00:00:00+00:00 | -0.840381 | -0.752438 |
| | unique_id | ds | ARCH(2) | ARCH(2)-lo-95 | ARCH(2)-hi-95 |
---|---|---|---|---|---|
0 | 1 | 2023-05-25 00:00:00+00:00 | 1.681839 | -0.419326 | 3.783003 |
1 | 1 | 2023-05-26 00:00:00+00:00 | -0.777029 | -3.939054 | 2.384996 |
2 | 1 | 2023-05-29 00:00:00+00:00 | -0.677962 | -3.907262 | 2.551338 |
… | … | … | … | … | … |
79 | 1 | 2023-09-13 00:00:00+00:00 | 0.695591 | -0.937585 | 2.328766 |
80 | 1 | 2023-09-14 00:00:00+00:00 | -0.176075 | -1.405359 | 1.053210 |
81 | 1 | 2023-09-15 00:00:00+00:00 | -0.158605 | -1.381915 | 1.064705 |
| | ds | unique_id | y | ARCH(2) |
---|---|---|---|---|
0 | 2023-05-25 00:00:00+00:00 | 1 | 0.875758 | 1.681839 |
1 | 2023-05-26 00:00:00+00:00 | 1 | 1.304909 | -0.777029 |
2 | 2023-05-30 00:00:00+00:00 | 1 | 0.001660 | -0.968703 |
… | … | … | … | … |
79 | 2023-09-19 00:00:00+00:00 | 1 | -0.215101 | NaN |
80 | 2023-09-20 00:00:00+00:00 | 1 | -0.939479 | NaN |
81 | 2023-09-21 00:00:00+00:00 | 1 | -1.640093 | NaN |
The predict method takes two arguments: h (for horizon) and level.
- h (int): the number of steps to forecast into the future. In this case, the forecast output contains 82 rows (0 to 81).
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[95] means that the model expects the real value to be inside that interval 95% of the time.
| | unique_id | ds | ARCH(2) |
---|---|---|---|
0 | 1 | 2023-05-25 00:00:00+00:00 | 1.681839 |
1 | 1 | 2023-05-26 00:00:00+00:00 | -0.777029 |
2 | 1 | 2023-05-29 00:00:00+00:00 | -0.677962 |
… | … | … | … |
79 | 1 | 2023-09-13 00:00:00+00:00 | 0.695591 |
80 | 1 | 2023-09-14 00:00:00+00:00 | -0.176075 |
81 | 1 | 2023-09-15 00:00:00+00:00 | -0.158605 |
| | unique_id | ds | ARCH(2) | ARCH(2)-lo-95 | ARCH(2)-lo-80 | ARCH(2)-hi-80 | ARCH(2)-hi-95 |
---|---|---|---|---|---|---|---|
0 | 1 | 2023-05-25 00:00:00+00:00 | 1.681839 | -0.419326 | 0.307961 | 3.055716 | 3.783003 |
1 | 1 | 2023-05-26 00:00:00+00:00 | -0.777029 | -3.939054 | -2.844566 | 1.290508 | 2.384996 |
2 | 1 | 2023-05-29 00:00:00+00:00 | -0.677962 | -3.907262 | -2.789488 | 1.433564 | 2.551338 |
… | … | … | … | … | … | … | … |
79 | 1 | 2023-09-13 00:00:00+00:00 | 0.695591 | -0.937585 | -0.372285 | 1.763467 | 2.328766 |
80 | 1 | 2023-09-14 00:00:00+00:00 | -0.176075 | -1.405359 | -0.979860 | 0.627711 | 1.053210 |
81 | 1 | 2023-09-15 00:00:00+00:00 | -0.158605 | -1.381915 | -0.958485 | 0.641274 | 1.064705 |
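To see how the level parameter relates to the width of an interval, the sketch below takes the first row of the table above and checks that its 80% and 95% bounds are consistent with a symmetric normal interval, mean ± z·σ. Treating the intervals as normal is an assumption made here for illustration; the function name is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist


def normal_interval(mean, sigma, level):
    """Symmetric interval covering `level` percent of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    return mean - z * sigma, mean + z * sigma


# First row of the table above: point forecast and 95% bounds.
mean, lo95, hi95 = 1.681839, -0.419326, 3.783003

# Back out the implied standard deviation from the 95% interval.
z95 = NormalDist().inv_cdf(0.975)
sigma = (hi95 - lo95) / (2 * z95)

# Rebuild the 80% interval from the same mean and standard deviation.
lo80, hi80 = normal_interval(mean, sigma, 80)

print(f"implied sigma: {sigma:.6f}")
print(f"80% interval:  [{lo80:.6f}, {hi80:.6f}]")
```

The rebuilt 80% bounds agree with the ARCH(2)-lo-80 and ARCH(2)-hi-80 values in that row to about four decimal places. A higher level simply uses a larger z and therefore gives a wider interval.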
We join the forecast with the historical data using pd.concat(), so
that the result can be used for graphing.
ds | unique_id | y | ARCH(2) | ARCH(2)-lo-95 | ARCH(2)-lo-80 | ARCH(2)-hi-80 | ARCH(2)-hi-95 |
---|---|---|---|---|---|---|---|
2023-03-07 00:00:00+00:00 | 1 | -1.532692 | NaN | NaN | NaN | NaN | NaN |
2023-03-08 00:00:00+00:00 | 1 | 0.141479 | NaN | NaN | NaN | NaN | NaN |
2023-03-09 00:00:00+00:00 | 1 | -1.845936 | NaN | NaN | NaN | NaN | NaN |
… | … | … | … | … | … | … | … |
2023-09-13 00:00:00+00:00 | 1 | NaN | 0.695591 | -0.937585 | -0.372285 | 1.763467 | 2.328766 |
2023-09-14 00:00:00+00:00 | 1 | NaN | -0.176075 | -1.405359 | -0.979860 | 0.627711 | 1.053210 |
2023-09-15 00:00:00+00:00 | 1 | NaN | -0.158605 | -1.381915 | -0.958485 | 0.641274 | 1.064705 |
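The layout of the table above, with NaN forecasts on historical rows and NaN `y` on forecast rows, is what pd.concat() produces when two frames with different columns are stacked. A minimal sketch, using a few rows from the table and only the 95% bounds to keep it short:

```python
import pandas as pd

# Historical observations, indexed by ds.
hist = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-03-07", "2023-03-08", "2023-03-09"], utc=True),
        "unique_id": "1",
        "y": [-1.532692, 0.141479, -1.845936],
    }
).set_index("ds")

# Forecasts with their interval bounds, indexed by ds.
fcst = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-09-13", "2023-09-14", "2023-09-15"], utc=True),
        "unique_id": "1",
        "ARCH(2)": [0.695591, -0.176075, -0.158605],
        "ARCH(2)-lo-95": [-0.937585, -1.405359, -1.381915],
        "ARCH(2)-hi-95": [2.328766, 1.053210, 1.064705],
    }
).set_index("ds")

# Rows are stacked; columns present in only one frame are filled with NaN.
plot_df = pd.concat([hist, fcst])

print(plot_df)
```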
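Cross-validation can be performed with Statsforecast, as shown below.
In this case, we evaluate the model over the last 5 windows
(n_windows=5), moving the forecast origin by 12 steps between windows
(step_size=12). Depending on your computer, this step should take
around 1 min.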
The cross_validation method from the StatsForecast class takes the
following arguments.
- df: training data frame.
- h (int): the number of steps into the future that are being forecasted. In this case, the output contains 410 rows across 5 windows, that is, 82 steps per window.
- step_size (int): step size between each window. In other words: how often do you want to run the forecasting processes.
- n_windows (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
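How h, step_size and n_windows interact can be sketched with plain index arithmetic. The function below is a hypothetical illustration, not the library's implementation; it assumes that the last window ends at the final observation and that earlier windows are shifted back by step_size.

```python
def cv_windows(n_obs, h, step_size, n_windows):
    """Return (cutoff, test_start, test_end) index triples, oldest window first.

    cutoff is the index of the last training observation;
    the test slice is [test_start, test_end).
    """
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError("not enough observations for the requested windows")
    windows = []
    for i in range(n_windows):
        train_end = n_obs - test_size + i * step_size  # exclusive end of training data
        windows.append((train_end - 1, train_end, train_end + h))
    return windows


windows = cv_windows(n_obs=100, h=10, step_size=5, n_windows=3)
for cutoff, start, end in windows:
    print(f"cutoff index {cutoff}: forecast indices {start}..{end - 1}")
```

With step_size smaller than h, as in this toy call, consecutive test windows overlap; with step_size equal to h they are contiguous.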
- unique_id: series identifier
- ds: datestamp or temporal index
- cutoff: the last datestamp or temporal index for the n_windows.
- y: true value
- "model": columns with the model’s name and fitted value.

| | unique_id | ds | cutoff | y | ARCH(2) |
---|---|---|---|---|---|
0 | 1 | 2022-12-21 00:00:00+00:00 | 2022-12-20 00:00:00+00:00 | 1.486799 | 1.382105 |
1 | 1 | 2022-12-22 00:00:00+00:00 | 2022-12-20 00:00:00+00:00 | -1.445170 | -0.651618 |
2 | 1 | 2022-12-23 00:00:00+00:00 | 2022-12-20 00:00:00+00:00 | 0.586810 | -0.595213 |
… | … | … | … | … | … |
407 | 1 | 2023-05-22 00:00:00+00:00 | 2023-01-26 00:00:00+00:00 | 0.015503 | 0.693070 |
408 | 1 | 2023-05-23 00:00:00+00:00 | 2023-01-26 00:00:00+00:00 | -1.122203 | -0.176181 |
409 | 1 | 2023-05-24 00:00:00+00:00 | 2023-01-26 00:00:00+00:00 | -0.731860 | -0.157522 |
| | metric | ARCH(2) |
---|---|---|
0 | mae | 0.949721 |
1 | mape | 11.789856 |
2 | mase | 0.875298 |
3 | rmse | 1.164914 |
4 | smape | 0.725702 |
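For reference, the metrics in the table above can be sketched in NumPy as follows. The exact definitions used to produce the table are not shown in this guide, so the formulas below are assumptions: MAPE is expressed as a fraction, sMAPE as mean(|y − ŷ| / (|y| + |ŷ|)), and MASE scales the MAE by the in-sample one-step naive error. The toy arrays are illustrative only.

```python
import numpy as np


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mape(y, y_hat):
    # Unstable when y is close to zero, as daily returns often are.
    return float(np.mean(np.abs((y - y_hat) / y)))


def smape(y, y_hat):
    # Assumed definition, bounded between 0 and 1.
    return float(np.mean(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))))


def mase(y, y_hat, y_train):
    # MAE scaled by the in-sample error of a one-step naive forecast.
    naive_error = np.mean(np.abs(np.diff(y_train)))
    return float(mae(y, y_hat) / naive_error)


y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([2.0, 2.0, 2.0])
y_train = np.array([1.0, 3.0, 2.0, 4.0])

scores = {
    "mae": mae(y, y_hat),
    "mape": mape(y, y_hat),
    "mase": mase(y, y_hat, y_train),
    "rmse": rmse(y, y_hat),
    "smape": smape(y, y_hat),
}
for name, value in scores.items():
    print(f"{name}: {value:.6f}")
```

Because returns are frequently close to zero, a large MAPE such as the one in the table is not surprising; MAE, RMSE and MASE are usually easier to interpret for this kind of series.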