Tutorial on how to produce insample predictions.
This tutorial shows how to use the `predict_insample` method of the `core.NeuralForecast` class to produce forecasts of the train and validation sets. In this example, we will train the `NHITS` model on the AirPassengers data and show how to recover the insample predictions after the model is fitted.
Predict Insample: The process of producing forecasts of the train and
validation sets.
Use Cases:
* Debugging: producing insample predictions is useful for debugging purposes, for example to check whether the model is able to fit the train set.
* Training convergence: check whether the model has converged.
* Anomaly detection: insample predictions can be used to detect anomalous behavior in the train set (e.g. outliers). Note that a model that is too flexible might be able to forecast outliers perfectly, which would hide them.
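The anomaly-detection use case boils down to comparing the actual values `y` with the insample forecasts (for example the `NHITS` column) and flagging unusually large residuals. The sketch below uses a robust median/MAD rule; that rule, the threshold `k=3.5`, the helper name `flag_anomalies`, and the sample numbers are all illustrative assumptions, not part of neuralforecast.

```python
from statistics import median


def flag_anomalies(actual, forecast, k=3.5):
    """Hypothetical helper: return indices whose insample residual is far
    from the typical residual, using a median/MAD rule (illustrative choice)."""
    residuals = [a - f for a, f in zip(actual, forecast)]
    center = median(residuals)
    deviations = [abs(r - center) for r in residuals]
    # 1.4826 rescales the MAD to a standard deviation for normal residuals.
    scale = 1.4826 * median(deviations)
    if scale == 0:
        # Degenerate case: most residuals are identical; flag any deviation.
        return [i for i, d in enumerate(deviations) if d > 0]
    return [i for i, d in enumerate(deviations) if d > k * scale]


# Made-up values: index 3 is an outlier the forecast does not follow.
actual = [10, 11, 12, 50, 14, 15, 16, 17]
forecast = [10.5, 10.8, 12.3, 13.1, 13.6, 15.2, 15.9, 17.4]
print(flag_anomalies(actual, forecast))  # [3]
```

A median-based rule is used instead of mean and standard deviation because a single large outlier inflates the standard deviation enough to mask itself on short series.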
You can run these experiments on a GPU with Google Colab.
The `core.NeuralForecast` class contains shared `fit`, `predict`, and other methods that take as input pandas DataFrames with columns `['unique_id', 'ds', 'y']`, where `unique_id` identifies individual time series from the dataset, `ds` is the date, and `y` is the target variable.
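As a quick sanity check before training, a long-format frame can be validated for the three required columns. The helper `check_long_format` below is hypothetical, and the rule that each `(unique_id, ds)` pair appears only once is an assumption about well-formed long data rather than a documented neuralforecast check. The sample values are the first rows of the AirPassengers table, deliberately shuffled.

```python
import pandas as pd

REQUIRED_COLUMNS = ["unique_id", "ds", "y"]


def check_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: basic sanity checks on a long-format frame."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Assumption: each (series, timestamp) pair should appear at most once.
    if df.duplicated(subset=["unique_id", "ds"]).any():
        raise ValueError("duplicate (unique_id, ds) rows found")
    # Sort so every series is contiguous and in time order.
    return df.sort_values(["unique_id", "ds"]).reset_index(drop=True)


Y_df = pd.DataFrame({
    "unique_id": ["Airline1"] * 3,
    "ds": pd.to_datetime(["1949-03-31", "1949-01-31", "1949-02-28"]),
    "y": [132.0, 112.0, 118.0],
})
Y_df = check_long_format(Y_df)
print(Y_df["y"].tolist())  # [112.0, 118.0, 132.0]
```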
In this example, the dataset consists of a single series, but you can easily fit your model to larger datasets in long format.
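If several series are stored side by side (one column per series), pandas' `melt` can turn them into the long format described above. This is a minimal sketch: the `Airline2` column and its values are made up purely to show a second series.

```python
import pandas as pd

# Wide layout: one column per series (Airline2 values are made up).
wide = pd.DataFrame({
    "ds": pd.to_datetime(["1949-01-31", "1949-02-28", "1949-03-31"]),
    "Airline1": [112.0, 118.0, 132.0],
    "Airline2": [10.0, 11.0, 12.0],
})

# Long layout: one row per (series, timestamp) with columns unique_id, ds, y.
long_df = (
    wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
    [["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```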
| | unique_id | ds | y | trend | y_[lag12] |
|---|---|---|---|---|---|
| 0 | Airline1 | 1949-01-31 | 112.0 | 0 | 112.0 |
| 1 | Airline1 | 1949-02-28 | 118.0 | 1 | 118.0 |
| 2 | Airline1 | 1949-03-31 | 132.0 | 2 | 132.0 |
| 3 | Airline1 | 1949-04-30 | 129.0 | 3 | 129.0 |
| 4 | Airline1 | 1949-05-31 | 121.0 | 4 | 121.0 |
First, we train the `NHITS` model on the AirPassengers data, using the `fit` method of the `core.NeuralForecast` class.
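Because insample predictions cover both the train and the validation sets, it helps to picture how a series is split between them. The sketch below holds out the last `val_size` observations of each series for validation; the helper `split_train_val` and its behavior are an illustrative assumption, not the library's internal splitting code.

```python
import pandas as pd


def split_train_val(df: pd.DataFrame, val_size: int):
    """Hypothetical helper: last `val_size` rows of each series go to validation."""
    df = df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
    # Position counted from the end of each series: 0 is the last row.
    pos_from_end = df.groupby("unique_id").cumcount(ascending=False)
    is_val = pos_from_end < val_size
    return df[~is_val], df[is_val]


# Two toy series with integer timestamps.
toy = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 5,
    "ds": list(range(5)) * 2,
    "y": [float(v) for v in range(10)],
})
train, val = split_train_val(toy, val_size=2)
print(len(train), len(val))  # 6 4
```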
Using the `NeuralForecast.predict_insample` method, you can obtain the forecasts for the train and validation sets after the models are fitted. The method always uses the last dataset passed to either the `fit` or the `cross_validation` method.
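The practical consequence of "the last dataset used for training" is that calling `cross_validation` on new data after `fit` changes what `predict_insample` returns. The toy class below mimics only that caching behavior; `ToyForecaster` is hypothetical, its naive previous-value forecast is a placeholder, and it is not the neuralforecast API.

```python
class ToyForecaster:
    """Hypothetical stand-in that mimics only the data-caching behavior
    described above; it is NOT the neuralforecast API."""

    def __init__(self):
        self._last_data = None  # most recent data used for training

    def fit(self, data):
        self._last_data = list(data)  # real training would happen here
        return self

    def cross_validation(self, data):
        self._last_data = list(data)  # CV also replaces the cached data
        return self

    def predict_insample(self):
        if self._last_data is None:
            raise RuntimeError("call fit or cross_validation first")
        # Naive placeholder forecast: each value is predicted by the previous one.
        history = self._last_data
        return list(zip(history[1:], history[:-1]))  # (actual, forecast) pairs


model = ToyForecaster().fit([1, 2, 3])
model.cross_validation([10, 20, 30])
print(model.predict_insample())  # [(20, 10), (30, 20)]
```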
With the `step_size` parameter, you can specify the step size between consecutive windows used to produce the forecasts. In this example, we set `step_size=horizon` to produce non-overlapping forecasts.
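A minimal sketch of how cutoffs and forecast windows line up for a given `step_size` and `h`. Assumptions, not taken from the neuralforecast source: positions are integer indices, the first cutoff sits just before the first observation (index `-1`, consistent with the cutoff `1948-12-31` preceding the first `ds` in the output table), and windows that would run past the end of the series are dropped. `insample_windows` is a hypothetical helper; 144 is the number of monthly observations in AirPassengers.

```python
def insample_windows(n_obs, h, step_size):
    """Hypothetical helper: return (cutoff, target_indices) pairs.

    cutoff == -1 means "before the first observation".
    """
    windows = []
    cutoff = -1
    while cutoff + h <= n_obs - 1:  # keep only windows fully inside the series
        windows.append((cutoff, list(range(cutoff + 1, cutoff + h + 1))))
        cutoff += step_size
    return windows


horizon = 12
wins = insample_windows(144, h=horizon, step_size=horizon)
# With step_size == h every timestamp is forecast exactly once.
covered = [t for _, targets in wins for t in targets]
print(len(wins), covered == list(range(144)))  # 12 True
```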
The following diagram shows how the forecasts are produced based on the `step_size` parameter and the horizon `h` of the model. In the diagram, we set `step_size=2` and `h=4`.
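When `step_size` is smaller than `h`, windows overlap and a timestamp can receive several forecasts. The sketch below counts them for the diagram's setting (`step_size=2`, `h=4`) on a 10-point series, under the same assumed window convention as above: first cutoff just before the first observation, windows past the end dropped. `coverage` is a hypothetical helper.

```python
from collections import Counter


def coverage(n_obs, h, step_size):
    """Hypothetical helper: how many windows forecast each timestamp."""
    counts = Counter()
    cutoff = -1
    while cutoff + h <= n_obs - 1:
        counts.update(range(cutoff + 1, cutoff + h + 1))
        cutoff += step_size
    return [counts[t] for t in range(n_obs)]


print(coverage(10, h=4, step_size=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

Interior timestamps are forecast `h / step_size = 2` times, while the edges are covered once, so any per-timestamp aggregation should expect duplicate `ds` values with different `cutoff` values.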
The `predict_insample` function returns a pandas DataFrame with the following columns:
* `unique_id`: the unique identifier of the time series.
* `ds`: the datestamp of the forecast for each row.
* `cutoff`: the datestamp at which the forecast was made.
* `y`: the actual value of the target variable.
* `model_name`: the forecasted values for each model; in this case, `NHITS`.
| | unique_id | ds | cutoff | NHITS | y |
|---|---|---|---|---|---|
| 0 | Airline1 | 1949-01-31 | 1948-12-31 | 0.064625 | 112.0 |
| 1 | Airline1 | 1949-02-28 | 1948-12-31 | 0.074300 | 118.0 |
| 2 | Airline1 | 1949-03-31 | 1948-12-31 | 0.133020 | 132.0 |
| 3 | Airline1 | 1949-04-30 | 1948-12-31 | 0.221040 | 129.0 |
| 4 | Airline1 | 1949-05-31 | 1948-12-31 | 0.176580 | 121.0 |
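Because every row carries its `cutoff`, the output can be grouped by window to see how the insample error evolves over the series, which supports the debugging and convergence use cases. This sketch rebuilds only the five rows shown in the table (a truncated view of the full output); `per_window_mae` is a hypothetical helper.

```python
import pandas as pd

# The five rows shown in the table above (not the full output).
y_hat = pd.DataFrame({
    "unique_id": ["Airline1"] * 5,
    "ds": pd.to_datetime(["1949-01-31", "1949-02-28", "1949-03-31",
                          "1949-04-30", "1949-05-31"]),
    "cutoff": pd.to_datetime(["1948-12-31"] * 5),
    "NHITS": [0.064625, 0.074300, 0.133020, 0.221040, 0.176580],
    "y": [112.0, 118.0, 132.0, 129.0, 121.0],
})


def per_window_mae(df, model="NHITS"):
    """Hypothetical helper: mean absolute error for each (series, cutoff) window."""
    return (
        df.assign(abs_err=(df["y"] - df[model]).abs())
        .groupby(["unique_id", "cutoff"], as_index=False)["abs_err"]
        .mean()
        .rename(columns={"abs_err": f"{model}_mae"})
    )


print(per_window_mae(y_hat))
```

On the full output, a per-window error that stays high late in the series is a hint that the model has not fit the train set well.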
Important: The function will produce forecasts from the first timestamp of the time series. For these initial timestamps, the forecasts might not be accurate given that models have very limited input information to produce forecasts.
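One way to act on this caveat is to exclude the earliest windows, whose cutoff has fewer observed points than the model's input length, before judging insample accuracy. In the sketch below, `input_size=24` is an assumed value (this section does not state the model's input length), the window convention (first cutoff at index `-1`, windows past the end dropped) is also an assumption, and `windows_with_full_history` is hypothetical.

```python
def windows_with_full_history(n_obs, h, step_size, input_size):
    """Hypothetical helper: cutoffs whose history covers a full input window."""
    kept = []
    cutoff = -1
    while cutoff + h <= n_obs - 1:
        # Indices 0..cutoff are observed at forecast time: cutoff + 1 points.
        if cutoff + 1 >= input_size:
            kept.append(cutoff)
        cutoff += step_size
    return kept


kept = windows_with_full_history(144, h=12, step_size=12, input_size=24)
print(kept[0], len(kept))  # 23 10
```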
Note that the following settings are not yet supported by the `predict_insample` function:
* Prediction intervals on insample predictions for models trained with conformal prediction intervals (e.g. a model trained with MAE and conformal prediction intervals);
* Prediction intervals on insample predictions for multivariate models (e.g. a TSMixer model).