In this notebook, we’ll implement models for intermittent or sparse data.

Intermittent or sparse data has very few non-zero observations. This type of data is hard to forecast because the zero values increase the uncertainty about the underlying patterns in the data. Furthermore, once a non-zero observation occurs, there can be considerable variation in its size. Intermittent time series are common in many industries, including finance, retail, transportation, and energy. Given the ubiquity of this type of series, special methods have been developed to forecast them. The first was from Croston (1972), followed by several variants and by different aggregation frameworks.

StatsForecast has implemented several models to forecast intermittent time series. By the end of this tutorial, you’ll have a good understanding of these models and how to use them.

Outline:
Tip: You can use Colab to run this notebook interactively.

Tip: For forecasting at scale, we recommend you check this notebook done on Databricks.
pip install statsforecast
We’ll plot the series with the `plot_series` function from `utilsforecast.plotting`. This function has multiple parameters, and the required ones to generate the plots in this notebook are explained below.
- `df`: A pandas dataframe with columns [`unique_id`, `ds`, `y`].
- `forecasts_df`: A pandas dataframe with columns [`unique_id`, `ds`] and models.
- `plot_random`: Plots the time series randomly.
- `max_insample_length`: The maximum number of train/insample observations to be plotted.
- `engine`: The library used to generate the plots. It can also be `matplotlib` for static plots.

The number of in-sample observations shown is limited with `max_insample_length`. From these plots, we can confirm that the data is indeed intermittent since it has multiple periods with zero sales. In fact, in all cases but one, the median value is zero.
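The zero-median observation can also be checked numerically instead of by eye. The following is a minimal sketch, assuming a long-format dataframe with the `unique_id`, `ds`, `y` columns described above; the toy values and the `summary` name are made up for illustration and are not the notebook's data.

```python
import pandas as pd

# Toy long-format data (values are invented for illustration only).
df = pd.DataFrame({
    "unique_id": ["A"] * 6 + ["B"] * 6,
    "ds": list(pd.date_range("2016-01-01", periods=6, freq="D")) * 2,
    "y": [0, 0, 3, 0, 0, 1, 2, 0, 4, 1, 5, 3],
})

# Per series: share of zero observations and the median value.
summary = df.groupby("unique_id")["y"].agg(
    zero_share=lambda s: (s == 0).mean(),
    median="median",
)
print(summary)
```

A series with a high `zero_share` and a median of zero, like `A` here, is a reasonable candidate for the intermittent models discussed in this notebook.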
The models are first imported from `statsforecast.models`, and then we need to instantiate them.
- `models`: The list of models defined in the previous step.
- `freq`: A string indicating the frequency of the data. See pandas’ available frequencies.
- `n_jobs`: An integer that indicates the number of jobs used in parallel processing. Use -1 to select all cores.

Forecasts are generated with the `forecast` method, which requires the forecasting horizon (in this case, 28 days) as an argument.
The models for intermittent series that are currently available in
StatsForecast can only generate point forecasts. If prediction intervals
are needed, then a probabilistic
model should be used.
| | unique_id | ds | ADIDA | CrostonClassic | IMAPA | TSB |
|---|---|---|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2016-05-23 | 0.791852 | 0.898247 | 0.705835 | 0.434313 |
| 1 | FOODS_1_001_CA_1 | 2016-05-24 | 0.791852 | 0.898247 | 0.705835 | 0.434313 |
| 2 | FOODS_1_001_CA_1 | 2016-05-25 | 0.791852 | 0.898247 | 0.705835 | 0.434313 |
| 3 | FOODS_1_001_CA_1 | 2016-05-26 | 0.791852 | 0.898247 | 0.705835 | 0.434313 |
| 4 | FOODS_1_001_CA_1 | 2016-05-27 | 0.791852 | 0.898247 | 0.705835 | 0.434313 |
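Each model repeats the same value on every date in the table because these methods produce a flat forecast: a single estimated demand rate extended over the horizon. To illustrate why, here is a minimal pure-Python sketch of the classic Croston idea (exponentially smooth the non-zero demand sizes and the intervals between them, then divide). The function name `croston_sketch`, the default `alpha`, and the initialization are assumptions made for illustration, so its numbers will not necessarily match StatsForecast's `CrostonClassic`.

```python
def croston_sketch(y, alpha=0.1, h=1):
    """Flat Croston-style forecast (illustrative, not the library's code)."""
    size = None        # smoothed non-zero demand size
    interval = None    # smoothed number of periods between demands
    since_last = 1     # periods elapsed since the last non-zero demand
    for value in y:
        if value > 0:
            if size is None:
                # Assumed initialization: first demand and its position.
                size = float(value)
                interval = float(since_last)
            else:
                size += alpha * (value - size)
                interval += alpha * (since_last - interval)
            since_last = 1
        else:
            since_last += 1
    if size is None:
        # No demand was ever observed.
        return [0.0] * h
    # The same rate is repeated for every step of the horizon.
    return [size / interval] * h


forecast = croston_sketch([0, 2, 0, 0, 4], alpha=0.5, h=3)
print(forecast)
```

Only non-zero observations update the two smoothed quantities, so the forecast stays unchanged across runs of zeros.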
The forecasts can be plotted with the `plot_series` function described above.
| | metric | ADIDA | CrostonClassic | IMAPA | TSB |
|---|---|---|---|---|---|
| 0 | mae | 0.948729 | 0.944071 | 0.957256 | 1.023126 |
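As a rough sketch of how a mean absolute error (MAE) per model can be computed, the snippet below joins held-out actuals with forecasts and averages the absolute errors by hand. All values and the `test_df` name are invented for illustration; this is not necessarily how the table above was produced.

```python
import pandas as pd

dates = pd.date_range("2016-05-23", periods=4, freq="D")

# Hypothetical held-out actuals.
test_df = pd.DataFrame({
    "unique_id": "toy_series",
    "ds": dates,
    "y": [0.0, 2.0, 0.0, 0.0],
})

# Hypothetical flat forecasts from two models.
forecasts_df = pd.DataFrame({
    "unique_id": "toy_series",
    "ds": dates,
    "CrostonClassic": [0.5] * 4,
    "TSB": [1.0] * 4,
})

# Align actuals and forecasts on series id and date.
merged = test_df.merge(forecasts_df, on=["unique_id", "ds"])
model_cols = [c for c in merged.columns if c not in ("unique_id", "ds", "y")]

# MAE = mean of |actual - forecast| for each model column.
mae = {m: (merged["y"] - merged[m]).abs().mean() for m in model_cols}
print(mae)
```

Lower values indicate forecasts that are closer to the actuals on average.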