Anomaly detection in time series data plays a pivotal role in numerous sectors, including finance, healthcare, security, and infrastructure. A time series is a sequence of data points indexed in time order, often at equal intervals. As systems and processes become increasingly digitized and interconnected, the need to monitor them and confirm that they behave normally grows as well. Anomalies can indicate potential problems, malfunctions, or even malicious activity. By promptly identifying these deviations from the expected pattern, organizations can take preemptive measures, optimize processes, or protect resources. TimeGPT includes the detect_anomalies method to detect anomalies automatically.

import pandas as pd
from nixtlats import NixtlaClient
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

The detect_anomalies method takes a dataframe containing one or more series and labels each observation as anomalous or not. It evaluates each observation against its context within the series, using statistical measures to determine how likely it is to be an anomaly. By default, the method identifies anomalies based on a 99 percent prediction interval: observations that fall outside this interval are considered anomalies. The resulting dataframe includes an added column, anomaly, that is set to 1 for anomalous observations and 0 otherwise.
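
The labelling rule described above can be sketched in a few lines of plain Python. This is only an illustration of the interval check, not TimeGPT's implementation: the function name label_anomalies and the sample numbers are made up for the example.

```python
def label_anomalies(values, lower, upper):
    """Return 1 where a value falls outside [lower, upper], else 0."""
    return [
        1 if (y < lo or y > hi) else 0
        for y, lo, hi in zip(values, lower, upper)
    ]

# Illustrative observations and interval bounds (made-up numbers).
observed = [8.1, 9.9, 8.0, 6.2]
lo_99 = [6.9, 6.9, 6.8, 7.6]
hi_99 = [9.5, 9.4, 9.4, 10.2]

labels = label_anomalies(observed, lo_99, hi_99)
print(labels)  # [0, 1, 0, 1]
```

The second and fourth observations fall above and below their intervals respectively, so they are the only ones labelled 1.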

pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')
timegpt_anomalies_df = nixtla_client.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D')
timegpt_anomalies_df.head()
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...
   timestamp   anomaly  TimeGPT-lo-99  TimeGPT   TimeGPT-hi-99
0  2008-01-10  0        6.936009       8.224194  9.512378
1  2008-01-11  0        6.863336       8.151521  9.439705
2  2008-01-12  0        6.839064       8.127249  9.415433
3  2008-01-13  0        7.629072       8.917256  10.205441
4  2008-01-14  0        7.714111       9.002295  10.290480
nixtla_client.plot(
    pm_df,
    timegpt_anomalies_df,
    time_col='timestamp',
    target_col='value',
)

By default, the detect_anomalies method uses a 99 percent prediction interval, but you can adjust this threshold through the level argument. Decreasing the value of level produces a narrower prediction interval, so more observations are identified as anomalies. See the next example.

timegpt_anomalies_df = nixtla_client.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=90)
nixtla_client.plot(
    pm_df,
    timegpt_anomalies_df,
    time_col='timestamp',
    target_col='value',
)
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...

Conversely, increasing the value makes the prediction interval wider, so fewer observations are flagged as anomalies. This lets you calibrate the sensitivity of the method to your specific use case.

timegpt_anomalies_df = nixtla_client.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=99.99)
nixtla_client.plot(
    pm_df,
    timegpt_anomalies_df,
    time_col='timestamp',
    target_col='value',
)
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...
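
To get a feel for why a higher level flags fewer points, the sketch below computes the half-width of a symmetric interval for the three levels used above. It assumes a Gaussian interval purely for illustration; TimeGPT may compute its prediction intervals differently, and interval_half_width is a hypothetical helper, not part of the client.

```python
from statistics import NormalDist

def interval_half_width(level, sigma=1.0):
    """Half-width of a symmetric Gaussian interval covering `level` percent."""
    # A two-sided interval at `level` percent leaves
    # (100 - level) / 2 percent in each tail.
    upper_quantile = 0.5 + level / 200
    return NormalDist().inv_cdf(upper_quantile) * sigma

for level in (90, 99, 99.99):
    print(level, round(interval_half_width(level), 2))
```

Under this assumption the interval reaches roughly 1.64, 2.58, and 3.89 standard deviations from the center for levels 90, 99, and 99.99, so an observation has to deviate much further before it is flagged at the higher levels.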

You can also include date_features to better detect anomalies:

timegpt_anomalies_df_x = nixtla_client.detect_anomalies(
    pm_df,
    time_col='timestamp',
    target_col='value',
    freq='D',
    date_features=True,
    level=99.99,
)
nixtla_client.plot(
    pm_df, 
    timegpt_anomalies_df_x,
    time_col='timestamp', 
    target_col='value',
)
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...
INFO:nixtlats.nixtla_client:Using the following exogenous variables: year_2007, year_2008, year_2009, year_2010, year_2011, year_2012, year_2013, year_2014, year_2015, year_2016, month_1, month_2, month_3, month_4, month_5, month_6, month_7, month_8, month_9, month_10, month_11, month_12, day_1, day_2, day_3, day_4, day_5, day_6, day_7, day_8, day_9, day_10, day_11, day_12, day_13, day_14, day_15, day_16, day_17, day_18, day_19, day_20, day_21, day_22, day_23, day_24, day_25, day_26, day_27, day_28, day_29, day_30, day_31, weekday_0, weekday_1, weekday_2, weekday_3, weekday_4, weekday_5, weekday_6
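
The names in the log above (month_1, day_1, weekday_0, and so on) suggest one indicator column per calendar value. The sketch below shows what such a one-hot encoding could look like for a single date. It is an assumption about what the columns mean, not the library's code: one_hot_date_features is a hypothetical helper, the year columns are left out for brevity, and the Monday-is-0 weekday convention is taken from Python's datetime module.

```python
from datetime import date

def one_hot_date_features(d):
    """One-hot encode month, day of month, and weekday of a date."""
    features = {}
    for m in range(1, 13):
        features[f"month_{m}"] = int(d.month == m)
    for day in range(1, 32):
        features[f"day_{day}"] = int(d.day == day)
    for wd in range(7):
        # date.weekday(): Monday is 0, Sunday is 6.
        features[f"weekday_{wd}"] = int(d.weekday() == wd)
    return features

features = one_hot_date_features(date(2008, 1, 10))
active = [name for name, flag in features.items() if flag]
print(active)  # ['month_1', 'day_10', 'weekday_3']
```

Each date switches on exactly one indicator per group, which gives the model an explicit signal for calendar effects such as weekly patterns.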

Exogenous variables

You can also pass exogenous variables to better inform TimeGPT about the data. To do so, add the exogenous regressors as columns after the target column.

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()
   unique_id  ds                   y      Exogenous1  Exogenous2  day_0  day_1  day_2  day_3  day_4  day_5  day_6
0  BE         2016-12-01 00:00:00  72.00  61507.0     71066.0     0.0    0.0    0.0    1.0    0.0    0.0    0.0
1  BE         2016-12-01 01:00:00  65.80  59528.0     67311.0     0.0    0.0    0.0    1.0    0.0    0.0    0.0
2  BE         2016-12-01 02:00:00  59.99  58812.0     67470.0     0.0    0.0    0.0    1.0    0.0    0.0    0.0
3  BE         2016-12-01 03:00:00  50.69  57676.0     64529.0     0.0    0.0    0.0    1.0    0.0    0.0    0.0
4  BE         2016-12-01 04:00:00  52.58  56804.0     62773.0     0.0    0.0    0.0    1.0    0.0    0.0    0.0

Now let's compute anomalies taking this information into account:

timegpt_anomalies_df_x = nixtla_client.detect_anomalies(df=df)
nixtla_client.plot(
    df, 
    timegpt_anomalies_df_x,
)
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Inferred freq: H
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...
INFO:nixtlats.nixtla_client:Using the following exogenous variables: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6

We can also explore the relative importance of each of the features.

nixtla_client.weights_x.plot.barh(x='features', y='weights')

You can also add special days for different countries:

from nixtlats.date_features import CountryHolidays
timegpt_anomalies_df_x = nixtla_client.detect_anomalies(
    df=df,
    date_features=[CountryHolidays(countries=['FR'])]
)
nixtla_client.plot(
    df, 
    timegpt_anomalies_df_x,
)
INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Inferred freq: H
INFO:nixtlats.nixtla_client:Calling Anomaly Detector Endpoint...
INFO:nixtlats.nixtla_client:Using the following exogenous variables: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6, FR_Jour de l'an, FR_Fête du Travail, FR_Fête de la Victoire, FR_Fête nationale, FR_Armistice, FR_Lundi de Pâques, FR_Lundi de Pentecôte, FR_Ascension, FR_Assomption, FR_Toussaint, FR_Noël

nixtla_client.weights_x.plot.barh(x='features', y='weights')