Torch Time Series Dataset

TimeSeriesLoader

```python
TimeSeriesLoader(dataset, **kwargs)
```

Bases: `DataLoader`

A small modification of PyTorch's `DataLoader`. It combines a dataset and a sampler, and provides an iterable over the given dataset. Like `torch.utils.data.DataLoader`, it supports both map-style and iterable-style datasets with single- or multi-process loading, customizable loading order, and optional automatic batching (collation) and memory pinning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | | Dataset to load data from. | required |
| `batch_size` | `int` | How many samples per batch to load. | `1` |
| `shuffle` | `bool` | Set to `True` to have the data reshuffled at every epoch. | `False` |
| `sampler` | `Sampler` or `Iterable` | Defines the strategy to draw samples from the dataset. | `None` |
| `drop_last` | `bool` | Set to `True` to drop the last incomplete batch. | `False` |
| `**kwargs` | | Additional keyword arguments passed to `DataLoader`. | |
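As a rough usage sketch (the import path and the contents of `dataset` are assumptions, not part of this reference), a `TimeSeriesLoader` is built and iterated like a regular `DataLoader`:

```python
# Minimal sketch, assuming the classes are importable from neuralforecast.tsdataset
# and that `dataset` is a map-style dataset such as the TimeSeriesDataset documented below.
from neuralforecast.tsdataset import TimeSeriesLoader

loader = TimeSeriesLoader(
    dataset,        # e.g. a TimeSeriesDataset
    batch_size=32,  # forwarded to torch.utils.data.DataLoader
    shuffle=True,
    drop_last=False,
)

for batch in loader:
    ...  # consume batches as produced by the dataset's collation
```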

BaseTimeSeriesDataset

```python
BaseTimeSeriesDataset(
    temporal_cols, max_size, min_size, y_idx, static=None, static_cols=None
)
```

Bases: `Dataset`

Base class for time series datasets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `temporal_cols` | | Column names for temporal features. | required |
| `max_size` | `int` | Maximum size of time series. | required |
| `min_size` | `int` | Minimum size of time series. | required |
| `y_idx` | `int` | Index of target variable. | required |
| `static` | `Optional` | Static features array. | `None` |
| `static_cols` | `Optional` | Column names for static features. | `None` |

LocalFilesTimeSeriesDataset

```python
LocalFilesTimeSeriesDataset(
    files_ds,
    temporal_cols,
    id_col,
    time_col,
    target_col,
    last_times,
    indices,
    max_size,
    min_size,
    y_idx,
    static=None,
    static_cols=None,
)
```

Bases: `BaseTimeSeriesDataset`

Time series dataset that loads data from local files.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `files_ds` | `List[str]` | List of file paths. | required |
| `temporal_cols` | | Column names for temporal features. | required |
| `id_col` | `str` | Name of ID column. | required |
| `time_col` | `str` | Name of time column. | required |
| `target_col` | `str` | Name of target column. | required |
| `last_times` | | Last time for each time series. | required |
| `indices` | | Series indices. | required |
| `max_size` | `int` | Maximum size of time series. | required |
| `min_size` | `int` | Minimum size of time series. | required |
| `y_idx` | `int` | Index of target variable. | required |
| `static` | `Optional` | Static features array. | `None` |
| `static_cols` | `Optional` | Column names for static features. | `None` |

LocalFilesTimeSeriesDataset.from_data_directories

```python
from_data_directories(
    directories,
    static_df=None,
    exogs=[],
    id_col="unique_id",
    time_col="ds",
    target_col="y",
)
```

Create a dataset from data directories. Expects `directories` to be a list of directories of the form `[unique_id=id_0, unique_id=id_1, ...]`. Each directory should contain the time series corresponding to that `unique_id`, stored as parquet readable into a pandas or polars DataFrame. A time series can be contained entirely in one parquet file or split across several, but within each parquet file the rows should be sorted by time. A layout sketch follows the tables below.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `directories` | | List of directory paths. | required |
| `static_df` | `Optional` | Static features DataFrame. | `None` |
| `exogs` | `List` | List of exogenous variable names. | `[]` |
| `id_col` | `str` | Name of ID column. | `'unique_id'` |
| `time_col` | `str` | Name of time column. | `'ds'` |
| `target_col` | `str` | Name of target column. | `'y'` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| | `LocalFilesTimeSeriesDataset` | Dataset created from directories. |
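A minimal sketch of the expected layout and call, assuming pandas is used to write the parquet files; the directory names, column values, and import path are illustrative assumptions, not prescribed by this reference:

```python
import os

import pandas as pd

from neuralforecast.tsdataset import LocalFilesTimeSeriesDataset  # assumed import path

# Expected layout: one directory per series, named `unique_id=<id>`,
# each holding parquet files whose rows are sorted by the time column.
#
# data/
# ├── unique_id=id_0/part-0.parquet
# └── unique_id=id_1/part-0.parquet
for uid, offset in [("id_0", 0.0), ("id_1", 100.0)]:
    path = f"data/unique_id={uid}"
    os.makedirs(path, exist_ok=True)
    pd.DataFrame(
        {
            "ds": pd.date_range("2020-01-01", periods=48, freq="MS"),
            "y": [offset + i for i in range(48)],
        }
    ).to_parquet(f"{path}/part-0.parquet")

dataset = LocalFilesTimeSeriesDataset.from_data_directories(
    directories=["data/unique_id=id_0", "data/unique_id=id_1"],
    id_col="unique_id",
    time_col="ds",
    target_col="y",
)
```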

TimeSeriesDataset

```python
TimeSeriesDataset(
    temporal, temporal_cols, indptr, y_idx, static=None, static_cols=None
)
```

Bases: `BaseTimeSeriesDataset`

Time series dataset implementation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `temporal` | | Temporal data array. | required |
| `temporal_cols` | | Column names for temporal features. | required |
| `indptr` | | Index pointers for time series grouping. | required |
| `y_idx` | `int` | Index of target variable. | required |
| `static` | `Optional` | Static features array. | `None` |
| `static_cols` | `Optional` | Column names for static features. | `None` |
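A minimal construction sketch, assuming `temporal` stacks all observations row-wise across series and `indptr` marks CSR-style series boundaries (series `i` occupies `temporal[indptr[i]:indptr[i+1]]`); the import path and expected dtypes are assumptions:

```python
import numpy as np
import pandas as pd

from neuralforecast.tsdataset import TimeSeriesDataset  # assumed import path

# Two series with lengths 4 and 3, one temporal column ("y", the target).
temporal = np.arange(7, dtype=np.float32).reshape(-1, 1)
temporal_cols = pd.Index(["y"])
indptr = np.array([0, 4, 7])

dataset = TimeSeriesDataset(
    temporal=temporal,
    temporal_cols=temporal_cols,
    indptr=indptr,
    y_idx=0,  # position of the target within temporal_cols
)
```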

TimeSeriesDataset.append

```python
append(futr_dataset)
```

Add future observations to the dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `futr_dataset` | `TimeSeriesDataset` | Future dataset to append. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| | `TimeSeriesDataset` | Copy of dataset with future observations appended. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If datasets have different number of groups. |
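Continuing the construction sketch above (an assumption-laden illustration, not prescribed usage), a future dataset covering the same two series can be appended:

```python
# `dataset` is the two-series TimeSeriesDataset built in the previous sketch.
# The future part must cover the same groups: here, 2 future rows per series.
futr = TimeSeriesDataset(
    temporal=np.zeros((4, 1), dtype=np.float32),
    temporal_cols=temporal_cols,
    indptr=np.array([0, 2, 4]),
    y_idx=0,
)
extended = dataset.append(futr)  # returns a copy; raises ValueError on a group-count mismatch
```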

TimeSeriesDataset.trim_dataset

```python
trim_dataset(dataset, left_trim=0, right_trim=0)
```

Trim temporal information from a dataset. Returns the temporal indexes `[t+left : t-right]` for all series.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | | Dataset to trim. | required |
| `left_trim` | `int` | Number of observations to trim from the left. | `0` |
| `right_trim` | `int` | Number of observations to trim from the right. | `0` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| | `TimeSeriesDataset` | Trimmed dataset. |

Raises:

| Type | Description |
| --- | --- |
| `Exception` | If trim size exceeds minimum series length. |
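Again continuing the sketch above, this drops one observation from each end of every series (the combined trim must stay below the shortest series length):

```python
# Keeps temporal rows [1 : length - 1] of every series in `dataset`.
trimmed = TimeSeriesDataset.trim_dataset(dataset, left_trim=1, right_trim=1)
```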

TimeSeriesDataModule

```python
TimeSeriesDataModule(
    dataset,
    batch_size=32,
    valid_batch_size=1024,
    drop_last=False,
    shuffle_train=True,
    **dataloaders_kwargs
)
```

Bases: `LightningDataModule`

PyTorch Lightning data module for time series datasets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `BaseTimeSeriesDataset` | Time series dataset. | required |
| `batch_size` | `int` | Batch size for training. | `32` |
| `valid_batch_size` | `int` | Batch size for validation. | `1024` |
| `drop_last` | `bool` | Whether to drop the last incomplete batch. | `False` |
| `shuffle_train` | `bool` | Whether to shuffle training data. | `True` |
| `**dataloaders_kwargs` | | Additional keyword arguments for the data loaders. | |
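As a brief sketch (assuming the import path and that the standard `LightningDataModule` dataloader hooks are available as documented), the module can wrap the dataset built earlier and serve training batches:

```python
from neuralforecast.tsdataset import TimeSeriesDataModule  # assumed import path

# `dataset` is the TimeSeriesDataset from the sketches above.
datamodule = TimeSeriesDataModule(dataset, batch_size=2, shuffle_train=True)
train_loader = datamodule.train_dataloader()
batch = next(iter(train_loader))
```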

Example

```python
import lightning.pytorch as L
import torch
import torch.utils.data as data
from lightning.pytorch.demos.boring_classes import RandomDataset


class MyDataModule(L.LightningDataModule):
    def prepare_data(self):
        # download, IO, etc. Useful with shared filesystems
        # only called on 1 GPU/TPU in distributed
        ...

    def setup(self, stage):
        # make assignments here (val/train/test split)
        # called on every process in DDP
        dataset = RandomDataset(1, 100)
        self.train, self.val, self.test = data.random_split(
            dataset, [80, 10, 10], generator=torch.Generator().manual_seed(42)
        )

    def train_dataloader(self):
        return data.DataLoader(self.train)

    def val_dataloader(self):
        return data.DataLoader(self.val)

    def test_dataloader(self):
        return data.DataLoader(self.test)

    def on_exception(self, exception):
        # clean up state after the trainer faced an exception
        ...

    def teardown(self, stage):
        # clean up state after the trainer stops, delete files...
        # called on every process in DDP
        ...
```