
Contains a Generic class for curated DataModules/Datasets.

Inheriting subclasses are data-specific implementations that allow the user to import and download pre-converted datasets for training deep-learning-based methods in GraphNeT.

Bases: GraphNeTDataModule

Generic base class for curated datasets.

Curated Datasets in GraphNeT are pre-converted datasets that have been prepared for training and evaluation of deep learning models. On these Datasets, GraphNeT users can train and benchmark their models against state-of-the-art (SOTA) methods.

Construct CuratedDataset.

  • graph_definition (GraphDefinition) – Method that defines the data representation.

  • download_dir (str) – Directory to download dataset to.

  • truth (Optional) – List of event-level truth to include. Will include all available information if not given.

  • features (Optional) – List of input features from pulsemap to use. If not given, all available features will be used.

  • backend (Optional) – Data backend to use. Either “parquet” or “sqlite”. Defaults to “parquet”.

  • train_dataloader_kwargs (Optional) – Arguments for the training DataLoader. Default None.

  • validation_dataloader_kwargs (Optional) – Arguments for the validation DataLoader. Default None.

  • test_dataloader_kwargs (Optional) – Arguments for the test DataLoader. Default None.
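As an illustration of how the arguments above fit together, the sketch below assembles them into a plain keyword dictionary. It is not GraphNeT code: the helper `build_dataset_kwargs` and the example directory are hypothetical, and the required `graph_definition` is left for the caller to add. `batch_size`, `num_workers` and `shuffle` are standard PyTorch `DataLoader` arguments; using them here assumes the dataloader kwargs are forwarded to the DataLoader unchanged.

```python
from typing import Any, Dict, List, Optional

# Backends named in the parameter description above.
VALID_BACKENDS = ("parquet", "sqlite")


def build_dataset_kwargs(
    download_dir: str,
    backend: str = "parquet",
    truth: Optional[List[str]] = None,
    features: Optional[List[str]] = None,
    batch_size: int = 32,
) -> Dict[str, Any]:
    """Collect constructor arguments for a curated dataset (hypothetical helper)."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got '{backend}'")
    return {
        "download_dir": download_dir,
        "backend": backend,
        # None means "use everything that is available", as documented above.
        "truth": truth,
        "features": features,
        # Assumed to be passed on to torch.utils.data.DataLoader.
        "train_dataloader_kwargs": {"batch_size": batch_size, "shuffle": True},
        "validation_dataloader_kwargs": {"batch_size": batch_size, "shuffle": False},
        "test_dataloader_kwargs": {"batch_size": batch_size, "shuffle": False},
    }


kwargs = build_dataset_kwargs("./data", backend="sqlite")
# A concrete subclass would then be built with something like
# SomeCuratedDataset(graph_definition=..., **kwargs), where
# SomeCuratedDataset is a placeholder name.
```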

abstract prepare_data()

Download and prepare data.

Print details on the Dataset.

property pulsemaps: List[str]

Produce a list of available pulsemaps in Dataset.

property truth_table: List[str]

Produce the name of the table containing event-level truth in the Dataset.

property event_truth: List[str]

Produce a list of available event-level truth in Dataset.

property pulse_truth: List[str] | None

Produce a list of available pulse-level truth in Dataset.

property features: List[str]

Produce a list of available input features in Dataset.

property experiment: str

Produce the name of the experiment that the data comes from.

property citation: str

Produce a string that describes how to cite this Dataset.

property comments: str

Produce comments on the dataset from the creator.

property creator: str

Produce name of person who created the Dataset.

property events: Dict[str, int]

Produce a dict that contains the number of events in each selection.

property available_backends: List[str]

Produce a list of available data formats that the data comes in.

property dataset_dir: str

Produce the path to the directory that contains the dataset files.
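The relationship between the abstract prepare_data() and the read-only metadata properties can be sketched with a simplified stand-in class. This is an illustration only, not GraphNeT's implementation: `CuratedDatasetSketch`, `ToyDataset`, the private `_experiment` / `_events` attributes, and the rule used to build `dataset_dir` are all assumptions made for the example.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class CuratedDatasetSketch(ABC):
    """Simplified stand-in for a curated dataset base class."""

    # Hypothetical class-level metadata that subclasses fill in.
    _experiment: str = ""
    _events: Dict[str, int] = {}

    def __init__(self, download_dir: str) -> None:
        self._download_dir = download_dir

    @abstractmethod
    def prepare_data(self) -> None:
        """Download and prepare data; must be implemented by subclasses."""

    @property
    def experiment(self) -> str:
        return self._experiment

    @property
    def events(self) -> Dict[str, int]:
        return self._events

    @property
    def dataset_dir(self) -> str:
        # Assumption: files live in a folder named after the subclass.
        return os.path.join(self._download_dir, type(self).__name__)


class ToyDataset(CuratedDatasetSketch):
    _experiment = "ToyExperiment"
    _events = {"train": 100, "test": 10}

    def prepare_data(self) -> None:
        # A real subclass would download and unpack files here.
        pass


dataset = ToyDataset(download_dir="data")
```

Because prepare_data() is abstract in this sketch, instantiating `CuratedDatasetSketch` directly raises a `TypeError`; only subclasses that implement it can be constructed.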

Bases: CuratedDataset

A base class for datasets/datamodules hosted at ERDA.

Inheriting subclasses only need to fill out the _file_hashes attribute, which maps each backend to the file-id of an ERDA-hosted sharelink. It is assumed that each sharelink points to a single file that has been compressed using tar, with the extension “.tar.gz”.

E.g. suppose that a sharelink with file-id FbEEzAbg5A points to a compressed sqlite database. Then: _file_hashes = {"sqlite": "FbEEzAbg5A"}
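The subclassing pattern described here can be sketched as follows. The base class `ERDAHostedSketch` is a self-contained stand-in written for this example, not the GraphNeT class, and the `file_id` property is hypothetical. Deriving the available backends from the keys of `_file_hashes` is likewise an assumption.

```python
from typing import Dict, List


class ERDAHostedSketch:
    """Simplified stand-in for an ERDA-hosted dataset base class."""

    # Maps backend name -> file-id of the sharelink; set by subclasses.
    _file_hashes: Dict[str, str] = {}

    def __init__(self, backend: str = "parquet") -> None:
        if backend not in self._file_hashes:
            raise ValueError(
                f"Backend '{backend}' is not available; "
                f"choose from {self.available_backends}."
            )
        self._backend = backend

    @property
    def available_backends(self) -> List[str]:
        # Assumption: one backend per entry in _file_hashes.
        return list(self._file_hashes)

    @property
    def file_id(self) -> str:
        # Hypothetical accessor for the file-id of the chosen backend.
        return self._file_hashes[self._backend]


class ToySQLiteDataset(ERDAHostedSketch):
    # The only thing the subclass has to provide.
    _file_hashes = {"sqlite": "FbEEzAbg5A"}


dataset = ToySQLiteDataset(backend="sqlite")
```

In this sketch, requesting a backend that has no entry in `_file_hashes` fails early with a `ValueError` rather than at download time.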

Construct CuratedDataset.

Prepare the dataset for training.
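Since the hosted files are assumed to be single “.tar.gz” archives, one preparation step is likely to be unpacking the archive into the dataset directory. The sketch below shows that step with Python's standard `tarfile` module. The helper name `unpack_archive` and its arguments are hypothetical, the download itself is omitted, and this is not necessarily how GraphNeT implements it.

```python
import os
import tarfile
from typing import List


def unpack_archive(archive_path: str, dataset_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dataset_dir and list its contents."""
    os.makedirs(dataset_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that
            # reject unsafe members such as absolute paths.
            archive.extractall(path=dataset_dir, filter="data")
        else:
            archive.extractall(path=dataset_dir)
    return sorted(os.listdir(dataset_dir))
```

A call such as `unpack_archive("dataset.tar.gz", "data/ToyDataset")` would leave the extracted files in `data/ToyDataset`; both paths are placeholders.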

