graphnet.data.curated_datamodule module

Contains a generic base class for curated DataModules/Datasets.

Inheriting subclasses are data-specific implementations that allow the user to download and import pre-converted datasets for training deep-learning-based methods in GraphNeT.

class graphnet.data.curated_datamodule.CuratedDataset(graph_definition, download_dir, truth, features, backend, train_dataloader_kwargs, validation_dataloader_kwargs, test_dataloader_kwargs)[source]

Bases: GraphNeTDataModule

Generic base class for curated datasets.

Curated Datasets in GraphNeT are pre-converted datasets that have been prepared for training and evaluation of deep learning models. GraphNeT users can train their models on these Datasets and benchmark them against state-of-the-art (SOTA) methods.

Construct CuratedDataset.

Parameters:
  • graph_definition (GraphDefinition) – Method that defines the data representation.

  • download_dir (str) – Directory to download dataset to.

  • truth (Optional) – List of event-level truth to include. Will include all available information if not given.

  • features (Optional) – List of input features from pulsemap to use. If not given, all available features will be used.

  • backend (Optional) – Data backend to use. Either "parquet" or "sqlite". Defaults to "parquet".

  • train_dataloader_kwargs (Optional) – Arguments for the training DataLoader. Default None.

  • validation_dataloader_kwargs (Optional) – Arguments for the validation DataLoader. Default None.

  • test_dataloader_kwargs (Optional) – Arguments for the test DataLoader. Default None.
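
The defaults described above (everything available when truth or features is not given, and "parquet" when no backend is given) can be sketched as follows. This is a minimal illustration of the documented semantics, not GraphNeT's implementation: the helper names resolve_selection and resolve_backend, the example feature names, and the choice to raise an error for unavailable names are all assumptions.

```python
from typing import List, Optional

# Backends named in the parameter list above.
SUPPORTED_BACKENDS = ("parquet", "sqlite")


def resolve_selection(
    requested: Optional[List[str]], available: List[str]
) -> List[str]:
    """Return `requested`, or everything in `available` if nothing was requested."""
    if requested is None:
        return list(available)
    # Assumption: asking for a name the dataset does not offer is an error.
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"Not available in dataset: {missing}")
    return list(requested)


def resolve_backend(backend: Optional[str] = None) -> str:
    """Return the backend name, falling back to the documented default."""
    chosen = "parquet" if backend is None else backend
    if chosen not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {chosen!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    return chosen


# Hypothetical feature names, used for illustration only.
available_features = ["x", "y", "z", "time", "charge"]

print(resolve_selection(None, available_features))           # all five features
print(resolve_selection(["x", "time"], available_features))  # only the two requested
print(resolve_backend())                                     # parquet
```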

abstract prepare_data()[source]

Download and prepare data.

Return type:

None

description()[source]

Print details on the Dataset.

Return type:

None

property pulsemaps: List[str]

Produce a list of available pulsemaps in Dataset.

property truth_table: List[str]

Produce the name of the table containing event-level truth in the Dataset.

property event_truth: List[str]

Produce a list of available event-level truth in Dataset.

property pulse_truth: List[str] | None

Produce a list of available pulse-level truth in Dataset.

property features: List[str]

Produce a list of available input features in Dataset.

property experiment: str

Produce the name of the experiment that the data comes from.

property citation: str

Produce a string that describes how to cite this Dataset.

property comments: str

Produce comments on the dataset from the creator.

property creator: str

Produce the name of the person who created the Dataset.

property events: Dict[str, int]

Produce a dict that contains the number of events in each selection.

property available_backends: List[str]

Produce a list of available data formats that the data comes in.

property dataset_dir: str

Produce the path to the directory that contains the dataset files.
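
The properties above are the metadata that a data-specific subclass exposes. A minimal sketch of that pattern is shown below, using a stand-in base class so that it runs without GraphNeT installed. CuratedDatasetSketch, ToyDataset, the summary helper, and all metadata values are hypothetical. Only prepare_data is documented as abstract here; marking the properties abstract is an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CuratedDatasetSketch(ABC):
    """Stand-in base class for illustration; not GraphNeT's CuratedDataset."""

    @abstractmethod
    def prepare_data(self) -> None:
        """Download and prepare data."""

    @property
    @abstractmethod
    def experiment(self) -> str:
        """Name of the experiment that the data comes from."""

    @property
    @abstractmethod
    def creator(self) -> str:
        """Name of the person who created the dataset."""

    @property
    @abstractmethod
    def events(self) -> Dict[str, int]:
        """Number of events in each selection."""

    @property
    @abstractmethod
    def available_backends(self) -> List[str]:
        """Data formats that the data comes in."""

    def summary(self) -> str:
        """Hypothetical helper that formats the metadata as text."""
        lines = [
            f"Experiment: {self.experiment}",
            f"Creator: {self.creator}",
            f"Backends: {', '.join(self.available_backends)}",
        ]
        lines += [f"  {name}: {count} events" for name, count in self.events.items()]
        return "\n".join(lines)

    def description(self) -> None:
        """Print details on the dataset."""
        print(self.summary())


class ToyDataset(CuratedDatasetSketch):
    """Hypothetical dataset with made-up metadata."""

    def prepare_data(self) -> None:
        pass  # A real subclass would download and unpack files here.

    @property
    def experiment(self) -> str:
        return "Toy Experiment"

    @property
    def creator(self) -> str:
        return "Jane Doe"

    @property
    def events(self) -> Dict[str, int]:
        return {"train": 800, "validation": 100, "test": 100}

    @property
    def available_backends(self) -> List[str]:
        return ["parquet", "sqlite"]


ToyDataset().description()
```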

class graphnet.data.curated_datamodule.ERDAHostedDataset(graph_definition, download_dir, truth, features, backend, train_dataloader_kwargs, validation_dataloader_kwargs, test_dataloader_kwargs)[source]

Bases: CuratedDataset

A base class for dataset/datamodule hosted at ERDA.

Inheriting subclasses only need to fill out the _file_hashes attribute, which maps each backend to the file-id of an ERDA-hosted sharelink. It is assumed that each sharelink points to a single file that has been compressed using tar and has the extension ".tar.gz".

For example, suppose that the sharelink https://sid.erda.dk/share_redirect/FbEEzAbg5A points to a compressed sqlite database. Then: _file_hashes = {"sqlite": "FbEEzAbg5A"}
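
The relationship between _file_hashes, the sharelink, and the ".tar.gz" archive can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code GraphNeT runs: the helper names sharelink_url and unpack_archive are hypothetical, the base URL is taken from the example sharelink above, and the download step itself is omitted.

```python
import tarfile
from pathlib import Path

# Base URL taken from the example sharelink above.
ERDA_SHARE_URL = "https://sid.erda.dk/share_redirect"

# Mapping from backend name to ERDA file-id, as in the example above.
_file_hashes = {"sqlite": "FbEEzAbg5A"}


def sharelink_url(backend: str) -> str:
    """Build the sharelink URL for the given backend (hypothetical helper)."""
    if backend not in _file_hashes:
        raise KeyError(f"No file-id registered for backend {backend!r}")
    return f"{ERDA_SHARE_URL}/{_file_hashes[backend]}"


def unpack_archive(archive_path: Path, target_dir: Path) -> None:
    """Extract a gzip-compressed tar archive into `target_dir`."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Only unpack archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=target_dir)


print(sharelink_url("sqlite"))
# prints: https://sid.erda.dk/share_redirect/FbEEzAbg5A
```

Keeping one file-id per backend means a subclass can offer several data formats by adding entries to the same dictionary.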

Construct CuratedDataset.

Parameters:
  • graph_definition (GraphDefinition) – Method that defines the data representation.

  • download_dir (str) – Directory to download dataset to.

  • truth (Optional) – List of event-level truth to include. Will include all available information if not given.

  • features (Optional) – List of input features from pulsemap to use. If not given, all available features will be used.

  • backend (Optional) – Data backend to use. Either "parquet" or "sqlite". Defaults to "parquet".

  • train_dataloader_kwargs (Optional) – Arguments for the training DataLoader. Default None.

  • validation_dataloader_kwargs (Optional) – Arguments for the validation DataLoader. Default None.

  • test_dataloader_kwargs (Optional) – Arguments for the test DataLoader. Default None.

prepare_data()[source]

Prepare the dataset for training.

Return type:

None