curated_datamodule
Contains a generic class for curated DataModules/Datasets.
Inheriting subclasses are data-specific implementations that allow the user to import and download pre-converted datasets for training deep-learning-based methods in GraphNeT.
- class graphnet.data.curated_datamodule.CuratedDataset(graph_definition, download_dir, truth, features, backend, train_dataloader_kwargs, validation_dataloader_kwargs, test_dataloader_kwargs)
Bases: GraphNeTDataModule
Generic base class for curated datasets.
Curated Datasets in GraphNeT are pre-converted datasets that have been prepared for training and evaluation of deep learning models. GraphNeT users can train their models on these Datasets and benchmark them against state-of-the-art methods.
Construct CuratedDataset.
- Parameters:
graph_definition (GraphDefinition) – Method that defines the data representation.
download_dir (str) – Directory to download dataset to.
truth (Optional) – List of event-level truth to include. Will include all available information if not given.
features (Optional) – List of input features from pulsemap to use. If not given, all available features will be used.
backend (Optional) – Data backend to use. Either “parquet” or “sqlite”. Defaults to “parquet”.
train_dataloader_kwargs (Optional) – Arguments for the training DataLoader. Default None.
validation_dataloader_kwargs (Optional) – Arguments for the validation DataLoader. Default None.
test_dataloader_kwargs (Optional) – Arguments for the test DataLoader. Default None.
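As a rough illustration of how the optional parameters above fit together, the sketch below collects them in a plain dict and checks the backend name. The helper `build_curated_kwargs`, the path `/tmp/curated`, and the DataLoader keys `batch_size` and `num_workers` are assumptions made for this example; the helper is not part of GraphNeT, and the required `graph_definition` argument still has to be supplied separately.

```python
from typing import Any, Dict, List, Optional

# The two backends named in the parameter list above.
VALID_BACKENDS = ("parquet", "sqlite")


def build_curated_kwargs(
    download_dir: str,
    backend: str = "parquet",
    truth: Optional[List[str]] = None,
    features: Optional[List[str]] = None,
    train_dataloader_kwargs: Optional[Dict[str, Any]] = None,
    validation_dataloader_kwargs: Optional[Dict[str, Any]] = None,
    test_dataloader_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: gather constructor arguments in one dict."""
    # Reject backends other than the two documented ones.
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"backend must be one of {VALID_BACKENDS}, got {backend!r}"
        )
    # None is passed through unchanged, mirroring the documented defaults.
    return {
        "download_dir": download_dir,
        "backend": backend,
        "truth": truth,
        "features": features,
        "train_dataloader_kwargs": train_dataloader_kwargs,
        "validation_dataloader_kwargs": validation_dataloader_kwargs,
        "test_dataloader_kwargs": test_dataloader_kwargs,
    }


# Example values only; the directory and DataLoader settings are made up.
kwargs = build_curated_kwargs(
    download_dir="/tmp/curated",
    backend="sqlite",
    train_dataloader_kwargs={"batch_size": 16, "num_workers": 2},
)
```

A dict like this could then be unpacked into the constructor of a concrete subclass alongside a `graph_definition` instance.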
- property pulsemaps: List[str]
Produce a list of available pulsemaps in Dataset.
- property truth_table: List[str]
Produce name of table containing event-level truth in Dataset.
- property event_truth: List[str]
Produce a list of available event-level truth in Dataset.
- property pulse_truth: List[str] | None
Produce a list of available pulse-level truth in Dataset.
- property features: List[str]
Produce a list of available input features in Dataset.
- property experiment: str
Produce the name of the experiment that the data comes from.
- property citation: str
Produce a string that describes how to cite this Dataset.
- property comments: str
Produce comments on the dataset from the creator.
- property creator: str
Produce name of person who created the Dataset.
- property events: Dict[str, int]
Produce a dict that contains the number of events in each selection.
- property available_backends: List[str]
Produce a list of available data formats that the data comes in.
- property dataset_dir: str
Produce the path of the directory that contains the dataset files.
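To make the metadata properties above more concrete, here is a minimal sketch of how a data-specific subclass might supply a few of them. It assumes that subclasses provide these values by overriding properties. `MetadataSketch` and `ToyDataset` are stand-ins written for this example and do not come from GraphNeT, and every value shown (experiment name, selections, event counts) is invented.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataSketch(ABC):
    """Stand-in for a subset of the metadata interface; not GraphNeT's class."""

    @property
    @abstractmethod
    def experiment(self) -> str:
        """Name of the experiment that the data comes from."""

    @property
    @abstractmethod
    def events(self) -> Dict[str, int]:
        """Number of events in each selection."""

    @property
    @abstractmethod
    def available_backends(self) -> List[str]:
        """Data formats that the data comes in."""


class ToyDataset(MetadataSketch):
    """Hypothetical dataset with invented metadata."""

    @property
    def experiment(self) -> str:
        return "ToyExperiment"

    @property
    def events(self) -> Dict[str, int]:
        return {"train": 800, "validation": 100, "test": 100}

    @property
    def available_backends(self) -> List[str]:
        return ["parquet", "sqlite"]


toy = ToyDataset()
# Total event count across all selections of the toy dataset.
total_events = sum(toy.events.values())
```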
- class graphnet.data.curated_datamodule.ERDAHostedDataset(graph_definition, download_dir, truth, features, backend, train_dataloader_kwargs, validation_dataloader_kwargs, test_dataloader_kwargs)
Bases: CuratedDataset
A base class for datasets/datamodules hosted at ERDA.
Inheriting subclasses only need to fill out the _file_hashes attribute, which maps a backend name to the file ID of an ERDA-hosted sharelink. Each sharelink is assumed to point to a single file that has been archived with tar and compressed, with the extension “.tar.gz”.
For example, suppose that the sharelink https://sid.erda.dk/share_redirect/FbEEzAbg5A points to a compressed SQLite database. Then: _file_hashes = {"sqlite": "FbEEzAbg5A"}
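The relationship between a file ID and its sharelink can be sketched as follows. This is only an illustration: `ToyERDADataset` is a stand-in that does not inherit from the real class, `sharelink_for` is a hypothetical helper, and the URL pattern is inferred from the single example link above rather than taken from GraphNeT's download code.

```python
from typing import Dict

# Base URL inferred from the example sharelink in the text (an assumption).
ERDA_SHARE_BASE = "https://sid.erda.dk/share_redirect"


class ToyERDADataset:
    """Stand-in; a real subclass would inherit from ERDAHostedDataset."""

    # Backend name -> file ID of the ERDA sharelink, as in the example above.
    _file_hashes: Dict[str, str] = {"sqlite": "FbEEzAbg5A"}


def sharelink_for(file_hashes: Dict[str, str], backend: str) -> str:
    """Hypothetical helper: rebuild the sharelink URL for one backend."""
    if backend not in file_hashes:
        raise ValueError(
            f"No file ID registered for backend {backend!r}; "
            f"available: {sorted(file_hashes)}"
        )
    return f"{ERDA_SHARE_BASE}/{file_hashes[backend]}"


url = sharelink_for(ToyERDADataset._file_hashes, "sqlite")
```

Keeping one entry per backend means a dataset offered in both formats would simply carry two keys in the same dict.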
Construct CuratedDataset.
- Parameters:
graph_definition (GraphDefinition) – Method that defines the data representation.
download_dir (str) – Directory to download dataset to.
truth (Optional) – List of event-level truth to include. Will include all available information if not given.
features (Optional) – List of input features from pulsemap to use. If not given, all available features will be used.
backend (Optional) – Data backend to use. Either “parquet” or “sqlite”. Defaults to “parquet”.
train_dataloader_kwargs (Optional) – Arguments for the training DataLoader. Default None.
validation_dataloader_kwargs (Optional) – Arguments for the validation DataLoader. Default None.
test_dataloader_kwargs (Optional) – Arguments for the test DataLoader. Default None.