nubench_datasets

Curated datasets from the NuBench benchmark suite (arXiv:2511.13111).

class graphnet.datasets.nubench_datasets.NuBenchSpec(erda_hash, detector_cls, experiment, comments, features=<factory>, event_truth=<factory>, db_relpath='merged/merged.db', selection_relpaths=<factory>, pulsemap_per_split=<factory>)

Bases: object

Static configuration for a single NuBench dataset.

Parameters:

erda_hash (str)
detector_cls (Type[NuBenchDetector])
experiment (str)
comments (str)
features (List[str])
event_truth (List[str])
db_relpath (str)
selection_relpaths (Dict[str, str])
pulsemap_per_split (Dict[str, str])

erda_hash: str

detector_cls: Type[NuBenchDetector]

experiment: str

comments: str

features: List[str]

event_truth: List[str]

db_relpath: str = 'merged/merged.db'

selection_relpaths: Dict[str, str]

pulsemap_per_split: Dict[str, str]
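The `<factory>` defaults in the signature indicate that the list and dict fields are built per instance rather than shared. The following is a minimal sketch of that pattern using a standard-library dataclass. `SpecSketch` and `PlaceholderDetector` are hypothetical stand-ins, not the real `NuBenchSpec` or `NuBenchDetector`, and the empty-container factories are an assumption, since the actual factory contents are not documented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


class PlaceholderDetector:
    """Hypothetical stand-in for a NuBenchDetector subclass."""


@dataclass
class SpecSketch:
    """Illustrative mirror of the NuBenchSpec fields; not the real class."""

    # Required fields, in the documented order.
    erda_hash: str
    detector_cls: Type[PlaceholderDetector]
    experiment: str
    comments: str
    # Mutable defaults use default_factory so instances never share them.
    # Empty containers are an assumption made for this sketch.
    features: List[str] = field(default_factory=list)
    event_truth: List[str] = field(default_factory=list)
    db_relpath: str = "merged/merged.db"
    selection_relpaths: Dict[str, str] = field(default_factory=dict)
    pulsemap_per_split: Dict[str, str] = field(default_factory=dict)


# All argument values below are made up for illustration.
spec_a = SpecSketch("hash-a", PlaceholderDetector, "exp-a", "first")
spec_b = SpecSketch("hash-b", PlaceholderDetector, "exp-b", "second")
spec_a.features.append("dom_x")  # does not affect spec_b.features
```

Because each instance gets its own list from the factory, mutating `spec_a.features` leaves `spec_b.features` untouched.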

class graphnet.datasets.nubench_datasets.NuBenchDataset(name, download_dir, data_representation, **kwargs)

Bases: ERDAHostedDataset

Single entry point for every NuBench benchmark dataset.

Pick a dataset by its registry name (see available_datasets()) and pass a DataRepresentation whose detector matches the dataset. The tarball is downloaded from ERDA on first use and extracted into {download_dir}/{name}/.

The NuBench convention is that train/val events live in the merged_photons pulsemap while test events live in pulses_no_noise. This class builds each split against the correct pulsemap automatically.
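The split-to-pulsemap convention described above can be pictured as a small lookup table. This is only a sketch: the pulsemap names come from the text, while the split keys (`"train"`, `"val"`, `"test"`) and the helper `pulsemap_for_split` are assumptions introduced for illustration and are not part of the graphnet API.

```python
from typing import Dict

# Pulsemap names are taken from the description above; the split keys
# are assumed for this sketch.
PULSEMAP_PER_SPLIT: Dict[str, str] = {
    "train": "merged_photons",
    "val": "merged_photons",
    "test": "pulses_no_noise",
}


def pulsemap_for_split(split: str) -> str:
    """Return the pulsemap name for a split (hypothetical helper)."""
    try:
        return PULSEMAP_PER_SPLIT[split]
    except KeyError:
        known = ", ".join(sorted(PULSEMAP_PER_SPLIT))
        raise ValueError(
            f"Unknown split {split!r}; expected one of: {known}"
        ) from None
```

Keeping the mapping in one place means a split can never be built against the wrong pulsemap by accident, which is the guarantee the class is described as providing.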

Example:

from graphnet.models.graphs import KNNGraph
from graphnet.models.detector.nubench import Hexagon
from graphnet.datasets import NuBenchDataset

ds = NuBenchDataset(
    name="hexagon_ice_le",
    download_dir="/path/to/nubench_data",
    data_representation=KNNGraph(detector=Hexagon()),
)

Construct a NuBench dataset by registry name.

Parameters:

name (str) – Registry key of the NuBench dataset (see available_datasets()).
download_dir (str) – Directory to download and extract the dataset into.
data_representation (DataRepresentation) – Data representation whose detector must match the one expected by the selected dataset.
**kwargs (Any) – Forwarded to ERDAHostedDataset.

classmethod available_datasets()

Return the list of registered NuBench dataset names.

Return type: List[str]

property dataset_dir: str

Return the root directory of the extracted dataset.

prepare_data()

Download and extract the dataset via ERDAHostedDataset if files are missing.

Return type: None
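The "download only if files are missing" behaviour can be sketched with the standard library alone. This is not the ERDAHostedDataset implementation: `ensure_extracted` and `fake_fetch` are hypothetical names, the missing-files check (directory absent or empty) is an assumption, and the fetch step is stubbed so that nothing is downloaded. The `{download_dir}/{name}/` layout follows the class description.

```python
import os
import tempfile
from typing import Callable, List


def ensure_extracted(
    download_dir: str, name: str, fetch: Callable[[str], None]
) -> str:
    """Call fetch only when the dataset directory is absent or empty."""
    dataset_dir = os.path.join(download_dir, name)
    missing = not os.path.isdir(dataset_dir) or not os.listdir(dataset_dir)
    if missing:
        os.makedirs(dataset_dir, exist_ok=True)
        fetch(dataset_dir)
    return dataset_dir


calls: List[str] = []


def fake_fetch(target_dir: str) -> None:
    """Stub for download + extract: records the call, writes one file."""
    calls.append(target_dir)
    with open(os.path.join(target_dir, "placeholder.txt"), "w") as fh:
        fh.write("stand-in for extracted files\n")


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_extracted(tmp, "hexagon_ice_le", fake_fetch)
    second = ensure_extracted(tmp, "hexagon_ice_le", fake_fetch)  # no-op
```

The second call returns the same directory without invoking the fetch step again, which is the property that makes repeated calls cheap.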