nubench_datasets

Curated datasets from the NuBench benchmark suite (arXiv:2511.13111).

class graphnet.datasets.nubench_datasets.NuBenchSpec(erda_hash, detector_cls, experiment, comments, features=<factory>, event_truth=<factory>, db_relpath='merged/merged.db', selection_relpaths=<factory>, pulsemap_per_split=<factory>)

Bases: object

Static configuration for a single NuBench dataset.

Parameters:
  • erda_hash (str)

  • detector_cls (Type[NuBenchDetector])

  • experiment (str)

  • comments (str)

  • features (List[str])

  • event_truth (List[str])

  • db_relpath (str)

  • selection_relpaths (Dict[str, str])

  • pulsemap_per_split (Dict[str, str])

erda_hash: str
detector_cls: Type[NuBenchDetector]
experiment: str
comments: str
features: List[str]
event_truth: List[str]
db_relpath: str = 'merged/merged.db'
selection_relpaths: Dict[str, str]
pulsemap_per_split: Dict[str, str]
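The `<factory>` placeholders in the signature are how Python renders dataclass fields declared with `dataclasses.field(default_factory=...)`, so every instance gets its own fresh list or dict. The minimal stand-in below assumes NuBenchSpec is a standard dataclass. The class name `SpecSketch`, the `object` stand-in for the detector type, the empty-container defaults, and the placeholder values are hypothetical, because the real default contents are not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class SpecSketch:
    """Hypothetical stand-in mirroring the NuBenchSpec fields."""

    erda_hash: str
    detector_cls: Type[object]  # stand-in for Type[NuBenchDetector]
    experiment: str
    comments: str
    # default_factory gives every instance its own fresh container;
    # the empty list/dict defaults are an assumption for illustration.
    features: List[str] = field(default_factory=list)
    event_truth: List[str] = field(default_factory=list)
    db_relpath: str = "merged/merged.db"
    selection_relpaths: Dict[str, str] = field(default_factory=dict)
    pulsemap_per_split: Dict[str, str] = field(default_factory=dict)


# Placeholder values, not real NuBench entries.
a = SpecSketch("hash-a", object, "exp-a", "")
b = SpecSketch("hash-b", object, "exp-b", "")
a.features.append("dom_x")
print(b.features)    # [] : b's list is independent of a's
print(a.db_relpath)  # merged/merged.db
```

A plain `features: List[str] = []` default would be rejected by `dataclass` as a mutable default, which is why the factory form is used.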
class graphnet.datasets.nubench_datasets.NuBenchDataset(name, download_dir, data_representation, **kwargs)

Bases: ERDAHostedDataset

Single entry point for every NuBench benchmark dataset.

Pick a dataset by its registry name (see available_datasets()) and pass a DataRepresentation whose detector matches the dataset. The tarball is downloaded from ERDA on first use and extracted into {download_dir}/{name}/.
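The requirement that the DataRepresentation's detector match the dataset can be pictured as a type check against the spec's `detector_cls`. This is a hypothetical sketch. The stub classes, the `detector` attribute on the representation, and the `check_detector` helper are assumptions and not the graphnet implementation, which may compare detectors differently.

```python
from typing import Type


class StubDetector:  # hypothetical stand-in for a detector base class
    pass


class StubHexagon(StubDetector):  # hypothetical stand-in for Hexagon
    pass


class StubOther(StubDetector):  # hypothetical stand-in for another detector
    pass


class FakeRepresentation:
    """Hypothetical stand-in exposing the detector it was built with."""

    def __init__(self, detector: StubDetector) -> None:
        self.detector = detector


def check_detector(expected_cls: Type[StubDetector], representation: FakeRepresentation) -> None:
    """Raise ValueError if the representation's detector is not an expected_cls."""
    if not isinstance(representation.detector, expected_cls):
        raise ValueError(
            f"Dataset expects {expected_cls.__name__}, "
            f"got {type(representation.detector).__name__}"
        )


check_detector(StubHexagon, FakeRepresentation(StubHexagon()))  # passes silently
try:
    check_detector(StubHexagon, FakeRepresentation(StubOther()))
except ValueError as err:
    print(err)  # Dataset expects StubHexagon, got StubOther
```

Using `isinstance` also accepts subclasses of the expected detector; an exact `type(...) is expected_cls` comparison would be the stricter alternative.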

By NuBench convention, training and validation events are stored in the merged_photons pulsemap, while test events are stored in pulses_no_noise. This class automatically builds each split against the correct pulsemap.
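The split-to-pulsemap convention amounts to a small lookup table, presumably what the spec's `pulsemap_per_split` field holds. In this sketch the split keys `"train"`, `"val"` and `"test"` and the helper `pulsemap_for_split` are assumptions; only the two pulsemap names come from the text.

```python
from typing import Dict

# Hypothetical mapping following the NuBench convention; split keys are assumed.
PULSEMAP_PER_SPLIT: Dict[str, str] = {
    "train": "merged_photons",
    "val": "merged_photons",
    "test": "pulses_no_noise",
}


def pulsemap_for_split(split: str, mapping: Dict[str, str] = PULSEMAP_PER_SPLIT) -> str:
    """Return the pulsemap a given split should be read from."""
    try:
        return mapping[split]
    except KeyError:
        raise KeyError(f"Unknown split {split!r}; expected one of {sorted(mapping)}") from None


for split in ("train", "val", "test"):
    print(split, "->", pulsemap_for_split(split))
```

Keeping the mapping as data rather than hard-coding it per split lets each dataset spec override it without changing the loading code.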

Example:

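from graphnet.models.graphs import KNNGraph
from graphnet.models.detector.nubench import Hexagon
from graphnet.datasets import NuBenchDataset

# The representation's detector (here Hexagon) must match the detector
# expected by the chosen dataset.
ds = NuBenchDataset(
    name="hexagon_ice_le",  # registry key; see NuBenchDataset.available_datasets()
    download_dir="/path/to/nubench_data",  # extracted into /path/to/nubench_data/hexagon_ice_le/
    data_representation=KNNGraph(detector=Hexagon()),
)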

Construct a NuBench dataset by registry name.

Parameters:
  • name (str) – Registry key of the NuBench dataset (see available_datasets()).

  • download_dir (str) – Directory to download and extract the dataset into.

  • data_representation (DataRepresentation) – Data representation whose detector must match the one expected by the selected dataset.

  • **kwargs (Any) – Forwarded to ERDAHostedDataset.

classmethod available_datasets()

Return the list of registered NuBench dataset names.

Return type:

List[str]
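A name registry of this kind is typically a dictionary keyed by dataset name. The sketch below is hypothetical: `_REGISTRY`, `lookup`, the sorting, and the `"example_placeholder"` entry are assumptions; only `"hexagon_ice_le"` is a name that appears in this documentation.

```python
from typing import Dict, List

# Hypothetical registry mapping names to per-dataset configuration.
_REGISTRY: Dict[str, dict] = {
    "hexagon_ice_le": {},
    "example_placeholder": {},  # placeholder, not a real NuBench dataset
}


def available_datasets() -> List[str]:
    """Return the registered dataset names in a stable (sorted) order."""
    return sorted(_REGISTRY)


def lookup(name: str) -> dict:
    """Fetch a registry entry, listing valid names when the key is unknown."""
    if name not in _REGISTRY:
        raise KeyError(f"Unknown dataset {name!r}; choose from {available_datasets()}")
    return _REGISTRY[name]


print(available_datasets())  # ['example_placeholder', 'hexagon_ice_le']
```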

property dataset_dir: str

Return the root directory of the extracted dataset.
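Following the `{download_dir}/{name}/` layout described for the class, the dataset root and the database location can be derived with plain path joins. This sketch assumes that `db_relpath` (default `'merged/merged.db'`) is resolved relative to `dataset_dir`; the helper names `dataset_dir` and `db_path` are hypothetical.

```python
import os


def dataset_dir(download_dir: str, name: str) -> str:
    """Root of the extracted dataset: {download_dir}/{name}/."""
    return os.path.join(download_dir, name)


def db_path(download_dir: str, name: str, db_relpath: str = "merged/merged.db") -> str:
    """Locate the database by joining db_relpath onto the dataset root (assumed)."""
    return os.path.join(dataset_dir(download_dir, name), db_relpath)


print(db_path("/path/to/nubench_data", "hexagon_ice_le"))
```

On a POSIX system this prints `/path/to/nubench_data/hexagon_ice_le/merged/merged.db`.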

prepare_data()

Download and extract the dataset via ERDAHostedDataset if its files are missing.

Return type:

None
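The "only if files are missing" behaviour makes repeated calls cheap. In this minimal sketch, the existence check on the database file, the `fetch` callback, and `fake_fetch` are all assumptions. The real method delegates to ERDAHostedDataset and may check different files.

```python
import os
import tempfile
from typing import Callable


def prepare_data(dataset_dir: str, db_relpath: str, fetch: Callable[[str], None]) -> bool:
    """Call ``fetch`` only when the expected database file is missing.

    Returns True if a download was triggered, False if files were already present.
    """
    if os.path.exists(os.path.join(dataset_dir, db_relpath)):
        return False
    fetch(dataset_dir)
    return True


def fake_fetch(target: str) -> None:
    """Hypothetical stand-in for download + extract: just creates the db file."""
    path = os.path.join(target, "merged", "merged.db")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as tmp:
    print(prepare_data(tmp, "merged/merged.db", fake_fetch))  # True: first call fetches
    print(prepare_data(tmp, "merged/merged.db", fake_fetch))  # False: files already present
```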