graphnet.data.dataset.dataset module

Base Dataset class(es) used in GraphNeT.

graphnet.data.dataset.dataset.load_module(class_name)[source]

Load graphnet module from string name.

Parameters:

class_name (str) – name of class

Return type:

Type

Returns:

graphnet module.
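Resolving a class from its string name usually amounts to an attribute lookup on one or more imported modules. The following is a minimal sketch of that idea using only the standard library. The helper `find_class` and its `module_names` argument are hypothetical and are not GraphNeT's own implementation of load_module.

```python
import importlib
from typing import Iterable, Type


def find_class(class_name: str, module_names: Iterable[str]) -> Type:
    """Return the first class called `class_name` found in `module_names`.

    Hypothetical helper for illustration only; not part of GraphNeT.
    """
    names = list(module_names)
    for module_name in names:
        module = importlib.import_module(module_name)
        candidate = getattr(module, class_name, None)
        # Accept only real classes, not functions or constants.
        if isinstance(candidate, type):
            return candidate
    raise ValueError(f"No class named {class_name!r} found in {names}")


# Look the class up by name, then use it like any other class.
fraction_cls = find_class("Fraction", ["math", "fractions"])
half = fraction_cls(1, 2)
```

Searching a fixed list of modules keeps the lookup explicit and avoids evaluating arbitrary strings.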

graphnet.data.dataset.dataset.parse_graph_definition(cfg)[source]

Construct GraphDefinition from DatasetConfig.

Return type:

GraphDefinition

Parameters:

cfg (dict)

graphnet.data.dataset.dataset.parse_labels(cfg)[source]

Construct Label from DatasetConfig.

Return type:

Dict[str, Label]

Parameters:

cfg (dict)

class graphnet.data.dataset.dataset.Dataset(*args, **kwargs)[source]

Bases: Logger, Configurable, Dataset, ABC

Base Dataset class for reading from any intermediate file format.

Construct Dataset.

Parameters:
  • path (Union[str, List[str]]) – Path to the file(s) from which this Dataset should read.

  • pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series that should be used to construct the nodes on the individual graph objects, and their features. Multiple pulse series maps can be used, e.g., when different DOM types are stored in different maps.

  • features (List[str]) – List of columns in the input files that should be used as node features on the graph objects.

  • truth (List[str]) – List of event-level columns in the input files that should be added as attributes on the graph objects.

  • node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.

  • index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices to identify and map events across tables.

  • truth_table (str, default: 'truth') – Name of the table containing event-level truth information.

  • node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.

  • string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects. Defaults to None, meaning all strings for which data exists are used.

  • selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (in index_column) or as a string-based selection used to query the Dataset for events passing the selection. Defaults to None, meaning that all events in the input files are read.

  • dtype (dtype, default: torch.float32) – Type of the feature tensor on the graph objects returned.

  • loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.

  • loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights. This is also the name of the corresponding attribute assigned to the graph object.

  • loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight. NOTE: This default value is only applied when loss_weight_table and loss_weight_column are specified, and in this case to events with no value in the corresponding table/column. That is, if no per-event loss weight table/column is provided, this value is ignored. Defaults to None.

  • seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection (e.g., “10000 random events ~ event_no % 5 > 0” or “20% random events ~ event_no % 5 > 0”).

  • graph_definition (GraphDefinition) – Method that defines the graph representation.

  • labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.

  • args (Any)

  • kwargs (Any)
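The interplay between loss_weight_table, loss_weight_column, and loss_weight_default_value described above can be sketched as a small lookup. This is an illustration under stated assumptions, not the library's code: the function `resolve_loss_weight` is hypothetical, and a plain dict stands in for the table/column pair.

```python
from typing import Dict, Optional


def resolve_loss_weight(
    event_no: int,
    weights: Optional[Dict[int, float]],
    default: Optional[float],
) -> Optional[float]:
    """Return the loss weight for one event, or None if weights are unused.

    `weights` stands in for the (loss_weight_table, loss_weight_column)
    pair; passing None models the case where neither is specified.
    """
    if weights is None:
        # No table/column configured: the default value is ignored.
        return None
    # Table/column configured: fall back to the default for missing events.
    return weights.get(event_no, default)


weights = {1: 0.5, 2: 2.0}
stored = resolve_loss_weight(1, weights, default=1.0)    # value from the table
fallback = resolve_loss_weight(7, weights, default=1.0)  # event missing: default
unused = resolve_loss_weight(7, None, default=1.0)       # no table: ignored
```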

Return type:

object
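A string-based selection such as "20% random events ~ event_no % 5 > 0" combines a random-subset prefix with a query, and the seed makes the random draw reproducible. The sketch below shows one way such a string could be split and sampled. The helpers `split_selection` and `sample_events` and the regular expression are assumptions made for illustration; the query itself is hard-coded here rather than evaluated against real input files.

```python
import random
import re
from typing import List, Optional, Tuple

# Matches "<number>[%] random events ~ <query>" (assumed grammar).
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(%?)\s*random events\s*~\s*(.+)$")


def split_selection(selection: str) -> Tuple[Optional[float], bool, str]:
    """Split a selection string into (amount, is_percent, query).

    A plain query without the random-events prefix yields (None, False, query).
    """
    match = _PATTERN.match(selection)
    if match is None:
        return None, False, selection.strip()
    amount, percent, query = match.groups()
    return float(amount), percent == "%", query.strip()


def sample_events(
    passing: List[int],
    amount: Optional[float],
    is_percent: bool,
    seed: Optional[int],
) -> List[int]:
    """Draw a reproducible random subset of the events passing the query."""
    if amount is None:
        return list(passing)
    if is_percent:
        count = int(len(passing) * amount / 100)
    else:
        count = int(amount)
    count = min(count, len(passing))
    rng = random.Random(seed)  # same seed -> same subset
    return sorted(rng.sample(passing, count))


amount, is_percent, query = split_selection("20% random events ~ event_no % 5 > 0")
# Stand-in for running `query` against the input files.
passing = [event_no for event_no in range(100) if event_no % 5 > 0]
subset = sample_events(passing, amount, is_percent, seed=42)
```

Using a dedicated `random.Random(seed)` instance keeps the draw independent of any global random state.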

classmethod from_config(source)[source]

Construct Dataset instance from source configuration.

Return type:

Union[Dataset, EnsembleDataset, Dict[str, Dataset], Dict[str, EnsembleDataset]]

Parameters:

source (DatasetConfig | str)

classmethod concatenate(datasets)[source]

Concatenate multiple `Dataset`s into one instance.

Return type:

EnsembleDataset

Parameters:

datasets (List[Dataset])

property path: str | List[str]

Path to the file(s) from which this Dataset reads.

property truth_table: str

Name of the table containing event-level truth information.

abstract query_table(table, columns, sequential_index, selection)[source]

Query a table at a specific index, optionally with some selection.

Parameters:
  • table (str) – Table to be queried.

  • columns (Union[List[str], str]) – Columns to read out.

  • sequential_index (Optional[int], default: None) – Sequentially numbered index (i.e., in [0, len(self))) of the event to query. This may differ from the indexation used in self._indices. If no value is provided, the entire column is returned.

  • selection (Optional[str], default: None) – Selection to be imposed before reading out data. Defaults to None.

Return type:

ndarray

Returns:

List of tuples containing the values in columns. If the table contains only scalar data for columns, a list of length 1 is returned.

Raises:

ColumnMissingException – If one or more elements in columns are not present in table.
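Because query_table is abstract, each backend supplies its own implementation. The toy in-memory backend below is a sketch of the contract described above: it translates a sequential index into the stored event index, returns a list of tuples, and raises when a column is missing. The class `InMemoryTables`, the exception `ColumnMissingError`, and the dict-of-lists storage are all hypothetical, and the selection argument is left out for brevity.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


class ColumnMissingError(KeyError):
    """Stand-in for ColumnMissingException, defined here for illustration."""


class InMemoryTables:
    """Toy backend: each table maps a column name to a list of row values."""

    def __init__(
        self,
        tables: Dict[str, Dict[str, List[Any]]],
        indices: List[int],
        index_column: str = "event_no",
    ) -> None:
        self._tables = tables
        self._indices = indices  # selected event ids, possibly non-contiguous
        self._index_column = index_column

    def __len__(self) -> int:
        return len(self._indices)

    def query_table(
        self,
        table: str,
        columns: Union[List[str], str],
        sequential_index: Optional[int] = None,
    ) -> List[Tuple[Any, ...]]:
        if isinstance(columns, str):
            columns = [columns]
        data = self._tables[table]
        missing = [c for c in columns if c not in data]
        if missing:
            raise ColumnMissingError(f"{missing} not in table {table!r}")
        rows = list(range(len(data[self._index_column])))
        if sequential_index is not None:
            # Translate a position in [0, len(self)) to the stored event id.
            event_id = self._indices[sequential_index]
            rows = [r for r in rows if data[self._index_column][r] == event_id]
        return [tuple(data[c][r] for c in columns) for r in rows]


tables = {
    "truth": {"event_no": [10, 20, 30], "energy": [1.5, 2.5, 3.5]},
    "pulses": {"event_no": [10, 10, 30], "charge": [0.1, 0.2, 0.3]},
}
dataset = InMemoryTables(tables, indices=[10, 30])
scalar = dataset.query_table("truth", "energy", sequential_index=1)
pulses = dataset.query_table("pulses", ["charge"], sequential_index=0)
whole_column = dataset.query_table("truth", "energy")
```

Here sequential index 1 refers to event 30, the second selected event, even though event 20 sits between the two in the underlying table.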

add_label(fn, key)[source]

Add a custom graph label defined using function fn.

Return type:

None

Parameters:
  • fn (Callable[[Data], Any])

  • key (str | None)
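A label function receives a graph object and returns the value to store under key. The sketch below illustrates that pattern with a toy registry. The class `LabelRegistry`, its `apply` method, and the use of `SimpleNamespace` as a graph stand-in are assumptions for illustration; the behaviour for key=None is not specified above and is therefore not modelled.

```python
import math
from types import SimpleNamespace
from typing import Any, Callable, Dict


class LabelRegistry:
    """Toy stand-in for the label-handling part of a dataset."""

    def __init__(self) -> None:
        self._label_fns: Dict[str, Callable[[Any], Any]] = {}

    def add_label(self, fn: Callable[[Any], Any], key: str) -> None:
        """Register `fn` so its result is stored under attribute `key`."""
        if key in self._label_fns:
            raise ValueError(f"A label named {key!r} is already registered")
        self._label_fns[key] = fn

    def apply(self, graph: Any) -> Any:
        """Evaluate every registered label function on `graph`."""
        for key, fn in self._label_fns.items():
            setattr(graph, key, fn(graph))
        return graph


registry = LabelRegistry()
registry.add_label(lambda g: math.log10(g.energy), key="log10_energy")

# SimpleNamespace plays the role of a graph object with a truth attribute.
graph = registry.apply(SimpleNamespace(energy=100.0))
```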

class graphnet.data.dataset.dataset.EnsembleDataset(datasets)[source]

Bases: ConcatDataset

Construct a single dataset from a collection of datasets.

Parameters:

datasets (Iterable[Dataset]) – A collection of Datasets
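Since EnsembleDataset derives from ConcatDataset, an index into the ensemble has to be mapped to one member dataset and a local index within it. ConcatDataset in PyTorch does this with cumulative sizes and a bisection search. The sketch below reproduces that mapping in plain Python; the function `locate` is a hypothetical helper, not part of either library.

```python
import bisect
from itertools import accumulate
from typing import Sequence, Tuple


def locate(global_index: int, sizes: Sequence[int]) -> Tuple[int, int]:
    """Map an index into the concatenation to (dataset index, local index)."""
    cumulative = list(accumulate(sizes))  # e.g. [3, 5, 9] for sizes 3, 2, 4
    if not cumulative or not 0 <= global_index < cumulative[-1]:
        raise IndexError(f"index {global_index} out of range")
    # First dataset whose cumulative size exceeds the global index.
    dataset_index = bisect.bisect_right(cumulative, global_index)
    offset = cumulative[dataset_index - 1] if dataset_index > 0 else 0
    return dataset_index, global_index - offset


sizes = [3, 2, 4]  # lengths of three member datasets
first = locate(0, sizes)
boundary = locate(3, sizes)
last = locate(8, sizes)
```

The bisection keeps each lookup logarithmic in the number of member datasets, which matters little for a handful of files but is cheap to get right.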