lmdb_dataset

Dataset class(es) for reading data from LMDB databases.

class graphnet.data.dataset.lmdb.lmdb_dataset.LMDBDataset(*args, **kwargs)

Bases: Dataset

PyTorch dataset for reading data from LMDB databases.

Supports two modes:

  1. Reading raw tables and computing data representations in real-time (similar to SQLiteDataset).

  2. Reading pre-computed data representations directly from the database (skipping DataRepresentation computation).
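The choice between the two modes is driven by the pre_computed_representation argument documented below. The following is a minimal sketch, not part of the library: build_lmdb_dataset_kwargs is a hypothetical helper, and the path, pulse map, column, and representation names are placeholders. It only assembles keyword arguments using the parameter names listed on this page; the actual construction is left as a comment because it needs graphnet and a real database. Whether data_representation is consulted in the pre-computed mode is not stated here, so the sketch simply omits it in that case.

```python
from typing import Any, Dict, List, Optional


def build_lmdb_dataset_kwargs(
    path: str,
    pulsemaps: str,
    features: List[str],
    truth: List[str],
    data_representation: Optional[Any] = None,
    pre_computed_representation: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments for one of the two modes (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "path": path,
        # Documented as ignored in the pre-computed mode, but passed in
        # both cases here because the parameters have no listed default.
        "pulsemaps": pulsemaps,
        "features": features,
        "truth": truth,
    }
    if pre_computed_representation is None:
        # Mode 1: read raw tables and compute representations in real-time.
        kwargs["data_representation"] = data_representation
    else:
        # Mode 2: read a stored representation by class name or key.
        kwargs["pre_computed_representation"] = pre_computed_representation
    return kwargs


# Placeholder values, for illustration only.
common = dict(
    path="/path/to/database",
    pulsemaps="pulses",
    features=["x", "y", "z", "t"],
    truth=["energy"],
)
raw_kwargs = build_lmdb_dataset_kwargs(**common, data_representation=None)
precomputed_kwargs = build_lmdb_dataset_kwargs(
    **common, pre_computed_representation="MyRepresentation"
)

# With graphnet installed and a real database on disk, one would then write:
# from graphnet.data.dataset.lmdb.lmdb_dataset import LMDBDataset
# dataset = LMDBDataset(**raw_kwargs)
```

Keeping the two argument sets side by side makes it easy to switch a training script between modes without touching the rest of the configuration.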

Construct LMDBDataset.

Parameters:
  • path (Union[str, List[str]]) – Path to the LMDB database directory(ies).

  • pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series (used when reading raw tables, ignored when using pre-computed representations).

  • features (List[str]) – List of columns in the input files (used when reading raw tables, ignored when using pre-computed representations).

  • truth (List[str]) – List of event-level columns (used when reading raw tables, ignored when using pre-computed representations).

  • graph_definition (Optional[Any], default: None) – Method that defines the graph representation. Deprecated: use data_representation instead.

  • data_representation (Optional[Any], default: None) – Method that defines the data representation.

  • node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.

  • index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices to identify and map events across tables.

  • truth_table (str, default: 'truth') – Name of the table containing event-level truth information.

  • node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.

  • string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects.

  • selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (in index_column) or as a string-based selection used to query the Dataset for events passing the selection.

  • dtype (Any, default: None) – Type of the feature tensor on the graph objects returned.

  • loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.

  • loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights.

  • loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight.

  • seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection.

  • labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.

  • pre_computed_representation (Optional[str], default: None) – Name of the pre-computed data representation to use. If None, reads raw tables and computes representations in real-time. If specified, extracts the pre-computed representation directly (by class name or key).

  • repeat_labels_by (Optional[int], default: None) – If specified, repeats the labels along the specified dimension.

  • args (Any)

  • kwargs (Any)

Return type:

object
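The selection parameter above accepts several shapes. The sketch below is a hypothetical pre-flight check, not library code: describe_selection and its return labels are invented for illustration, and the example query string is a placeholder because the string syntax is not specified on this page. It only mirrors the type annotation Union[str, List[int], List[List[int]], None].

```python
from typing import List, Union

# Mirrors the documented type of the `selection` parameter.
Selection = Union[str, List[int], List[List[int]], None]


def describe_selection(selection: Selection) -> str:
    """Name the documented shape that `selection` matches (hypothetical helper)."""
    if selection is None:
        return "none"
    if isinstance(selection, str):
        # String-based selection, resolved by querying the dataset.
        return "string"
    if isinstance(selection, list):
        if all(isinstance(i, int) for i in selection):
            # Flat list of values from index_column.
            return "indices"
        if all(
            isinstance(group, list) and all(isinstance(i, int) for i in group)
            for group in selection
        ):
            # List of lists of index values.
            return "index_groups"
    raise TypeError(f"Unsupported selection: {selection!r}")
```

A check like this can be run before constructing the dataset, for example describe_selection([3, 7, 42]) for an explicit list of event_no values, so that a malformed selection fails early with a clear message.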

query_table(table, columns, sequential_index, selection)

Query table at a specific index, optionally with some selection.

Parameters:
  • table (str) – Table name (extractor name) to query.

  • columns (Union[List[str], str]) – Columns to read out.

  • sequential_index (Optional[int], default: None) – Sequential index of the event to query.

  • selection (Optional[str], default: None) – Selection to be imposed (not fully supported for LMDB).

Return type:

ndarray

Returns:

Numpy array containing the values in columns.
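A call to query_table can be sketched as follows. Both FakeDataset and read_event_truth are hypothetical: the stand-in class exists only so the example runs without graphnet or a database, it returns a plain list rather than a NumPy array, and the column names are placeholders. Only the keyword names (table, columns, sequential_index, selection) are taken from the signature above.

```python
from typing import Any, Dict, List, Optional, Union


class FakeDataset:
    """In-memory stand-in with the documented query_table signature (hypothetical)."""

    def __init__(self, tables: Dict[str, List[Dict[str, Any]]]) -> None:
        # Maps table name to a list of rows, one row per event.
        self._tables = tables

    def query_table(
        self,
        table: str,
        columns: Union[List[str], str],
        sequential_index: Optional[int] = None,
        selection: Optional[str] = None,
    ) -> List[Any]:
        if isinstance(columns, str):
            columns = [columns]
        row = self._tables[table][sequential_index]
        return [row[column] for column in columns]


def read_event_truth(
    dataset: Any,
    sequential_index: int,
    columns: List[str],
    table: str = "truth",
) -> Any:
    """Read the requested columns for one event from `table`."""
    return dataset.query_table(
        table=table,
        columns=columns,
        sequential_index=sequential_index,
        # Left unset: selection is documented as not fully supported for LMDB.
        selection=None,
    )


fake = FakeDataset(
    {
        "truth": [
            {"event_no": 10, "energy": 10.0, "zenith": 0.1},
            {"event_no": 11, "energy": 20.0, "zenith": 0.5},
        ]
    }
)
values = read_event_truth(fake, sequential_index=1, columns=["energy", "zenith"])
```

With a real LMDBDataset in place of the stand-in, the same helper would return the array described above, assuming the dataset was constructed in a mode where the queried table is available.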