lmdb_dataset

Dataset class(es) for reading data from LMDB databases.

class graphnet.data.dataset.lmdb.lmdb_dataset.LMDBDataset(*args, **kwargs)

Bases: Dataset

PyTorch dataset for reading data from LMDB databases.

Supports two modes:

1. Reading raw tables and computing data representations in real-time (similar to SQLiteDataset).
2. Reading pre-computed data representations directly from the database (skipping DataRepresentation computation).

Construct LMDBDataset.

Parameters:

path (Union[str, List[str]]) – Path to the LMDB database directory(ies).
pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series (used when reading raw tables, ignored when using pre-computed representations).
features (List[str]) – List of columns in the input files (used when reading raw tables, ignored when using pre-computed representations).
truth (List[str]) – List of event-level columns (used when reading raw tables, ignored when using pre-computed representations).
graph_definition (Optional[Any], default: None) – Method that defines the graph representation. Deprecated: use data_representation instead.
data_representation (Optional[Any], default: None) – Method that defines the data representation.
node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.
index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices to identify and map events across tables.
truth_table (str, default: 'truth') – Name of the table containing event-level truth information.
node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.
string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects.
selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (in index_column) or as a string-based selection used to query the Dataset for events passing the selection.
dtype (Any, default: None) – Type of the feature tensor on the graph objects returned.
loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.
loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights.
loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight.
seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection.
labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.
pre_computed_representation (Optional[str], default: None) – Name of the pre-computed data representation to use. If None, reads raw tables and computes representations in real-time. If specified, extracts the pre-computed representation directly (by class name or key).
repeat_labels_by (Optional[int], default: None) – If specified, repeats the labels along the specified dimension.
args (Any)
kwargs (Any)

Return type:

object
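
As a rough illustration of the two modes, the keyword arguments might be assembled as below. This is a minimal sketch and is not taken from the graphnet documentation: the path, pulse map name, column names, event numbers, and the representation key "KNNGraph" are all hypothetical placeholders. The constructor call is left commented out because it needs graphnet and a real LMDB database.

```python
from typing import Any, Dict

# Mode 1: read raw tables and compute the representation on the fly.
# Every value below is a hypothetical placeholder.
raw_mode_kwargs: Dict[str, Any] = {
    "path": "/data/my_events_lmdb",  # hypothetical database directory
    "pulsemaps": "pulses",           # hypothetical pulse map name
    "features": ["dom_x", "dom_y", "dom_z", "dom_time", "charge"],
    "truth": ["energy", "zenith"],
    "data_representation": None,     # replace with a data representation object
    "index_column": "event_no",      # documented default
    "truth_table": "truth",          # documented default
    "selection": [0, 1, 2],          # hypothetical values of index_column
}

# Mode 2: the same arguments plus the name of a stored representation.
# pulsemaps, features and truth are documented as ignored in this mode.
precomputed_mode_kwargs: Dict[str, Any] = {
    **raw_mode_kwargs,
    "pre_computed_representation": "KNNGraph",  # hypothetical key
}

# With graphnet installed and a real database at `path`:
# from graphnet.data.dataset.lmdb.lmdb_dataset import LMDBDataset
# dataset = LMDBDataset(**raw_mode_kwargs)
```

Keeping the two argument sets side by side makes it easy to switch a training script between on-the-fly and pre-computed representations by changing a single key.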

query_table(table, columns, sequential_index, selection)

Query table at a specific index, optionally with some selection.

Parameters:

table (str) – Table name (extractor name) to query.
columns (Union[List[str], str]) – Columns to read out.
sequential_index (Optional[int], default: None) – Sequential index of the event to query.
selection (Optional[str], default: None) – Selection to be imposed (not fully supported for LMDB).

Return type:

ndarray

Returns:

Numpy array containing the values in columns.

close()

Close any open LMDB connections.

Return type:

None
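
One way to make sure close() is always called after a query_table() read is to wrap the dataset in contextlib.closing from the standard library. The helper below is a sketch under that assumption: query_and_close is a hypothetical name that is not part of graphnet, and it only relies on the object exposing the query_table() and close() methods documented here.

```python
from contextlib import closing
from typing import Any, List


def query_and_close(
    dataset: Any, table: str, columns: List[str], sequential_index: int
) -> Any:
    """Query one event, then call dataset.close() even if the query raises.

    `dataset` is expected to be an LMDBDataset-like object; this helper is
    a hypothetical convenience wrapper, not a graphnet function.
    """
    with closing(dataset):
        return dataset.query_table(
            table, columns, sequential_index=sequential_index
        )
```

A call such as query_and_close(dataset, "truth", ["energy"], 0) would read the hypothetical energy column for the first event and then release the LMDB connections. For repeated reads, keeping the dataset open and calling close() once at the end is likely the better fit.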