lmdb_dataset
Dataset class(es) for reading data from LMDB databases.
- class graphnet.data.dataset.lmdb.lmdb_dataset.LMDBDataset(*args, **kwargs)
Bases: Dataset
PyTorch dataset for reading data from LMDB databases.
Supports two modes:
1. Reading raw tables and computing data representations in real time (similar to SQLiteDataset).
2. Reading pre-computed data representations directly from the database (skipping the DataRepresentation computation).
Construct LMDBDataset.
- Parameters:
path (Union[str, List[str]]) – Path(s) to the LMDB database directory or directories.
pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series (used when reading raw tables, ignored when using pre-computed representations).
features (List[str]) – List of columns in the input files (used when reading raw tables, ignored when using pre-computed representations).
truth (List[str]) – List of event-level columns (used when reading raw tables, ignored when using pre-computed representations).
graph_definition (Optional[Any], default: None) – Method that defines the graph representation. NOTE: DEPRECATED. Use data_representation instead.
data_representation (Optional[Any], default: None) – Method that defines the data representation.
node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.
index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices to identify and map events across tables.
truth_table (str, default: 'truth') – Name of the table containing event-level truth information.
node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.
string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects.
selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (in index_column) or as a string-based selection used to query the Dataset for events passing the selection.
dtype (Any, default: None) – Type of the feature tensor on the graph objects returned.
loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.
loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights.
loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight.
seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection.
labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.
pre_computed_representation (Optional[str], default: None) – Name of the pre-computed data representation to use. If None, reads raw tables and computes representations in real time. If specified, extracts the pre-computed representation directly (by class name or key).
repeat_labels_by (Optional[int], default: None) – If specified, repeats the labels along the specified dimension.
args (Any)
kwargs (Any)
- Return type:
object
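The two modes described above differ only in which representation argument is supplied. The following is a minimal sketch of how the keyword arguments might be assembled for each mode, not a definitive recipe. The path, pulse map name, column names, and the representation name "MyRepresentation" are hypothetical placeholders, and the helper functions are introduced here purely for illustration. Because the parameter list shows no defaults for pulsemaps, features, and truth, the sketch passes them in both modes even though they are documented as ignored for pre-computed representations.

```python
from typing import Any, Dict

# Hypothetical placeholder values; replace with names that exist in your database.
COMMON: Dict[str, Any] = {
    "path": "/path/to/lmdb_database",
    "pulsemaps": "pulses",
    "features": ["dom_x", "dom_y", "dom_z", "dom_time", "charge"],
    "truth": ["energy", "zenith"],
}


def raw_mode_kwargs(data_representation: Any) -> Dict[str, Any]:
    """Mode 1: read raw tables and compute the representation in real time."""
    return dict(COMMON, data_representation=data_representation)


def precomputed_mode_kwargs(name: str) -> Dict[str, Any]:
    """Mode 2: read a stored representation, selected by class name or key."""
    return dict(COMMON, pre_computed_representation=name)


def build_dataset(kwargs: Dict[str, Any]) -> Any:
    """Construct the dataset; requires graphnet and an existing LMDB database."""
    # Imported lazily so the helpers above can be used without graphnet installed.
    from graphnet.data.dataset.lmdb.lmdb_dataset import LMDBDataset

    return LMDBDataset(**kwargs)


precomputed = precomputed_mode_kwargs("MyRepresentation")
```

Calling build_dataset(precomputed) would then attempt to open the database in the pre-computed mode. For the raw-table mode, pass an actual data representation object to raw_mode_kwargs first.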
- query_table(table, columns, sequential_index, selection)
Query table at a specific index, optionally with some selection.
- Parameters:
table (str) – Table name (extractor name) to query.
columns (Union[List[str], str]) – Columns to read out.
sequential_index (Optional[int], default: None) – Sequential index of the event to query.
selection (Optional[str], default: None) – Selection to be imposed (not fully supported for LMDB).
- Return type:
ndarray
- Returns:
NumPy array containing the values in columns.
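As a rough illustration of the call pattern, the sketch below queries a table one sequential index at a time. StubDataset is a hypothetical stand-in that mimics only the documented query_table signature so the example runs without graphnet or a database. It returns nested lists, whereas the real method returns a NumPy array whose exact shape is not specified here. The read_events helper and the table contents are likewise assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Union


class StubDataset:
    """Hypothetical stand-in exposing only the documented query_table signature."""

    def __init__(self, tables: Dict[str, List[Dict[str, float]]]) -> None:
        self._tables = tables

    def query_table(
        self,
        table: str,
        columns: Union[List[str], str],
        sequential_index: Optional[int] = None,
        selection: Optional[str] = None,
    ) -> List[List[float]]:
        # The real method returns a NumPy array; nested lists keep the stub
        # free of third-party dependencies. The selection argument is ignored.
        if isinstance(columns, str):
            columns = [columns]
        row = self._tables[table][sequential_index]
        return [[row[column] for column in columns]]


def read_events(
    dataset: Any, table: str, columns: List[str], n_events: int
) -> List[Any]:
    """Query the first n_events events, one sequential index per call."""
    return [
        dataset.query_table(table, columns, sequential_index=i)
        for i in range(n_events)
    ]


stub = StubDataset(
    {"truth": [{"energy": 1.5, "zenith": 0.1}, {"energy": 3.0, "zenith": 0.7}]}
)
values = read_events(stub, "truth", ["energy", "zenith"], n_events=2)
```

With a real LMDBDataset instance in place of the stub, the same helper would return one array per event. Since string-based selection is documented as not fully supported for LMDB, the sketch leaves selection at its default.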