lmdb_writer
LMDB writer for GraphNeT’s data conversion pipeline.
Saves each event as a key/value pair where the key is the event index (event_no) and the value is a user-serialized blob. Optionally, a DataRepresentation can be injected to persist pre-built representations instead of raw tables.
- class graphnet.data.writers.lmdb_writer.LMDBWriter(index_column, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)
Bases: GraphNeTWriter
Writer that exports events to an LMDB database.
Each event is stored under key = bytes(str(event_no)) and value = bytes produced by a user-selected serialization function.
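The key/value layout described above can be sketched with the standard library alone. This is a minimal illustration, not GraphNeT's actual code: the helper `make_key`, the UTF-8 encoding, the sample event, and the plain dict standing in for an LMDB transaction are all assumptions made for the example.

```python
import pickle


def make_key(event_no: int) -> bytes:
    """Encode an event number as an LMDB-style byte key.

    Assumption: the decimal string is encoded as UTF-8. In Python 3,
    bytes(str(...)) needs an explicit encoding, so one is chosen here.
    """
    return str(event_no).encode("utf-8")


# Hypothetical event payload; real payloads come from the extractors.
event = {
    "event_no": 42,
    "truth": {"energy": 12.5},
    "pulses": [[0.1, 0.2, 0.3]],
}

# A plain dict stands in for an LMDB write transaction.
store = {}
key = make_key(event["event_no"])
store[key] = pickle.dumps(event)  # value = serialized blob

# Reading back: look up by the same key and deserialize.
restored = pickle.loads(store[make_key(42)])
```

A reader of the database has to apply the same key encoding and the matching deserializer, since LMDB itself only sees opaque bytes.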
Construct LMDBWriter.
- Parameters:
index_column (str, default: 'event_no') – Column used as the per-event key.
map_size_bytes (int, default: 8589934592) – LMDB map size in bytes. Defaults to 8 GiB.
serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes.
data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with extractor names and truth labels, the stored value will contain a “data_representations” field with outputs from each data_representation.forward(…) keyed by class name.
pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.
truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.
truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
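To illustrate how a `serialization` argument of this shape (a name or a callable) might be turned into a function that returns bytes, here is a hedged sketch. The function `resolve_serializer` is hypothetical and is not part of GraphNeT; the `msgpack` and `dill` branches assume those third-party packages are installed and import them lazily.

```python
import json
import pickle
from typing import Any, Callable, Union

Serializer = Callable[[Any], bytes]


def resolve_serializer(serialization: Union[str, Serializer]) -> Serializer:
    """Return a callable mapping an object to bytes (hypothetical helper)."""
    if callable(serialization):
        # User-supplied function: used as-is.
        return serialization
    if serialization == "pickle":
        return pickle.dumps
    if serialization == "json":
        # json.dumps returns str, so encode it to bytes.
        return lambda obj: json.dumps(obj).encode("utf-8")
    if serialization == "msgpack":
        import msgpack  # third-party; imported only when requested

        return lambda obj: msgpack.packb(obj, use_bin_type=True)
    if serialization == "dill":
        import dill  # third-party; imported only when requested

        return dill.dumps
    raise ValueError(f"Unknown serialization: {serialization!r}")


# Usage: a named serializer and a custom callable.
to_json = resolve_serializer("json")
to_custom = resolve_serializer(lambda obj: repr(obj).encode("utf-8"))
```

Whichever option is chosen at write time, the reading side needs the matching deserializer, so it is worth recording the choice alongside the database.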
- merge_files(files, output_dir, target_name, allow_overwrite)
Merge multiple LMDB environments into one.
Note: Keys are assumed unique across inputs. If a duplicate key is encountered and allow_overwrite=False, the key is skipped.
- Return type:
None
- Parameters:
files (List[str])
output_dir (str)
target_name (str)
allow_overwrite (bool)
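The duplicate-key rule in the note above can be sketched with plain dicts standing in for LMDB environments. This is an illustration under stated assumptions, not the library's implementation: `merge_stores` is a hypothetical helper, and the behavior with allow_overwrite=True (a later input replaces an earlier value) is assumed, since the note only describes the allow_overwrite=False case.

```python
from typing import Dict, Iterable


def merge_stores(
    stores: Iterable[Dict[bytes, bytes]],
    allow_overwrite: bool = False,
) -> Dict[bytes, bytes]:
    """Merge key/value stores in input order (hypothetical helper)."""
    merged: Dict[bytes, bytes] = {}
    for store in stores:
        for key, value in store.items():
            if key in merged and not allow_overwrite:
                # Duplicate key: skip it and keep the value already written.
                continue
            # New key, or overwrite permitted (assumed: last input wins).
            merged[key] = value
    return merged


# Two stand-in inputs that share the key b"2".
first = {b"1": b"first", b"2": b"second"}
second = {b"2": b"duplicate", b"3": b"third"}

kept = merge_stores([first, second], allow_overwrite=False)
replaced = merge_stores([first, second], allow_overwrite=True)
```

With allow_overwrite=False the result depends on input order, so passing `files` in a deterministic order keeps merges reproducible.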