lmdb_writer

LMDB writer for GraphNeT’s data conversion pipeline.

Saves each event as a key/value pair, where the key is the event index (event_no) and the value is a blob produced by a user-selected serialization function. Optionally, one or more DataRepresentation instances can be injected to persist pre-built representations instead of raw tables.

class graphnet.data.writers.lmdb_writer.LMDBWriter(index_column, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)

Bases: GraphNeTWriter

Writer that exports events to an LMDB database.

Each event is stored under key = bytes(str(event_no)) and value = bytes produced by a user-selected serialization function.
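The key/value layout described above can be sketched with the standard library alone. This is an illustration, not GraphNeT's implementation: `resolve_serializer`, `encode_key`, and the dict used in place of an LMDB database are hypothetical names, UTF-8 key encoding is an assumption, and only the "pickle" and "json" options are shown because "msgpack" and "dill" need third-party packages.

```python
import json
import pickle
from typing import Any, Callable, Dict, Union

Serializer = Callable[[Any], bytes]

# Stdlib-backed options only; "msgpack" and "dill" are left out of this sketch.
_BUILTIN: Dict[str, Serializer] = {
    "pickle": pickle.dumps,
    "json": lambda obj: json.dumps(obj).encode("utf-8"),
}


def resolve_serializer(serialization: Union[str, Serializer]) -> Serializer:
    """Turn a serializer name or a callable into a function returning bytes."""
    if callable(serialization):
        return serialization
    try:
        return _BUILTIN[serialization]
    except KeyError:
        raise ValueError(f"Unknown serialization: {serialization!r}") from None


def encode_key(event_no: int) -> bytes:
    """Encode the event index as bytes (UTF-8 is an assumption here)."""
    return str(event_no).encode("utf-8")


# A plain dict stands in for the LMDB database.
store: Dict[bytes, bytes] = {}
event = {"event_no": 42, "energy": 1.5}
serialize = resolve_serializer("json")
store[encode_key(event["event_no"])] = serialize(event)
```

Reading an event back means encoding the same event_no to bytes and applying the inverse of whichever serializer was used when writing.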

Construct LMDBWriter.

Parameters:
  • index_column (str, default: 'event_no') – Column whose value is used as the per-event key.

  • map_size_bytes (int, default: 8589934592) – LMDB map size. Defaults to 8 GiB.

  • serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes.

  • data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with extractor names and truth labels, the stored value will contain a “data_representations” field with outputs from each data_representation.forward(…) keyed by class name.

  • pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.

  • truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.

  • truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
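As a rough sketch of the "data_representations" field described for the data_representation parameter, the snippet below keys each output by class name. The toy classes, the single-argument `forward(event)` signature, the `build_value` helper, and the choice to keep the raw event entries alongside the new field are all assumptions made for illustration; this page does not show the real DataRepresentation interface.

```python
from typing import Any, Dict, List, Union


class ToyRepresentationA:
    """Hypothetical stand-in for a DataRepresentation."""

    def forward(self, event: Dict[str, Any]) -> Dict[str, Any]:
        return {"n_pulses": len(event["pulses"])}


class ToyRepresentationB:
    """Second hypothetical stand-in, to show keying by class name."""

    def forward(self, event: Dict[str, Any]) -> Dict[str, Any]:
        return {"labels": dict(event["truth"])}


def build_value(
    event: Dict[str, Any],
    representations: Union[Any, List[Any], None] = None,
) -> Dict[str, Any]:
    """Build the object that would be handed to the serializer."""
    value = dict(event)  # assumption: raw entries are kept next to the new field
    if representations is None:
        return value
    if not isinstance(representations, list):
        representations = [representations]  # accept a single instance
    value["data_representations"] = {
        type(rep).__name__: rep.forward(event) for rep in representations
    }
    return value


event = {"event_no": 7, "pulses": [0.1, 0.2], "truth": {"energy": 10.0}}
value = build_value(event, [ToyRepresentationA(), ToyRepresentationB()])
```

One consequence of keying by class name: two instances of the same class would collide on the same key, so in this sketch the later one would replace the earlier.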

merge_files(files, output_dir, target_name, allow_overwrite)

Merge multiple LMDB environments into one.

Note: Keys are assumed to be unique across inputs. If a duplicate key is encountered and allow_overwrite=False, that entry is skipped and the value already written for the key is kept.

Return type: None

Parameters:
  • files (List[str])

  • output_dir (str)

  • target_name (str)

  • allow_overwrite (bool)
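The duplicate-key rule in the note above can be illustrated with plain dicts standing in for LMDB environments. `merge_stores` is a hypothetical helper, not the method's actual code. The `allow_overwrite=True` branch, where a later input replaces an earlier one, is an assumption, since the note only documents the `False` case.

```python
from typing import Dict, List


def merge_stores(
    sources: List[Dict[bytes, bytes]],
    allow_overwrite: bool = False,
) -> Dict[bytes, bytes]:
    """Merge key/value stores in order, applying the duplicate-key rule."""
    merged: Dict[bytes, bytes] = {}
    for source in sources:
        for key, value in source.items():
            if key in merged and not allow_overwrite:
                continue  # duplicate key: skip it, keep the value already written
            merged[key] = value
    return merged


first = {b"1": b"a", b"2": b"b"}
second = {b"2": b"B", b"3": b"c"}

kept = merge_stores([first, second], allow_overwrite=False)
replaced = merge_stores([first, second], allow_overwrite=True)
```

In this sketch the order of the inputs matters whenever keys collide, which is one reason to make sure event indices are unique across files before merging.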