graphnet.data.dataconverter module

Contains DataConverter.

graphnet.data.dataconverter.init_global_index(index, output_files)[source]

Make global_index available to pool workers.

Return type:

None

Parameters:
  • index (Synchronized)

  • output_files (List[str])

class graphnet.data.dataconverter.DataConverter(file_reader, save_method, outdir, extractors, index_column='event_no', num_workers=1)[source]

Bases: ABC, Logger

A finalized data conversion class in GraphNeT.

DataConverter provides parallel conversion and extraction of experiment-specific file formats into graphnet-supported data formats. It also assigns event IDs to training examples.

Initialize DataConverter.

Parameters:
  • file_reader (GraphNeTFileReader) – The method used for reading and applying Extractors.

  • save_method (GraphNeTWriter) – The method used to save the interim data format to a graphnet-supported file format.

  • outdir (str) – The directory to save the files in.

  • extractors (Union[List[Extractor], List[I3Extractor], List[ParquetExtractor], List[H5Extractor], List[PrometheusExtractor]]) – The `Extractor`(s) that will be applied to the input files.

  • index_column (str, default: 'event_no') – Name of the event ID column added to each event. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
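To illustrate what assigning event IDs across files means, here is a minimal standard-library sketch. It assumes extracted data arrive as column-oriented dicts of lists, one per input file, and that IDs are consecutive integers starting at 0. The function assign_event_ids is hypothetical and does not reproduce DataConverter's internals; in a parallel run the counter would live in shared memory, as with init_global_index.

```python
from typing import Dict, List


def assign_event_ids(
    extracted: List[Dict[str, list]],
    index_column: str = "event_no",
    start: int = 0,
) -> List[Dict[str, list]]:
    """Add a globally unique, consecutive ID column to each table (hypothetical)."""
    next_id = start
    out: List[Dict[str, list]] = []
    for table in extracted:
        # Number of rows in this table; an empty dict counts as zero events.
        n_rows = len(next(iter(table.values()))) if table else 0
        new_table = dict(table)  # shallow copy so inputs are not mutated
        new_table[index_column] = list(range(next_id, next_id + n_rows))
        next_id += n_rows
        out.append(new_table)
    return out
```

Because the counter carries over between tables, rows from different files never share an ID, which keeps the index column usable as a join key after conversion.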

get_map_function(nb_files, unit='file(s)')[source]

Identify map function to use (pure python or multiprocess).

Return type:

Tuple[Any, Optional[Pool]]

Parameters:
  • nb_files (int)

  • unit (str)
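A standalone sketch of choosing between a pure-Python and a multiprocess map function, assuming the worker count is capped by the number of files and that Pool.imap is the parallel map. Here num_workers is passed explicitly instead of being read from the instance; the initializer/initargs pass-through and the helper process_all are hypothetical additions.

```python
import multiprocessing
from multiprocessing.pool import Pool
from typing import Any, Callable, List, Optional, Tuple


def get_map_function(
    num_workers: int,
    nb_files: int,
    unit: str = "file(s)",
    initializer: Optional[Callable[..., None]] = None,
    initargs: Tuple[Any, ...] = (),
) -> Tuple[Any, Optional[Pool]]:
    """Return (map_fn, pool): builtin map when serial, Pool.imap otherwise."""
    # Never start more workers than there are files to process.
    n_workers = min(num_workers, nb_files)
    if n_workers > 1:
        print(f"Starting pool of {n_workers} workers to process {nb_files} {unit}")
        pool = multiprocessing.Pool(
            processes=n_workers, initializer=initializer, initargs=initargs
        )
        return pool.imap, pool
    print(f"Processing {nb_files} {unit} in the main process")
    return map, None


def process_all(work: Callable[[str], Any], files: List[str], num_workers: int) -> List[Any]:
    """Apply work to every file, then shut the pool down if one was created."""
    map_fn, pool = get_map_function(num_workers, len(files))
    try:
        return list(map_fn(work, files))
    finally:
        if pool is not None:
            pool.close()
            pool.join()
```

Returning the pool alongside the map function leaves the caller responsible for closing it, which is why process_all wraps the map in try/finally.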

merge_files(files=None, **kwargs)[source]

Merge converted files.

DataConverter calls the merge_files method of the GraphNeTWriter it was instantiated with.

Parameters:
  • files (Union[List[str], str, None], default: None) – Intermediate files to be merged.

  • kwargs (Any)

Return type:

None
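The delegation described above can be sketched with hypothetical stand-ins: Writer plays the role of GraphNeTWriter and Converter the role of DataConverter. How files=None is handled here (falling back to files recorded during conversion) is an assumption for illustration, as is the batch_size keyword used in the example; extra keyword arguments are simply forwarded to the writer.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple, Union


class Writer(ABC):
    """Hypothetical stand-in for GraphNeTWriter."""

    @abstractmethod
    def merge_files(self, files: List[str], **kwargs: Any) -> None:
        ...


class RecordingWriter(Writer):
    """Test double that records every merge request instead of writing files."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], dict]] = []

    def merge_files(self, files: List[str], **kwargs: Any) -> None:
        self.calls.append((files, kwargs))


class Converter:
    """Hypothetical stand-in for DataConverter's merge step."""

    def __init__(self, save_method: Writer) -> None:
        self._save_method = save_method
        self._output_files: List[str] = []  # assumed: filled in during conversion

    def merge_files(self, files: Optional[Union[List[str], str]] = None, **kwargs: Any) -> None:
        if files is None:
            files = self._output_files  # assumption: reuse this run's outputs
        elif isinstance(files, str):
            files = [files]  # accept a single path for convenience
        # All format-specific merging is delegated to the writer.
        self._save_method.merge_files(files=files, **kwargs)
```

Keeping the merge logic in the writer means the converter stays format-agnostic: swapping the writer changes how files are merged without touching the conversion code.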