dataconverter
Contains DataConverter.
- graphnet.data.dataconverter.init_global_index(index, output_files)
Make global_index available to pool workers.
- Return type:
None
- Parameters:
index (Synchronized)
output_files (List[str])
- class graphnet.data.dataconverter.DataConverter(file_reader, save_method, outdir, extractors, index_column, num_workers)
Bases: ABC, Logger
A finalized data conversion class in GraphNeT.
DataConverter provides parallel processing of file conversion and extraction from experiment-specific file formats to GraphNeT-supported data formats. It also assigns event IDs to training examples.
Initialize DataConverter.
- Parameters:
file_reader (GraphNeTFileReader) – The method used for reading and applying Extractors.
save_method (GraphNeTWriter) – The method used to save the interim data format to a GraphNeT-supported file format.
outdir (str) – The directory to save the files in.
extractors (Union[List[Extractor], List[I3Extractor], List[ParquetExtractor], List[H5Extractor], List[PrometheusExtractor]]) – The `Extractor`(s) that will be applied to the input files.
index_column (str, default: 'event_no') – Name of the event ID column added to the events. Defaults to “event_no”.
num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
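The constructor combines a pluggable reader (`file_reader`), a pluggable writer (`save_method`), a list of extractors, and an ID column. The serial sketch below shows one way these pieces can fit together. Every class in it (`ToyReader`, `ToyWriter`, `InMemoryReader`, `InMemoryWriter`, `ToyConverter`) is hypothetical and does not mirror GraphNeT's internals. Extractors are modelled as plain callables that map a raw record to output columns, and `num_workers` is stored but unused because this sketch never starts worker processes.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
# Hypothetical extractor type: maps one raw record to a dict of output columns.
Extractor = Callable[[Dict[str, Any]], Event]


class ToyReader(ABC):
    """Hypothetical stand-in for the `file_reader` argument."""

    @abstractmethod
    def read(self, path: str, extractors: List[Extractor]) -> List[Event]:
        """Read `path` and apply every extractor to each record."""


class ToyWriter(ABC):
    """Hypothetical stand-in for the `save_method` argument."""

    @abstractmethod
    def save(self, events: List[Event], path: str) -> None:
        """Persist converted events to `path`."""


class InMemoryReader(ToyReader):
    def __init__(self, data: Dict[str, List[Dict[str, Any]]]) -> None:
        self._data = data  # maps a fake path to its raw records

    def read(self, path: str, extractors: List[Extractor]) -> List[Event]:
        events = []
        for record in self._data[path]:
            event: Event = {}
            for extractor in extractors:  # merge columns from every extractor
                event.update(extractor(record))
            events.append(event)
        return events


class InMemoryWriter(ToyWriter):
    def __init__(self) -> None:
        self.saved: Dict[str, List[Event]] = {}

    def save(self, events: List[Event], path: str) -> None:
        self.saved[path] = events


class ToyConverter:
    """Serial sketch of how the constructor arguments interact (not GraphNeT code)."""

    def __init__(self, file_reader: ToyReader, save_method: ToyWriter, outdir: str,
                 extractors: List[Extractor], index_column: str = "event_no",
                 num_workers: int = 1) -> None:
        self._reader = file_reader
        self._writer = save_method
        self._outdir = outdir
        self._extractors = extractors
        self._index_column = index_column
        self._num_workers = num_workers  # stored only; this sketch never spawns workers
        self._next_id = 0

    def convert_file(self, path: str) -> str:
        events = self._reader.read(path, self._extractors)
        for event in events:  # assign consecutive ids across all files
            event[self._index_column] = self._next_id
            self._next_id += 1
        out_path = os.path.join(self._outdir, os.path.basename(path) + ".out")
        self._writer.save(events, out_path)
        return out_path


reader = InMemoryReader({"run1.i3": [{"E": 1.0}, {"E": 2.0}], "run2.i3": [{"E": 5.0}]})
writer = InMemoryWriter()
converter = ToyConverter(reader, writer, outdir="converted",
                         extractors=[lambda rec: {"energy": rec["E"]}])
out = [converter.convert_file(p) for p in ("run1.i3", "run2.i3")]
print(writer.saved[out[1]])  # [{'energy': 5.0, 'event_no': 2}]
```

Because reading and writing sit behind separate interfaces, supporting a new input or output format in this sketch means swapping one object rather than changing the converter itself.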
- get_map_function(nb_files, unit='file(s)')
Identify map function to use (pure python or multiprocess).
- Return type:
Tuple[Any, Optional[Pool]]
- Parameters:
nb_files (int)
unit (str)
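The choice between pure-Python and multiprocess mapping can be sketched as a standalone function that returns a `(map_fn, pool)` pair matching the return type above. This is a hedged approximation, not GraphNeT's code. `num_workers` is passed in explicitly here, although the method presumably reads the value given to the constructor. The worker count is capped at the number of files, and the printed messages and the `square` task are illustrative only.

```python
from multiprocessing.pool import Pool
from typing import Any, Callable, Optional, Tuple


def get_map_function(
    num_workers: int, nb_files: int, unit: str = "file(s)"
) -> Tuple[Callable[..., Any], Optional[Pool]]:
    """Return (map_fn, pool): builtin `map` when serial, `Pool.imap` otherwise."""
    # Never start more workers than there are files to process.
    n_workers = min(num_workers, nb_files)
    if n_workers > 1:
        print(f"Using {n_workers} workers for {nb_files} {unit}")
        pool = Pool(processes=n_workers)
        return pool.imap, pool
    print(f"Processing {nb_files} {unit} serially")
    return map, None


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    for workers in (1, 4):
        map_fn, pool = get_map_function(workers, nb_files=3)
        try:
            print(list(map_fn(square, [1, 2, 3])))  # [1, 4, 9] either way
        finally:
            # The caller owns the pool and must close it when done.
            if pool is not None:
                pool.close()
                pool.join()
```

Returning the pool alongside the map function lets the caller use both paths through the same `map_fn(func, items)` call and still shut the pool down explicitly afterwards.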
- merge_files(files=None, output_dir=None, **kwargs)
Merge converted files.
DataConverter calls the merge_files method of the GraphNeTWriter it was instantiated with (the save_method argument).
- Parameters:
files (Union[List[str], str, None], default: None) – Intermediate files to be merged.
output_dir (Optional[str], default: None) – Directory to save the merged files in.
**kwargs (Any) – Additional keyword arguments to be passed to the GraphNeTWriter.merge_files method.
- Return type:
None
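In the sketch below, merge_files only delegates to the writer and forwards any extra keyword arguments unchanged. `RecordingWriter` and `ConverterSketch` are hypothetical stand-ins. What happens when `files` or `output_dir` is left as `None` is an assumption: the sketch falls back to the converter's own intermediate files and a `merged` subdirectory, and GraphNeT's actual fallbacks may differ. The `batch_size` keyword is also hypothetical.

```python
import os
from typing import Any, Dict, List, Optional, Tuple, Union


class RecordingWriter:
    """Hypothetical stand-in for a GraphNeTWriter: records merge requests."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], str, Dict[str, Any]]] = []

    def merge_files(self, files: List[str], output_dir: str, **kwargs: Any) -> None:
        self.calls.append((files, output_dir, kwargs))


class ConverterSketch:
    """Shows only the delegation in merge_files; everything else is omitted."""

    def __init__(self, save_method: RecordingWriter, outdir: str,
                 output_files: Optional[List[str]] = None) -> None:
        self._save_method = save_method
        self._outdir = outdir
        # Hypothetical: intermediate files this converter has written so far.
        self._output_files = list(output_files or [])

    def merge_files(
        self,
        files: Union[List[str], str, None] = None,
        output_dir: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        # Assumed fallbacks for the None defaults; GraphNeT's may differ.
        if files is None:
            files = list(self._output_files)
        elif isinstance(files, str):
            files = [files]
        if output_dir is None:
            output_dir = os.path.join(self._outdir, "merged")
        # Delegate the actual merging to the writer the converter was built with.
        self._save_method.merge_files(files=files, output_dir=output_dir, **kwargs)


writer = RecordingWriter()
converter = ConverterSketch(writer, outdir="converted",
                            output_files=["converted/a.parquet", "converted/b.parquet"])
converter.merge_files()  # uses the assumed fallbacks
converter.merge_files("x.parquet", output_dir="elsewhere", batch_size=2)
print(writer.calls[1])  # (['x.parquet'], 'elsewhere', {'batch_size': 2})
```

Keeping the merge logic in the writer means each output format can define its own merging strategy while the converter's interface stays the same.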