graphnet.data.writers.parquet_writer module

DataConverter for the Parquet backend.

class graphnet.data.writers.parquet_writer.ParquetWriter(truth_table, index_column)[source]

Bases: GraphNeTWriter

Class for writing interim data format to Parquet.

Construct ParquetWriter.

Parameters:
  • truth_table (str, default: 'truth') – Name of the table containing event-level truth data. Defaults to “truth”.

  • index_column (str, default: 'event_no') – The column used for indexation. Defaults to “event_no”.
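
A minimal construction sketch, importing the class from the module path shown above (the default argument values match those listed for the constructor):

    from graphnet.data.writers.parquet_writer import ParquetWriter

    # Writer that stores the interim data format as Parquet files,
    # using the default truth table name and event index column.
    writer = ParquetWriter(truth_table="truth", index_column="event_no")
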

merge_files(files, output_dir, events_per_batch, num_workers)[source]

Convert files into shuffled batches.

Events will be shuffled, and the resulting batches will constitute random subsamples of the full dataset.

Parameters:
  • files (List[str]) – Files converted to Parquet. Note that this argument is ignored by this method, as the files are found automatically in output_dir.

  • output_dir (str) – The directory to store the batched data.

  • events_per_batch (int, default: 200000) – Number of events in each batch. Defaults to 200000.

  • num_workers (int, default: 1) – Number of workers to use for merging. Defaults to 1.

Return type:

None
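
A hedged usage sketch for merge_files, continuing from the writer constructed above. The directory name "my_outdir" is a hypothetical path that is assumed to already contain the converted Parquet files:

    # Merge previously converted Parquet files found in "my_outdir" into
    # shuffled batches of 200000 events each, using 4 worker processes.
    writer.merge_files(
        files=[],                 # ignored; files are discovered from output_dir
        output_dir="my_outdir",
        events_per_batch=200_000,
        num_workers=4,
    )

Because the batches are shuffled, each output batch is a random subsample of the full dataset rather than a contiguous slice of the input files.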