parquet_writer
DataConverter for the Parquet backend.
- class graphnet.data.writers.parquet_writer.ParquetWriter(truth_table, index_column)[source]
Bases: GraphNeTWriter
Class for writing interim data format to Parquet.
Construct ParquetWriter.
- Parameters:
  - truth_table (str, default: 'truth') – Name of the table containing event-level truth data. Defaults to "truth".
  - index_column (str, default: 'event_no') – The column used for indexation. Defaults to "event_no".
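A minimal construction sketch, using only the parameters documented above with their defaults written out explicitly; replace the table and column names if your data uses different ones.

```python
from graphnet.data.writers.parquet_writer import ParquetWriter

# Construct a ParquetWriter with the documented defaults made explicit.
# "truth" and "event_no" are the default table/column names; adjust them
# to match your own truth table and event index column.
writer = ParquetWriter(
    truth_table="truth",
    index_column="event_no",
)
```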
- merge_files(files, output_dir, events_per_batch, num_workers)[source]
Convert files into shuffled batches.
Events will be shuffled, and the resulting batches will constitute random subsamples of the full dataset.
- Parameters:
  - files (List[str]) – Files converted to Parquet. Note that this argument is ignored by this method, as the files are found automatically using output_dir.
  - output_dir (str) – The directory in which to store the batched data.
  - events_per_batch (int, default: 200000) – Number of events in each batch. Defaults to 200000.
  - num_workers (int, default: 1) – Number of workers to use for merging. Defaults to 1.
- Return type:
None
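A hedged sketch of the merging step, assuming the converted Parquet files already live under output_dir (the directory path below is a hypothetical placeholder) and relying on the note above that the files argument is ignored.

```python
# Merge previously converted Parquet files into shuffled batches.
# The directory path is a placeholder; per the note above, `files` is
# ignored and the inputs are discovered from `output_dir` instead.
writer.merge_files(
    files=[],                  # ignored by this method
    output_dir="/path/to/parquet_output",
    events_per_batch=200_000,  # events per shuffled batch (default: 200000)
    num_workers=4,             # parallel merge workers (default: 1)
)
```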