dataconverters
Pre-configured combinations of writers and readers.
- class graphnet.data.pre_configured.dataconverters.I3ToParquetConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters)
Bases: DataConverter
Preconfigured DataConverter for converting I3 files to Parquet files.
Convert I3 files to Parquet.
- Parameters:
  - gcd_rescue (str) – Path to a GCD file that is used if no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing i3 files has an accompanying GCD file. When this convention is broken, i.e. a folder contains i3 files but no GCD file, gcd_rescue is used instead.
  - extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.
  - outdir (str) – The directory to save the files in.
  - icetray_verbose – Set the level of verbosity of icetray. Defaults to 0.
  - index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
  - num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
  - i3_filters (Union[I3Filter, List[I3Filter]], default: None) – Instance(s) of I3Filter used to filter P-frames. If None, NullSplitI3Filter is used.
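The GCD lookup described for gcd_rescue can be sketched in plain Python. This is not I3Reader's implementation: the helper `pair_i3_with_gcd` is hypothetical, and it assumes that GCD files are recognised by "gcd" appearing in the file name and that i3 files match `*.i3*` (for example `.i3`, `.i3.gz` or `.i3.zst`).

```python
from pathlib import Path
from typing import Dict, List, Optional


def pair_i3_with_gcd(input_dir: str, gcd_rescue: Optional[str]) -> Dict[str, str]:
    """Map every i3 file under input_dir to a GCD file (hypothetical helper).

    Assumption: a file is a GCD file if "gcd" appears in its name (any case).
    """
    # Group all i3-like files by the folder they live in (recursive search).
    folders: Dict[Path, List[Path]] = {}
    for path in sorted(Path(input_dir).rglob("*.i3*")):
        if path.is_file():
            folders.setdefault(path.parent, []).append(path)

    pairs: Dict[str, str] = {}
    for folder, files in folders.items():
        gcds = [f for f in files if "gcd" in f.name.lower()]
        data = [f for f in files if "gcd" not in f.name.lower()]
        if gcds:
            gcd = str(gcds[0])  # the folder follows the convention
        elif gcd_rescue is not None:
            gcd = gcd_rescue  # convention broken: fall back to gcd_rescue
        else:
            raise FileNotFoundError(f"No GCD file in {folder} and no gcd_rescue given")
        for f in data:
            pairs[str(f)] = gcd
    return pairs
```

Grouping by parent folder mirrors the "one GCD file per folder" convention; the rescue file is only consulted for folders that contain i3 files but no GCD file.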
- class graphnet.data.pre_configured.dataconverters.I3ToSQLiteConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters, max_table_size)
Bases: DataConverter
Preconfigured DataConverter for converting I3 files to SQLite files.
Convert I3 files to SQLite.
- Parameters:
  - gcd_rescue (str) – Path to a GCD file that is used if no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing i3 files has an accompanying GCD file. When this convention is broken, i.e. a folder contains i3 files but no GCD file, gcd_rescue is used instead.
  - extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.
  - outdir (str) – The directory to save the files in.
  - icetray_verbose – Set the level of verbosity of icetray. Defaults to 0.
  - index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
  - num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
  - i3_filters (Union[I3Filter, List[I3Filter]], default: None) – Instance(s) of I3Filter used to filter P-frames. If None, NullSplitI3Filter is used.
  - max_table_size (Optional[int], default: None) – Maximum size of the SQLite tables. Defaults to None.
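The docstring does not state the unit of max_table_size. The sketch below assumes it is a row limit and that, once a table is full, writing continues in a new database file; `write_with_row_limit`, the `truth` table and its two columns are hypothetical and only stand in for whatever an extractor produces.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def write_with_row_limit(
    rows: Iterable[Tuple[int, float]],
    outdir: str,
    max_table_size: Optional[int] = None,
) -> List[Path]:
    """Write (event_no, energy) rows, opening a new database when the table is full.

    Hypothetical helper: max_table_size is treated as a row count; None means no limit.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    conn: Optional[sqlite3.Connection] = None
    n_rows = 0
    for row in rows:
        # Start the first database, or roll over once the current table is full.
        if conn is None or (max_table_size is not None and n_rows >= max_table_size):
            if conn is not None:
                conn.commit()
                conn.close()
            path = out / f"events_part_{len(paths)}.db"
            conn = sqlite3.connect(path)
            conn.execute("CREATE TABLE truth (event_no INTEGER PRIMARY KEY, energy REAL)")
            paths.append(path)
            n_rows = 0
        conn.execute("INSERT INTO truth VALUES (?, ?)", row)
        n_rows += 1
    if conn is not None:
        conn.commit()
        conn.close()
    return paths
```

With max_table_size=None everything lands in a single file, which matches the "no limit" reading of the default.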
- class graphnet.data.pre_configured.dataconverters.I3ToLMDBConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)
Bases: DataConverter
Preconfigured DataConverter for converting I3 files to LMDB files.
Convert I3 files to LMDB.
- Parameters:
  - gcd_rescue (str) – Path to a GCD file that is used if no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing i3 files has an accompanying GCD file. When this convention is broken, i.e. a folder contains i3 files but no GCD file, gcd_rescue is used instead.
  - extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.
  - outdir (str) – The directory to save the files in.
  - index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
  - num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
  - i3_filters (Union[I3Filter, List[I3Filter]], default: None) – Instance(s) of I3Filter used to filter P-frames. If None, NullSplitI3Filter is used.
  - map_size_bytes (int, default: 8589934592) – LMDB map size in bytes, i.e. the maximum size the database may grow to. Defaults to 8 GiB.
  - serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes. Defaults to “pickle”.
  - data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with pulsemap_extractor_name and truth_extractor_name, the stored value will contain a “data_representations” field with the output of each data_representation.forward(…), keyed by class name.
  - pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.
  - truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.
  - truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
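A `serialization` string has to be turned into a bytes-producing function before records are written to LMDB. A minimal sketch, assuming the four names map to `pickle.dumps`, a UTF-8 encoded `json.dumps`, `msgpack.packb` and `dill.dumps`; `resolve_serializer` is a hypothetical helper, not graphnet's code. It also spells out the default map size, 8 GiB = 8 × 2^30 = 8589934592 bytes.

```python
import json
import pickle
from typing import Any, Callable, Union

# The documented default for map_size_bytes: 8 GiB expressed in bytes.
DEFAULT_MAP_SIZE_BYTES = 8 * 1024**3


def resolve_serializer(
    serialization: Union[str, Callable[[Any], bytes]],
) -> Callable[[Any], bytes]:
    """Turn the `serialization` argument into a callable (hypothetical helper)."""
    if callable(serialization):
        return serialization  # user-supplied object -> bytes function
    if serialization == "pickle":
        return pickle.dumps
    if serialization == "json":
        return lambda obj: json.dumps(obj).encode("utf-8")
    if serialization == "msgpack":
        import msgpack  # third-party; imported only when requested

        return msgpack.packb
    if serialization == "dill":
        import dill  # third-party; imported only when requested

        return dill.dumps
    raise ValueError(f"Unknown serialization: {serialization!r}")
```

Importing msgpack and dill lazily keeps the two stdlib options usable even when those optional packages are not installed.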
- class graphnet.data.pre_configured.dataconverters.ParquetToSQLiteConverter(extractors, outdir, index_column, num_workers)
Bases: DataConverter
Preconfigured DataConverter for converting Parquet files to SQLite files.
This class converts Parquet files written by ParquetWriter to SQLite.
Convert internal Parquet files to SQLite.
- Parameters:
  - extractors (List[ParquetExtractor]) – The `Extractor`(s) that will be applied to the input files.
  - outdir (str) – The directory to save the files in.
  - icetray_verbose – Set the level of verbosity of icetray. Defaults to 0.
  - index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
  - num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
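Conceptually, index_column names an id that must be unique per event across all input files, so that rows from different files can share one table. A hypothetical sketch of that idea: the helper `add_index_column` and the list-of-dicts event layout are assumptions, and it does not describe how graphnet assigns ids or how it treats inputs that already carry an id column.

```python
from typing import Any, Dict, List


def add_index_column(
    files: List[List[Dict[str, Any]]], index_column: str = "event_no"
) -> List[List[Dict[str, Any]]]:
    """Give each event a unique id stored under `index_column` (hypothetical helper).

    Ids are consecutive across files, so events from different input files
    never collide once they are written to the same table.
    """
    next_id = 0
    out: List[List[Dict[str, Any]]] = []
    for events in files:
        tagged = []
        for event in events:
            # The assigned id overrides any existing value under the same key.
            tagged.append({**event, index_column: next_id})
            next_id += 1
        out.append(tagged)
    return out
```

Using one running counter over all files is what keeps ids unique globally rather than only within each file.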
- class graphnet.data.pre_configured.dataconverters.SQLiteToLMDBConverter(extractors, outdir, index_column, num_workers, subset_size, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)
Bases: DataConverter
Preconfigured DataConverter for converting SQLite files to LMDB files.
This class converts SQLite files written by SQLiteWriter to LMDB.
Convert internal SQLite files to LMDB.
- Parameters:
  - extractors (List[SQLiteExtractor]) – The `Extractor`(s) that will be applied to the input files.
  - outdir (str) – The directory to save the files in.
  - index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
  - num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
  - subset_size (int, default: 10000) – Number of events per fileset chunk read by SQLiteReader. Defaults to 10000.
  - map_size_bytes (int, default: 8589934592) – LMDB map size in bytes, i.e. the maximum size the database may grow to. Defaults to 8 GiB.
  - serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes. Defaults to “pickle”.
  - data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with pulsemap_extractor_name and truth_extractor_name, the stored value will contain a “data_representations” field with the output of each data_representation.forward(…), keyed by class name.
  - pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.
  - truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.
  - truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
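subset_size controls how many events go into each chunk handed to SQLiteReader. The sketch below shows one way to form such chunks from the distinct event ids in a table; `event_subsets` is a hypothetical helper, not part of graphnet, and it assumes the table has an index column such as event_no.

```python
import sqlite3
from typing import Iterator, List


def event_subsets(
    db_path: str, table: str, index_column: str = "event_no", subset_size: int = 10000
) -> Iterator[List[int]]:
    """Yield lists of at most `subset_size` event ids from one table (hypothetical)."""
    if subset_size < 1:
        raise ValueError("subset_size must be positive")
    conn = sqlite3.connect(db_path)
    try:
        # Table and column names are interpolated directly: trusted input only.
        query = f"SELECT DISTINCT {index_column} FROM {table} ORDER BY {index_column}"
        ids = [row[0] for row in conn.execute(query)]
    finally:
        conn.close()
    # Slice the sorted ids into consecutive chunks; the last one may be shorter.
    for start in range(0, len(ids), subset_size):
        yield ids[start:start + subset_size]
```

Chunking by distinct event id rather than by row keeps all pulses of one event in the same chunk.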