dataconverters

Pre-configured combinations of writers and readers.

class graphnet.data.pre_configured.dataconverters.I3ToParquetConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters)[source]

Bases: DataConverter

Preconfigured DataConverter for converting I3 files to Parquet files.

Convert I3 files to Parquet.

Parameters:
  • gcd_rescue (str) – Path to a GCD file to use when no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing I3 files also contains an accompanying GCD file. Where a folder contains I3 files but no GCD file, gcd_rescue is used instead.

  • extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.

  • outdir (str) – The directory to save the files in.

  • icetray_verbose – Set the level of verbosity of icetray. Defaults to 0. Note that this parameter does not appear in the constructor signature above.

  • index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).

  • i3_filters (Union[I3Filter, List[I3Filter]], default: None) – One or more I3Filter instances used to filter PFrames. If None, NullSplitI3Filter is used.
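
The gcd_rescue fallback described above can be illustrated with a minimal sketch. This is not I3Reader's implementation: the function name `pair_i3_with_gcd` and the rule for recognising GCD files (any file whose name contains "gcd") are assumptions made only for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def pair_i3_with_gcd(input_dir: str, gcd_rescue: str) -> List[Tuple[Path, Path]]:
    """Pair every I3 file under `input_dir` with a GCD file.

    Hypothetical helper: a file counts as a GCD file if its name contains
    "gcd" (case-insensitive); any other file containing ".i3" is an I3 file.
    """
    pairs: List[Tuple[Path, Path]] = []
    root = Path(input_dir)
    # Visit the root folder and every subfolder below it.
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        files = sorted(
            p for p in folder.iterdir() if p.is_file() and ".i3" in p.name
        )
        gcd_files = [p for p in files if "gcd" in p.name.lower()]
        i3_files = [p for p in files if "gcd" not in p.name.lower()]
        # Use the folder's own GCD file when present, else the rescue file.
        gcd = gcd_files[0] if gcd_files else Path(gcd_rescue)
        pairs.extend((i3_file, gcd) for i3_file in i3_files)
    return pairs
```

A folder that follows the convention is paired with its own GCD file, while a folder without one is paired with the rescue path.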

class graphnet.data.pre_configured.dataconverters.I3ToSQLiteConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters, max_table_size)[source]

Bases: DataConverter

Preconfigured DataConverter for converting I3 files to SQLite files.

Convert I3 files to SQLite.

Parameters:
  • gcd_rescue (str) – Path to a GCD file to use when no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing I3 files also contains an accompanying GCD file. Where a folder contains I3 files but no GCD file, gcd_rescue is used instead.

  • extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.

  • outdir (str) – The directory to save the files in.

  • icetray_verbose – Set the level of verbosity of icetray. Defaults to 0. Note that this parameter does not appear in the constructor signature above.

  • index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).

  • i3_filters (Union[I3Filter, List[I3Filter]], default: None) – One or more I3Filter instances used to filter PFrames. If None, NullSplitI3Filter is used.

  • max_table_size (Optional[int], default: None) – Maximum size of the SQLite tables. Defaults to None.
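
Since i3_filters accepts a single filter, a list of filters, or None, one plausible way to handle the argument is to normalise it to a list first. The sketch below assumes that approach. The two classes are empty stand-ins that only borrow the names I3Filter and NullSplitI3Filter (they are not the graphnet classes), and `normalize_filters` is a hypothetical helper.

```python
from typing import List, Optional, Union


class I3Filter:
    """Stand-in for the I3Filter base class; not the graphnet class."""


class NullSplitI3Filter(I3Filter):
    """Stand-in for the default filter; filtering logic is omitted."""


def normalize_filters(
    i3_filters: Optional[Union[I3Filter, List[I3Filter]]] = None,
) -> List[I3Filter]:
    """Return the filters as a list, substituting the default for None."""
    if i3_filters is None:
        # No filter given: fall back to the documented default.
        return [NullSplitI3Filter()]
    if isinstance(i3_filters, I3Filter):
        # A single filter becomes a one-element list.
        return [i3_filters]
    # Already a list (or other iterable) of filters: copy it.
    return list(i3_filters)
```

With this normalisation, downstream code can always iterate over a list, regardless of which form the caller passed.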

class graphnet.data.pre_configured.dataconverters.I3ToLMDBConverter(gcd_rescue, extractors, outdir, index_column, num_workers, i3_filters, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)[source]

Bases: DataConverter

Preconfigured DataConverter for converting I3 files to LMDB files.

Convert I3 files to LMDB.

Parameters:
  • gcd_rescue (str) – Path to a GCD file to use when no GCD file is found in a subfolder. I3Reader recursively searches the input directory for I3-GCD file pairs. By IceCube convention, a folder containing I3 files also contains an accompanying GCD file. Where a folder contains I3 files but no GCD file, gcd_rescue is used instead.

  • extractors (List[I3Extractor]) – The `Extractor`(s) that will be applied to the input files.

  • outdir (str) – The directory to save the files in.

  • index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).

  • i3_filters (Union[I3Filter, List[I3Filter]], default: None) – One or more I3Filter instances used to filter PFrames. If None, NullSplitI3Filter is used.

  • map_size_bytes (int, default: 8589934592) – LMDB map size. Defaults to 8 GiB.

  • serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes. Defaults to “pickle”.

  • data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with pulsemap_extractor_name and truth_extractor_name, the stored value will contain a “data_representations” field holding the output of each data_representation.forward(…), keyed by class name.

  • pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.

  • truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.

  • truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
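
The serialization parameter accepts either a name or a callable that maps an object to bytes. The sketch below shows how such an argument could be resolved, and checks the arithmetic behind the default map size. It is an illustration only: `resolve_serializer` and `json_bytes` are hypothetical names, and only the standard-library options ("pickle" and "json") are covered, because "msgpack" and "dill" require third-party packages.

```python
import json
import pickle
from typing import Any, Callable, Dict, Union

GIB = 1024**3
DEFAULT_MAP_SIZE_BYTES = 8 * GIB  # 8589934592, i.e. 8 GiB


def json_bytes(obj: Any) -> bytes:
    """Serialise a JSON-compatible object to UTF-8 encoded bytes."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


# Name-to-callable table; limited to the standard library in this sketch.
_BUILTIN: Dict[str, Callable[[Any], bytes]] = {
    "pickle": pickle.dumps,
    "json": json_bytes,
}


def resolve_serializer(
    serialization: Union[str, Callable[[Any], bytes]] = "pickle",
) -> Callable[[Any], bytes]:
    """Turn a name or a callable into a callable returning bytes."""
    if callable(serialization):
        return serialization
    try:
        return _BUILTIN[serialization]
    except KeyError:
        raise ValueError(f"Unknown serialization: {serialization!r}") from None
```

A custom callable, for example `lambda obj: repr(obj).encode()`, passes through unchanged, which is what makes the Union type useful.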

class graphnet.data.pre_configured.dataconverters.ParquetToSQLiteConverter(extractors, outdir, index_column, num_workers)[source]

Bases: DataConverter

Preconfigured DataConverter for converting Parquet to SQLite files.

This class converts Parquet files written by ParquetWriter to SQLite.

Convert internal Parquet files to SQLite.

Parameters:
  • extractors (List[ParquetExtractor]) – The `Extractor`(s) that will be applied to the input files.

  • outdir (str) – The directory to save the files in.

  • icetray_verbose – Set the level of verbosity of icetray. Defaults to 0. Note that this parameter does not appear in the constructor signature above.

  • index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).

class graphnet.data.pre_configured.dataconverters.SQLiteToLMDBConverter(extractors, outdir, index_column, num_workers, subset_size, map_size_bytes, serialization, data_representation, pulsemap_extractor_name, truth_extractor_name, truth_label_names)[source]

Bases: DataConverter

Preconfigured DataConverter for converting SQLite to LMDB files.

This class converts SQLite files written by SQLiteWriter to LMDB.

Convert internal SQLite files to LMDB.

Parameters:
  • extractors (List[SQLiteExtractor]) – The `Extractor`(s) that will be applied to the input files.

  • outdir (str) – The directory to save the files in.

  • index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.

  • num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).

  • subset_size (int, default: 10000) – Number of events per fileset chunk for SQLiteReader. Defaults to 10000.

  • map_size_bytes (int, default: 8589934592) – LMDB map size. Defaults to 8 GiB.

  • serialization (Union[str, Callable[[Any], bytes]], default: 'pickle') – Either a string in {“pickle”, “json”, “msgpack”, “dill”}, or a callable that takes an object and returns bytes. Defaults to “pickle”.

  • data_representation (Union[DataRepresentation, List[DataRepresentation], None], default: None) – Optional DataRepresentation instance or list of instances. If provided together with pulsemap_extractor_name, truth_extractor_name, and truth_label_names, the stored value will contain a “data_representations” field holding the output of each data_representation.forward(…), keyed by class name.

  • pulsemap_extractor_name (Optional[str], default: None) – Name of the extractor providing pulse-level features.

  • truth_extractor_name (Optional[str], default: None) – Name of the extractor providing event-level truth labels.

  • truth_label_names (Optional[List[str]], default: None) – Names of truth columns to include.
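
To make the effect of subset_size concrete, the sketch below splits a sequence of event ids into chunks of at most subset_size events. It does not reproduce SQLiteReader's logic: `chunk_event_ids` is a hypothetical helper, and it assumes that chunking is done over consecutive event ids.

```python
from typing import Iterator, List, Sequence


def chunk_event_ids(
    event_ids: Sequence[int], subset_size: int = 10000
) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `subset_size` event ids."""
    if subset_size <= 0:
        raise ValueError("subset_size must be a positive integer")
    for start in range(0, len(event_ids), subset_size):
        # The final chunk may be shorter than subset_size.
        yield list(event_ids[start:start + subset_size])
```

For 25000 events and the default subset_size, this yields two chunks of 10000 events and one of 5000.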