graphnet.data.dataset.sqlite.sqlite_dataset module

Dataset class(es) for reading data from SQLite databases.

class graphnet.data.dataset.sqlite.sqlite_dataset.SQLiteDataset(*args, **kwargs)

Bases: Dataset

PyTorch dataset for reading data from SQLite databases.

Construct Dataset.

Parameters:
  • path (Union[str, List[str]]) – Path to the file(s) from which this Dataset should read.

  • pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series that should be used to construct the nodes on the individual graph objects, and their features. Multiple pulse series maps can be used, e.g., when different DOM types are stored in different maps.

  • features (List[str]) – List of columns in the input files that should be used as node features on the graph objects.

  • truth (List[str]) – List of event-level columns in the input files that should be added as attributes on the graph objects.

  • node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.

  • index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices used to identify and map events across tables.

  • truth_table (str, default: 'truth') – Name of the table containing event-level truth information.

  • node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.

  • string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects. Defaults to None, meaning all strings for which data exists are used.

  • selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (values in index_column) or as a string-based selection used to query the Dataset for events passing the selection. Defaults to None, meaning that all events in the input files are read.

  • dtype (dtype, default: torch.float32) – Type of the feature tensor on the graph objects returned.

  • loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.

  • loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights. This is also the name of the corresponding attribute assigned to the graph object.

  • loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight. NOTE: This default value is only applied when loss_weight_table and loss_weight_column are specified, and in that case only to events with no value in the corresponding table/column. That is, if no per-event loss weight table/column is provided, this value is ignored. Defaults to None.

  • seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection (e.g., “10000 random events ~ event_no % 5 > 0” or “20% random events ~ event_no % 5 > 0”).

  • graph_definition (GraphDefinition) – Method that defines the graph representation.

  • labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.

  • args (Any)

  • kwargs (Any)

Return type:

object
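The string-based selection format quoted in the seed description ("10000 random events ~ event_no % 5 > 0" or "20% random events ~ event_no % 5 > 0") can be illustrated with the standard-library sqlite3 module. The sketch below is an assumption about how such a string might be resolved, not graphnet's own parser: the helper resolve_selection_sketch, its regular expression, the toy truth table, and the reading of the percentage as "a fraction of the events passing the condition" are all hypothetical. It collects the index_column values passing the condition, then draws a seeded random subset, so the same seed always reproduces the same events.

```python
import random
import re
import sqlite3
from typing import List, Optional

# Hypothetical pattern for "N random events ~ cond" and "P% random events ~ cond".
_RANDOM_SUBSET = re.compile(r"\s*(\d+)(%?)\s+random events\s*~\s*(.+)")


def resolve_selection_sketch(
    conn: sqlite3.Connection,
    selection: str,
    table: str = "truth",
    index_column: str = "event_no",
    seed: Optional[int] = None,
) -> List[int]:
    """Return the index_column values picked by a string-based selection."""
    match = _RANDOM_SUBSET.fullmatch(selection)
    # Without a "random events" prefix, the whole string is the SQL condition.
    condition = match.group(3) if match else selection
    # The condition is interpolated directly, so it must come from a trusted source.
    rows = conn.execute(
        f"SELECT {index_column} FROM {table} WHERE {condition} "
        f"ORDER BY {index_column}"
    ).fetchall()
    indices = [row[0] for row in rows]
    if match is None:
        return indices
    count = int(match.group(1))
    if match.group(2) == "%":
        # Assumed: the percentage refers to the events passing the condition.
        count = round(len(indices) * count / 100)
    rng = random.Random(seed)  # the same seed always yields the same subset
    return sorted(rng.sample(indices, min(count, len(indices))))


# Usage: 100 toy events, 80 of which satisfy event_no % 5 > 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE truth (event_no INTEGER PRIMARY KEY, energy REAL)")
conn.executemany(
    "INSERT INTO truth VALUES (?, ?)", [(i, float(i)) for i in range(100)]
)
subset = resolve_selection_sketch(
    conn, "20% random events ~ event_no % 5 > 0", seed=42
)
print(len(subset))  # → 16
```

Sorting the sampled indices keeps the read order independent of the sampling order, which is one reasonable choice; the real dataset may order events differently.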

query_table(table, columns, sequential_index, selection)

Query the given columns of a table at a specific index, optionally restricted by an additional selection.

Return type:

List[Tuple[Any, ...]]

Parameters:
  • table (str)

  • columns (List[str] | str)

  • sequential_index (int | None)

  • selection (str | None)
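The query described above can be sketched with sqlite3 as follows. This is a minimal illustration under assumptions, not graphnet's implementation: query_table_sketch and loss_weight_sketch are hypothetical, and the sketch passes the connection, index column and ordered list of selected event indices explicitly, whereas the real method works on the dataset instance. It assumes a sequential_index is a position in the dataset that is first translated to the event's index_column value, that sequential_index=None queries the whole table, and that selection is ANDed onto the index filter. The usage part also shows two constructor behaviours: filtering pulses by string (cf. string_selection) and falling back to a default loss weight only when an event has no row in the weight table (cf. loss_weight_default_value).

```python
import sqlite3
from typing import Any, List, Optional, Sequence, Tuple, Union


def query_table_sketch(
    conn: sqlite3.Connection,
    table: str,
    columns: Union[List[str], str],
    event_indices: Sequence[int],
    sequential_index: Optional[int] = None,
    selection: Optional[str] = None,
    index_column: str = "event_no",
) -> List[Tuple[Any, ...]]:
    """Hypothetical stand-in for SQLiteDataset.query_table."""
    if isinstance(columns, list):
        columns = ", ".join(columns)
    conditions = []
    if sequential_index is not None:
        # Map the dataset position to the event's unique index value.
        event = int(event_indices[sequential_index])
        conditions.append(f"{index_column} = {event}")
    if selection:
        conditions.append(f"({selection})")
    where = " AND ".join(conditions) or "1=1"  # "1=1" selects every row
    return conn.execute(f"SELECT {columns} FROM {table} WHERE {where}").fetchall()


# Toy tables sharing the index column event_no (all names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pulses (event_no INTEGER, string INTEGER, charge REAL);
    CREATE TABLE truth (event_no INTEGER, energy REAL);
    CREATE TABLE weights (event_no INTEGER, loss_weight REAL);
    INSERT INTO pulses VALUES (7, 1, 0.5), (7, 2, 1.5), (9, 1, 2.0);
    INSERT INTO truth VALUES (7, 10.0), (9, 20.0);
    INSERT INTO weights VALUES (9, 0.25);
    """
)
events = [7, 9]  # dataset position 0 -> event_no 7, position 1 -> event_no 9

# Node features of the first event, restricted to string 1.
print(query_table_sketch(conn, "pulses", ["charge"], events, 0, "string IN (1)"))
# → [(0.5,)]


def loss_weight_sketch(sequential_index: int, default: float = 1.0) -> float:
    """Hypothetical per-event weight lookup with a fallback default."""
    rows = query_table_sketch(conn, "weights", "loss_weight", events, sequential_index)
    # The default only fills in for events without a row in the weight table.
    return rows[0][0] if rows else default
```

Here event 7 has no entry in the weight table, so loss_weight_sketch(0) falls back to the default, while loss_weight_sketch(1) returns the stored 0.25.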