dataset
Base Dataset class(es) used in GraphNeT.
- graphnet.data.dataset.dataset.load_module(class_name)
Load graphnet module from string name.
- Parameters:
class_name (str) – Name of the class.
- Return type:
Type
- Returns:
The graphnet module.
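The general technique of resolving a class from a string name can be sketched with the standard library alone. This is a minimal illustration, not GraphNeT's implementation: `load_class` and its explicit `module_path` argument are hypothetical, whereas `load_module` above takes only `class_name` and how it locates the class is not described here.

```python
import importlib
from typing import Type


def load_class(module_path: str, class_name: str) -> Type:
    """Resolve a class from a module path and a class name (hypothetical helper)."""
    module = importlib.import_module(module_path)
    try:
        obj = getattr(module, class_name)
    except AttributeError as err:
        raise ValueError(
            f"{class_name!r} not found in {module_path!r}"
        ) from err
    # Guard against names that resolve to functions or constants.
    if not isinstance(obj, type):
        raise TypeError(f"{class_name!r} is not a class")
    return obj


# Example with a standard-library class, chosen only for illustration.
ordered_dict_cls = load_class("collections", "OrderedDict")
instance = ordered_dict_cls(a=1)
```

Returning the class rather than an instance leaves the constructor arguments to the caller, which is useful when they come from a separate configuration.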
- graphnet.data.dataset.dataset.parse_graph_definition(cfg)
Construct GraphDefinition from DatasetConfig.
- Return type:
GraphDefinition
- Parameters:
cfg (dict)
- graphnet.data.dataset.dataset.parse_labels(cfg)
Construct Label from DatasetConfig.
- Return type:
Dict[str, Label]
- Parameters:
cfg (dict)
- class graphnet.data.dataset.dataset.Dataset(*args, **kwargs)
Bases: Logger, Configurable, Dataset, ABC
Base Dataset class for reading from any intermediate file format.
Construct Dataset.
- Parameters:
path (Union[str, List[str]]) – Path to the file(s) from which this Dataset should read.
pulsemaps (Union[str, List[str]]) – Name(s) of the pulse map series that should be used to construct the nodes on the individual graph objects, and their features. Multiple pulse series maps can be used, e.g., when different DOM types are stored in different maps.
features (List[str]) – List of columns in the input files that should be used as node features on the graph objects.
truth (List[str]) – List of event-level columns in the input files that should be added as attributes on the graph objects.
node_truth (Optional[List[str]], default: None) – List of node-level columns in the input files that should be added as attributes on the graph objects.
index_column (str, default: 'event_no') – Name of the column in the input files that contains unique indices to identify and map events across tables.
truth_table (str, default: 'truth') – Name of the table containing event-level truth information.
node_truth_table (Optional[str], default: None) – Name of the table containing node-level truth information.
string_selection (Optional[List[int]], default: None) – Subset of strings for which data should be read and used to construct graph objects. Defaults to None, meaning all strings for which data exists are used.
selection (Union[str, List[int], List[List[int]], None], default: None) – The events that should be read. This can be given either as a list of indices (in index_column) or as a string-based selection used to query the Dataset for events passing the selection. Defaults to None, meaning that all events in the input files are read.
dtype (dtype, default: torch.float32) – Type of the feature tensor on the graph objects returned.
loss_weight_table (Optional[str], default: None) – Name of the table containing per-event loss weights.
loss_weight_column (Optional[str], default: None) – Name of the column in loss_weight_table containing per-event loss weights. This is also the name of the corresponding attribute assigned to the graph object.
loss_weight_default_value (Optional[float], default: None) – Default per-event loss weight. NOTE: This default value is only applied when loss_weight_table and loss_weight_column are specified, and then only to events with no value in the corresponding table/column. That is, if no per-event loss weight table/column is provided, this value is ignored. Defaults to None.
seed (Optional[int], default: None) – Random number generator seed, used for selecting a random subset of events when resolving a string-based selection (e.g., “10000 random events ~ event_no % 5 > 0” or “20% random events ~ event_no % 5 > 0”).
graph_definition (GraphDefinition) – Method that defines the graph representation.
labels (Optional[Dict[str, Any]], default: None) – Dictionary of labels to be added to the dataset.
args (Any)
kwargs (Any)
- Return type:
object
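The interaction between a string-based selection and seed can be illustrated with a small standard-library sketch. It is not GraphNeT's resolver: `parse_random_selection` and `sample_events` are hypothetical helpers, and the condition after `~` is supplied as a Python predicate rather than evaluated from the string, which is an assumption made to keep the example self-contained.

```python
import random
import re
from typing import Callable, List, Optional, Tuple

# Matches "<N> random events ~ <condition>" and "<P>% random events ~ <condition>".
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)(%?)\s+random events\s*~\s*(.+)$")


def parse_random_selection(selection: str) -> Tuple[float, bool, str]:
    """Split a selection string into (amount, is_percentage, condition)."""
    match = _PATTERN.match(selection)
    if match is None:
        raise ValueError(f"Not a random-subset selection: {selection!r}")
    amount, percent, condition = match.groups()
    return float(amount), percent == "%", condition.strip()


def sample_events(
    event_nos: List[int],
    selection: str,
    passes: Callable[[int], bool],
    seed: Optional[int] = None,
) -> List[int]:
    """Draw a reproducible random subset of the events passing `passes`."""
    amount, is_percent, _ = parse_random_selection(selection)
    candidates = [e for e in event_nos if passes(e)]
    if is_percent:
        n = int(len(candidates) * amount / 100)
    else:
        n = min(int(amount), len(candidates))
    # A dedicated generator keeps the draw independent of global random state.
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, n))


events = list(range(100))
subset = sample_events(
    events,
    "20% random events ~ event_no % 5 > 0",
    passes=lambda e: e % 5 > 0,
    seed=42,
)
```

With a fixed seed, repeated calls return the same subset, which is the reproducibility the seed parameter is meant to provide.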
- classmethod from_config(source)
Construct Dataset instance from source configuration.
- Return type:
Union[Dataset, EnsembleDataset, Dict[str, Dataset], Dict[str, EnsembleDataset]]
- Parameters:
source (DatasetConfig | str)
- classmethod concatenate(datasets)
Concatenate multiple `Dataset`s into one instance.
- Return type:
EnsembleDataset
- Parameters:
datasets (List[Dataset])
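Concatenating datasets generally means mapping one global index onto the right member dataset. The sketch below shows that idea using cumulative lengths and a binary search. `ConcatenatedDataset` is a hypothetical stand-in written for illustration; it is not the class returned by `concatenate`, and it assumes each member supports `len()` and integer indexing.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Sequence


class ConcatenatedDataset:
    """Hypothetical wrapper presenting several datasets as one sequence."""

    def __init__(self, datasets: List[Sequence[Any]]) -> None:
        if not datasets:
            raise ValueError("Need at least one dataset.")
        self._datasets = list(datasets)
        # Cumulative lengths, e.g. [3, 5] for members of length 3 and 2.
        self._offsets = list(accumulate(len(d) for d in self._datasets))

    def __len__(self) -> int:
        return self._offsets[-1]

    def __getitem__(self, index: int) -> Any:
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Locate the member dataset that holds the global index.
        which = bisect_right(self._offsets, index)
        start = self._offsets[which - 1] if which > 0 else 0
        return self._datasets[which][index - start]


combined = ConcatenatedDataset([["a", "b", "c"], ["d", "e"]])
```

Storing only offsets keeps the wrapper cheap: no events are copied, and each lookup costs one binary search over the member count.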
- property path: str | List[str]
Path to the file(s) from which this Dataset reads.
- property truth_table: str
Name of the table containing event-level truth information.
- abstract query_table(table, columns, sequential_index, selection)
Query a table at a specific index, optionally with some selection.
- Parameters:
table (str) – Table to be queried.
columns (Union[List[str], str]) – Columns to read out.
sequential_index (Optional[int], default: None) – Sequentially numbered index (i.e., in [0, len(self))) of the event to query. This may differ from the indexation used in self._indices. If no value is provided, the entire column is returned.
selection (Optional[str], default: None) – Selection to be imposed before reading out data. Defaults to None.
- Return type:
ndarray
- Returns:
List of tuples containing the values in columns. If the table contains only scalar data for columns, a list of length 1 is returned.
- Raises:
ColumnMissingException – If one or more elements in columns are not present in table.
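Because `query_table` is abstract, each backend supplies its own implementation. The following is a minimal sketch of what one backed by SQLite might look like, using only the standard `sqlite3` module. It is written as a free function with extra hypothetical arguments (`conn`, `indices`, `index_column`) in place of instance state, defines a local stand-in for `ColumnMissingException`, and returns a list of tuples as in the Returns description; none of this is taken from GraphNeT's code. The selection string is interpolated directly into the SQL, so it is assumed to come from a trusted source.

```python
import sqlite3
from typing import List, Optional, Tuple, Union


class ColumnMissingException(Exception):
    """Local stand-in for the exception named in the documentation."""


def query_table(
    conn: sqlite3.Connection,
    indices: List[int],
    table: str,
    columns: Union[List[str], str],
    sequential_index: Optional[int] = None,
    selection: Optional[str] = None,
    index_column: str = "event_no",
) -> List[Tuple]:
    """Read `columns` from `table`, optionally for one event and a selection."""
    if isinstance(columns, str):
        columns = [columns]
    # Check the requested columns against the table schema first.
    available = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    missing = [c for c in columns if c not in available]
    if missing:
        raise ColumnMissingException(f"{missing} not in table {table!r}")

    conditions: List[str] = []
    params: List[int] = []
    if sequential_index is not None:
        # Map the sequential position to the stored event index.
        conditions.append(f"{index_column} = ?")
        params.append(indices[sequential_index])
    if selection is not None:
        conditions.append(f"({selection})")
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    query = f"SELECT {', '.join(columns)} FROM {table}{where}"
    return conn.execute(query, params).fetchall()


# Illustrative in-memory table; the column names are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE truth (event_no INTEGER, energy REAL, zenith REAL)")
conn.executemany(
    "INSERT INTO truth VALUES (?, ?, ?)",
    [(10, 1.5, 0.1), (20, 2.5, 0.2), (30, 3.5, 0.3)],
)
indices = [10, 20, 30]
single_event = query_table(conn, indices, "truth", ["energy", "zenith"], sequential_index=1)
```

Here `sequential_index=1` resolves to event 20 through `indices`, which mirrors the distinction the parameter description draws between sequential positions and stored event indices.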