graphnet.utilities.config.dataset_config module

Config classes for the graphnet.data.dataset module.

class graphnet.utilities.config.dataset_config.DatasetConfig(*, path, pulsemaps, features, truth, node_truth, index_column, truth_table, node_truth_table, string_selection, selection, loss_weight_table, loss_weight_column, loss_weight_default_value, seed, graph_definition, labels)

Bases: BaseConfig

Configuration for all `Dataset`s.

Construct `DatasetConfig`.

Can be used for dataset configuration as code, thereby making dataset construction more transparent and reproducible.

Examples

In one session, do:

>>> dataset = Dataset(...)
>>> dataset.config.dump()
path: (...)
pulsemaps:
    - (...)
(...)
>>> dataset.config.dump("dataset.yml")

In another session, you can then do:

>>> dataset = Dataset.from_config("dataset.yml")

# Uniquely for DatasetConfig, you can also define and load
# multiple datasets
>>> dataset.config.selection = {
...     "train": "event_no % 2 == 0",
...     "test": "event_no % 2 == 1",
... }
>>> dataset.config.dump("dataset.yml")
>>> datasets: Dict[str, Dataset] = Dataset.from_config(
...     "dataset.yml"
... )
>>> datasets
{
    "train": Dataset(...),
    "test": Dataset(...),
}

# You can also combine multiple selections into a single, named
# dataset
>>> dataset.config.selection = {
...     "train": [
...         "event_no % 2 == 0 & abs(pid) == 12",
...         "event_no % 2 == 0 & abs(pid) == 14",
...         "event_no % 2 == 0 & abs(pid) == 16",
...     ],
...     (...)
... }
>>> dataset.config.dump("dataset.yml")
>>> datasets: Dict[str, EnsembleDataset] = Dataset.from_config(
...     "dataset.yml"
... )
>>> datasets
{
    "train": EnsembleDataset(...),
    (...)
}

# Finally, you can still reference existing selection files in CSV
# or JSON formats:
>>> dataset.config.selection = {
...     "train": "50000 random events ~ train_selection.csv",
...     "test": "test_selection.csv",
... }
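To make the train/test selection strings above concrete, the following is a minimal, library-free sketch of what such a split does to an event index. The event numbers and the use of Python predicates in place of selection strings are assumptions for illustration; this is not how graphnet evaluates selections internally.

```python
# Hypothetical stand-in for the values of the index column (event_no);
# nothing here is read from a real database.
event_nos = list(range(10))

# The same split as in the example, written as Python predicates
# instead of selection strings.
selection = {
    "train": lambda event_no: event_no % 2 == 0,
    "test": lambda event_no: event_no % 2 == 1,
}

# One list of event numbers per named selection.
splits = {
    name: [e for e in event_nos if keep(e)]
    for name, keep in selection.items()
}

# The two selections are disjoint and together cover every event.
assert sorted(splits["train"] + splits["test"]) == event_nos
print(splits)
```

Because the two conditions are complementary, every event ends up in exactly one named dataset.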

Parameters:
  • path (str | List[str])

  • pulsemaps (str | List[str])

  • features (List[str])

  • truth (List[str])

  • node_truth (List[str] | None)

  • index_column (str)

  • truth_table (str)

  • node_truth_table (str | None)

  • string_selection (List[int] | None)

  • selection (str | List[str] | List[int | List[int]] | Dict[str, str | List[str]] | None)

  • loss_weight_table (str | None)

  • loss_weight_column (str | None)

  • loss_weight_default_value (float | None)

  • seed (int | None)

  • graph_definition (Any)

  • labels (Dict[str, Any] | None)

path: Union[str, List[str]]
pulsemaps: Union[str, List[str]]
features: List[str]
truth: List[str]
node_truth: Optional[List[str]]
index_column: str
truth_table: str
node_truth_table: Optional[str]
string_selection: Optional[List[int]]
selection: Union[str, List[str], List[Union[int, List[int]]], Dict[str, Union[str, List[str]]], None]
loss_weight_table: Optional[str]
loss_weight_column: Optional[str]
loss_weight_default_value: Optional[float]
seed: Optional[int]
graph_definition: Any
labels: Optional[Dict[str, Any]]
as_dict()

Represent DatasetConfig as a dict.

This builds on BaseModel.dict() but wraps the output in a single-key dictionary, so that arguments that are themselves models can be identified unambiguously.

Return type:

Dict[str, Dict[str, Any]]
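The wrapping described for as_dict() can be sketched as follows. This is an illustration only: `ConfigSketch` is a hypothetical stand-in class, and using the class name as the single key is an assumption, not something this page states.

```python
from typing import Any, Dict


class ConfigSketch:
    """Hypothetical stand-in for a config class; not the graphnet implementation."""

    def __init__(self, **kwargs: Any) -> None:
        self._values = dict(kwargs)

    def as_dict(self) -> Dict[str, Dict[str, Any]]:
        # Assumption: the single outer key is the class name, which lets a
        # reader tell a nested config apart from an ordinary dict argument.
        return {type(self).__name__: dict(self._values)}


wrapped = ConfigSketch(index_column="event_no", truth_table="truth").as_dict()
print(wrapped)
```

The outer dictionary always has exactly one key, matching the documented return type Dict[str, Dict[str, Any]].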

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'features': FieldInfo(annotation=List[str], required=True), 'graph_definition': FieldInfo(annotation=Any, required=False, default=None), 'index_column': FieldInfo(annotation=str, required=False, default='event_no'), 'labels': FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None), 'loss_weight_column': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'loss_weight_default_value': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'loss_weight_table': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'node_truth': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'node_truth_table': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'path': FieldInfo(annotation=Union[str, List[str]], required=True), 'pulsemaps': FieldInfo(annotation=Union[str, List[str]], required=True), 'seed': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'selection': FieldInfo(annotation=Union[str, List[str], List[Union[int, List[int]]], Dict[str, Union[str, List[str]]], NoneType], required=False, default=None), 'string_selection': FieldInfo(annotation=Union[List[int], NoneType], required=False, default=None), 'truth': FieldInfo(annotation=List[str], required=True), 'truth_table': FieldInfo(annotation=str, required=False, default='truth')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.
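The required/optional split and the defaults listed in model_fields are easier to read as a declaration. The sketch below mirrors them with a standard-library dataclass; `DatasetConfigSketch` is a hypothetical name, the constructor values are placeholders, and the real class is a Pydantic model rather than a dataclass.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, List, Optional, Union


@dataclass
class DatasetConfigSketch:
    """Hypothetical mirror of the documented fields; not the graphnet class."""

    # Required fields (required=True in model_fields).
    path: Union[str, List[str]]
    pulsemaps: Union[str, List[str]]
    features: List[str]
    truth: List[str]
    # Optional fields, with the defaults listed in model_fields.
    node_truth: Optional[List[str]] = None
    index_column: str = "event_no"
    truth_table: str = "truth"
    node_truth_table: Optional[str] = None
    string_selection: Optional[List[int]] = None
    selection: Any = None  # see the full Union type in the parameter list
    loss_weight_table: Optional[str] = None
    loss_weight_column: Optional[str] = None
    loss_weight_default_value: Optional[float] = None
    seed: Optional[int] = None
    graph_definition: Any = None
    labels: Optional[Dict[str, Any]] = None


# Fields without a default must be supplied by the caller.
required = [f.name for f in fields(DatasetConfigSketch) if f.default is MISSING]

# Placeholder values, chosen only to show which arguments are mandatory.
config = DatasetConfigSketch(
    path="placeholder.db",
    pulsemaps="placeholder_pulsemap",
    features=["placeholder_feature"],
    truth=["placeholder_truth"],
)
print(required, config.index_column, config.truth_table)
```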

graphnet.utilities.config.dataset_config.save_dataset_config(init_fn)

Save the arguments passed to an __init__ function as a member DatasetConfig.

Return type:

Callable

Parameters:

init_fn (Callable)
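A decorator of this kind can be sketched with the standard library alone. The names `save_init_args`, `ToyDataset`, and `_saved_args` are hypothetical, and the sketch stores a plain dict where the documented function stores a DatasetConfig; it illustrates the general pattern rather than graphnet's actual code.

```python
import functools
import inspect
from typing import Any, Callable


def save_init_args(init_fn: Callable) -> Callable:
    """Hypothetical decorator: record the arguments passed to __init__."""

    @functools.wraps(init_fn)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        # Map positional and keyword arguments onto parameter names,
        # then fill in defaults for anything the caller left out.
        bound = inspect.signature(init_fn).bind(self, *args, **kwargs)
        bound.apply_defaults()
        captured = dict(bound.arguments)
        captured.pop("self", None)

        ret = init_fn(self, *args, **kwargs)
        # Stored as a plain dict in this sketch.
        self._saved_args = captured
        return ret

    return wrapper


class ToyDataset:
    @save_init_args
    def __init__(self, path: str, index_column: str = "event_no") -> None:
        self.path = path
        self.index_column = index_column


toy = ToyDataset("placeholder.db")
print(toy._saved_args)
```

Binding against the signature means defaults that the caller never spelled out are recorded too, which is what makes a saved configuration reproducible.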

class graphnet.utilities.config.dataset_config.DatasetConfigSaverMeta

Bases: type

Metaclass for DatasetConfig that saves the config after __init__.

class graphnet.utilities.config.dataset_config.DatasetConfigSaverABCMeta(name, bases, namespace, **kwargs)

Bases: DatasetConfigSaverMeta, ABCMeta

Common interface between DatasetConfigSaver and ABC Metaclasses.
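The two metaclasses follow a common Python pattern: one metaclass hooks into instance creation, and a second combines it with ABCMeta so that abstract base classes can use both. The sketch below shows that pattern under hypothetical names (`ConfigSaverMetaSketch`, `ConfigSaverABCMetaSketch`, `_saved_call`); it records raw call arguments rather than building a DatasetConfig and should not be read as graphnet's implementation.

```python
from abc import ABCMeta, abstractmethod
from typing import Any


class ConfigSaverMetaSketch(type):
    """Hypothetical metaclass: record constructor arguments after __init__."""

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        # type.__call__ runs __new__ and __init__; the arguments are only
        # attached once construction has succeeded.
        instance = super().__call__(*args, **kwargs)
        instance._saved_call = {"args": args, "kwargs": kwargs}
        return instance


class ConfigSaverABCMetaSketch(ConfigSaverMetaSketch, ABCMeta):
    """Combine the saver with ABCMeta to avoid a metaclass conflict."""


class BaseToyDataset(metaclass=ConfigSaverABCMetaSketch):
    @abstractmethod
    def __len__(self) -> int:
        ...


class ConcreteToyDataset(BaseToyDataset):
    def __init__(self, path: str) -> None:
        self.path = path

    def __len__(self) -> int:
        return 0


toy = ConcreteToyDataset("placeholder.db")
print(toy._saved_call)
```

A class cannot have two unrelated metaclasses, so a combined metaclass such as the one above is the usual way to give an abstract base class extra construction-time behaviour.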