nodes

Class(es) for building/connecting graphs.

class graphnet.models.data_representation.graphs.nodes.nodes.NodeDefinition(*args, **kwargs)[source]

Bases: Model

Base class for graph building.

Construct NodeDefinition.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

object

forward(x)[source]

Construct nodes from raw node features.

Parameters:
  • x (Tensor) – standardized node features with shape [num_pulses, d], where d is the number of node features.

  • node_feature_names – list of names for each column in x.

Return type:

Tensor

Returns:

Node feature tensor of shape [num_nodes, num_features].

property nb_outputs: int

Return number of output features.

This is the default, but may be overridden by specific inheriting classes.

set_number_of_inputs(input_feature_names)[source]

Set the number of inputs expected by the node definition.

Parameters:

input_feature_names (List[str]) – name of each input feature column.

Return type:

None

set_output_feature_names(input_feature_names)[source]

Set output feature names as a member variable.

Parameters:
  • input_feature_names (List[str]) – List of column names of the input to the node definition.

Return type:

None

class graphnet.models.data_representation.graphs.nodes.nodes.NodesAsPulses(*args, **kwargs)[source]

Bases: NodeDefinition

Represent each measured pulse of Cherenkov Radiation as a node.

Construct NodesAsPulses.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

object

class graphnet.models.data_representation.graphs.nodes.nodes.PercentileClusters(*args, **kwargs)[source]

Bases: NodeDefinition

Represent nodes as clusters with percentile summary node features.

If cluster_on is set to the xyz coordinates of DOMs, e.g. cluster_on = ['dom_x', 'dom_y', 'dom_z'], each node will be a unique DOM and the pulse information (charge, time) is summarized using percentiles.

Construct PercentileClusters.

Parameters:
  • cluster_on (List[str]) – Names of features to create clusters from.

  • percentiles (List[int]) – List of percentiles. E.g. [10, 50, 90].

  • add_counts (bool, default: True) – If True, number of duplicates is added to output array.

  • input_feature_names (Optional[List[str]], default: None) – (Optional) column names for input features.

  • args (Any)

  • kwargs (Any)

Return type:

object
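The clustering described above can be illustrated with a minimal pure-Python sketch. It is not graphnet's implementation: the helper names `percentile` and `percentile_clusters`, the linear-interpolation percentile rule, and the output column naming are all assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_clusters(
    rows: List[List[float]],
    feature_names: List[str],
    cluster_on: List[str],
    percentiles: List[int],
    add_counts: bool = True,
) -> Tuple[List[List[float]], List[str]]:
    """Group rows by the cluster_on columns and summarize the other columns."""
    key_idx = [feature_names.index(name) for name in cluster_on]
    value_idx = [i for i in range(len(feature_names)) if i not in key_idx]

    # One group per unique combination of the cluster_on values.
    groups: Dict[tuple, List[List[float]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in key_idx)].append(row)

    # Hypothetical output naming scheme, chosen for this sketch only.
    out_names = list(cluster_on)
    for i in value_idx:
        out_names += [f"{feature_names[i]}_pct{p}" for p in percentiles]
    if add_counts:
        out_names.append("counts")

    nodes = []
    for key, members in groups.items():
        node = list(key)
        for i in value_idx:
            column = [m[i] for m in members]
            node += [percentile(column, p) for p in percentiles]
        if add_counts:
            node.append(float(len(members)))  # number of pulses in the cluster
        nodes.append(node)
    return nodes, out_names


names = ["dom_x", "dom_y", "dom_z", "charge", "dom_time"]
pulses = [
    [0.0, 0.0, 10.0, 1.0, 100.0],
    [0.0, 0.0, 10.0, 3.0, 120.0],
    [0.0, 0.0, 10.0, 2.0, 110.0],
    [5.0, 0.0, 20.0, 4.0, 200.0],
]
nodes, out_names = percentile_clusters(
    pulses, names, cluster_on=["dom_x", "dom_y", "dom_z"], percentiles=[10, 50, 90]
)
```

Here the four pulses collapse into two nodes, one per unique DOM position, each carrying three percentiles of charge and of time plus a pulse count.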

class graphnet.models.data_representation.graphs.nodes.nodes.NodeAsDOMTimeSeries(*args, **kwargs)[source]

Bases: NodeDefinition

Represent each node as a DOM with time and charge time series data.

Construct NodeAsDOMTimeSeries.

Parameters:
  • keys (List[str], default: ['dom_x', 'dom_y', 'dom_z', 'dom_time', 'charge']) – Names of features in the data (in order).

  • id_columns (List[str], default: ['dom_x', 'dom_y', 'dom_z']) – List of columns that uniquely identify a DOM.

  • time_column (str, default: 'dom_time') – Name of time column.

  • charge_column (str, default: 'charge') – Name of charge column.

  • max_activations (Optional[int], default: None) – Maximum number of activations to include in the time series.

  • args (Any)

  • kwargs (Any)

Return type:

object
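As a rough illustration of the idea, the sketch below groups pulses by the id columns and builds a time-ordered series of time and charge per DOM. The function name `dom_time_series`, the dictionary output layout, and the choice to keep the earliest activations when max_activations is set are assumptions for this example, not the class's actual behavior.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def dom_time_series(
    rows: List[List[float]],
    keys: List[str],
    id_columns: List[str],
    time_column: str = "dom_time",
    charge_column: str = "charge",
    max_activations: Optional[int] = None,
) -> Dict[Tuple[float, ...], Dict[str, List[float]]]:
    """Return, for each unique DOM, its pulses as time-ordered series."""
    id_idx = [keys.index(c) for c in id_columns]
    t_idx = keys.index(time_column)
    q_idx = keys.index(charge_column)

    # Collect (time, charge) pairs per unique DOM identifier.
    grouped: Dict[Tuple[float, ...], List[Tuple[float, float]]] = defaultdict(list)
    for row in rows:
        grouped[tuple(row[i] for i in id_idx)].append((row[t_idx], row[q_idx]))

    series = {}
    for dom, hits in grouped.items():
        hits = sorted(hits, key=lambda h: h[0])  # order by time
        if max_activations is not None:
            # Assumption: keep the earliest activations when truncating.
            hits = hits[:max_activations]
        series[dom] = {
            "time": [t for t, _ in hits],
            "charge": [q for _, q in hits],
        }
    return series


keys = ["dom_x", "dom_y", "dom_z", "dom_time", "charge"]
pulses = [
    [0.0, 0.0, 10.0, 120.0, 0.5],
    [0.0, 0.0, 10.0, 100.0, 1.5],
    [5.0, 0.0, 20.0, 300.0, 2.0],
    [0.0, 0.0, 10.0, 140.0, 0.7],
]
series = dom_time_series(
    pulses, keys, id_columns=["dom_x", "dom_y", "dom_z"], max_activations=2
)
```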

class graphnet.models.data_representation.graphs.nodes.nodes.IceMixNodes(*args, **kwargs)[source]

Bases: NodeDefinition

Calculate ice properties and perform random sampling.

Ice properties are calculated based on the z-coordinate of the pulse. For each event, a random sampling is performed to keep the number of pulses below a maximum number of pulses if n_pulses is over the limit.

Construct IceMixNodes.

Parameters:
  • input_feature_names (Optional[List[str]], default: None) – Column names for input features. Minimum required features are z coordinate and hlc column names.

  • max_pulses (int, default: 768) – Maximum number of pulses to keep in the event.

  • z_name (str, default: 'dom_z') – Name of the z-coordinate column.

  • hlc_name (Optional[str], default: 'hlc') – Name of the Hard Local Coincidence Check column.

  • add_ice_properties (bool, default: True) – If True, scattering and absorption length of ice in IceCube are added to the feature set based on z coordinate.

  • ice_args (Dict[str, Optional[float]], default: {'z_offset': None, 'z_scaling': None}) – Offset and scaling of the z coordinate in the Detector, to be able to make a similar conversion in the ice data.

  • sample_pulses (bool, default: True) – Enable sampling random pulses. If True and the event is longer than the max_length, they will be sampled. If False, then only the first max_length pulses will be selected.

  • args (Any)

  • kwargs (Any)

Return type:

object
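The pulse-limiting behavior controlled by max_pulses and sample_pulses can be sketched as follows. This is only an illustration: the function name `limit_pulses`, the `seed` argument, uniform sampling without replacement, and preserving the original pulse order are assumptions, and the actual class may treat pulses differently (for example, using the HLC column). The ice-property calculation is not shown.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def limit_pulses(
    pulses: Sequence[T],
    max_pulses: int = 768,
    sample_pulses: bool = True,
    seed: Optional[int] = None,
) -> List[T]:
    """Keep at most max_pulses pulses from a single event."""
    if len(pulses) <= max_pulses:
        return list(pulses)  # event is short enough, keep everything
    if not sample_pulses:
        return list(pulses[:max_pulses])  # keep only the first pulses
    # Assumption: uniform sampling without replacement, original order kept.
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(pulses)), max_pulses))
    return [pulses[i] for i in kept]


event = list(range(1000))  # stand-in for 1000 pulses
sampled = limit_pulses(event, max_pulses=768, sample_pulses=True, seed=0)
truncated = limit_pulses(event, max_pulses=768, sample_pulses=False)
short = limit_pulses(event[:10], max_pulses=768)
```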

class graphnet.models.data_representation.graphs.nodes.nodes.ClusterSummaryFeatures(*args, **kwargs)[source]

Bases: NodeDefinition

Represent pulse maps as clusters with summary features.

If cluster_on is set to the xyz coordinates of optical modules, e.g. cluster_on = ['dom_x', 'dom_y', 'dom_z'], each node will be a unique optical module and the pulse information (e.g. charge, time) is summarized.

NOTE: Developed to be used with features [dom_x, dom_y, dom_z, charge, time].

Possible features per cluster:

  • total charge

    feature name: total_charge

  • charge accumulated after <X> time units

    feature name: charge_after_<X>ns

  • time of first hit in the optical module

    feature name: time_of_first_hit

  • time spread per optical module

    feature name: time_spread

  • time std per optical module

    feature name: time_std

  • time taken to collect <X> percent of total charge per cluster

    feature name: time_after_charge_pct<X>

  • number of pulses per cluster

    feature name: counts

For more details on most of the listed features see Theo Glauch’s thesis (chapter 5.3): https://mediatum.ub.tum.de/node?id=1584755

NOTE: The counts feature (number of pulses per cluster) is an addition introduced in this implementation and is not part of the feature set described in the referenced thesis.
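To make the feature list concrete, the following sketch computes the listed quantities for a single cluster in plain Python. The exact definitions are assumptions made for illustration: times are measured relative to the first hit, time_spread is taken as last minus first hit, time_std as the population standard deviation, and time_after_charge_pct<X> as the relative time at which the cumulative charge first reaches X percent of the total. The function name `summarize_cluster` is hypothetical, no standardization is applied, and the class itself may define these features differently.

```python
import statistics
from typing import Dict, Sequence


def summarize_cluster(
    times: Sequence[float],
    charges: Sequence[float],
    charge_after_t: Sequence[int] = (10, 50, 100),
    time_after_charge_pct: Sequence[int] = (1, 3, 5, 11, 15, 20, 50, 80),
) -> Dict[str, float]:
    """Summarize the pulses of one cluster (assumes positive charges)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    t = [times[i] for i in order]
    q = [charges[i] for i in order]
    t0 = t[0]
    total = sum(q)

    feats = {
        "total_charge": total,
        "time_of_first_hit": t0,
        "time_spread": t[-1] - t0,          # assumed: last minus first hit
        "time_std": statistics.pstdev(t),   # assumed: population std
        "counts": float(len(t)),
    }

    # Charge collected within X time units of the first hit (assumed definition).
    for x in charge_after_t:
        feats[f"charge_after_{x}ns"] = sum(
            qi for ti, qi in zip(t, q) if ti - t0 <= x
        )

    # Running sum of charge in time order.
    cumulative = []
    running = 0.0
    for qi in q:
        running += qi
        cumulative.append(running)

    # Relative time at which X percent of the total charge is reached.
    for x in time_after_charge_pct:
        threshold = total * x / 100.0
        idx = next(
            (i for i, c in enumerate(cumulative) if c >= threshold), len(t) - 1
        )
        feats[f"time_after_charge_pct{x}"] = t[idx] - t0
    return feats


features = summarize_cluster(
    times=[12.0, 10.0, 30.0, 200.0], charges=[1.0, 2.0, 1.0, 4.0]
)
```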

Construct ClusterSummaryFeatures.

Parameters:
  • cluster_on (List[str]) – Names of features to create clusters from.

  • input_feature_names (List[str]) – Column names for input features.

  • charge_label (str, default: 'charge') – Name of the charge column.

  • time_label (str, default: 'dom_time') – Name of the time column.

  • total_charge (bool, default: True) – If True, calculates total charge as feature.

  • charge_after_t (List[int], default: [10, 50, 100]) – List of times at which the accumulated charge is calculated as a feature.

  • time_of_first_hit (bool, default: True) – If True, time of first hit is added as a feature.

  • time_spread (bool, default: True) – If True, time spread is added as a feature.

  • time_std (bool, default: True) – If True, time std is added as a feature.

  • time_after_charge_pct (List[int], default: [1, 3, 5, 11, 15, 20, 50, 80]) – List of percentiles to calculate time after charge.

  • charge_standardization (Union[float, str], default: 'log') – Either a float or ‘log’. If a float, the features are multiplied by this factor. If ‘log’, the features are transformed to log10 scale.

  • time_standardization (float, default: 0.001) – Standardization factor for features with a time

  • order_in_time (bool, default: True) – If True, clusters are ordered in time. If your data is already ordered in time, you can set this to False to avoid a potential overhead. NOTE: Should only be set to False if you are sure that the input data is already ordered in time; it will lead to incorrect results otherwise.

  • add_counts (bool, default: False) – If True, log10 of the number of pulses per cluster is added as a feature.

  • args (Any)

  • kwargs (Any)

Return type:

object

NOTE: Make sure that either the input data is not already standardized or that the charge_standardization and time_standardization parameters are set to 1 to avoid a double standardization.
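The double-standardization pitfall can be illustrated with a small sketch. The helpers `standardize_charge` and `standardize_time` are hypothetical and only mirror the parameter descriptions above: a float factor multiplies the value, and 'log' applies log10. Treating the time factor as a plain multiplication is likewise an assumption.

```python
import math
from typing import Union


def standardize_charge(
    value: float, charge_standardization: Union[float, str] = "log"
) -> float:
    """Apply the charge standardization as described: factor or log10."""
    if charge_standardization == "log":
        return math.log10(value)
    return value * float(charge_standardization)


def standardize_time(value: float, time_standardization: float = 0.001) -> float:
    """Scale a time-like feature by a factor (assumed to be multiplicative)."""
    return value * time_standardization


raw_time = 250.0                            # unstandardized time
once = standardize_time(raw_time)           # 250 * 0.001
twice = standardize_time(once)              # scaled a second time by mistake
untouched = standardize_time(once, 1.0)     # factor 1 leaves the value as is

log_charge = standardize_charge(100.0)      # log10(100)
same_charge = standardize_charge(3.0, 1.0)  # factor 1 leaves the value as is
```

If the input has already been scaled upstream, passing a factor of 1 keeps the values unchanged, whereas applying the default factor again shrinks them by a further factor of 1000.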

set_indices(feature_names)[source]

Set the indices for the input features.

Return type:

None

Parameters:

feature_names (List[str])