High-level description

The MaxCutSolver class is a subclass of GreedySolver that implements a graph-based algorithm for inferring phylogenetic trees. It aims to find an optimal partition of samples based on their shared and differing mutations, represented as a connectivity graph, by approximating the max-cut problem.

Code Structure

The MaxCutSolver class inherits from GreedySolver and overrides the perform_split method. It utilizes the construct_connectivity_graph, evaluate_cut, and max_cut_improve_cut functions from the graph_utilities module to construct and manipulate the connectivity graph.

References

This code references the following symbols from other files:

  • GreedySolver (cassiopeia/solver/GreedySolver.py)
  • graph_utilities (cassiopeia/solver/graph_utilities.py)

Symbols

MaxCutSolver

Description

This class implements the MaxCut algorithm for phylogenetic tree inference. It inherits from the GreedySolver class and provides a concrete implementation for the perform_split method.

Inputs

NameTypeDescription
sdimensionOptional[int]The number of dimensions for the embedding space (default: 3)
iterationsOptional[int]The number of iterations for updating the embeddings (default: 50)
prior_transformationstrThe transformation function to apply to priors (default: “negative_log”)

Outputs

This class doesn’t directly return any output. It modifies the input CassiopeiaTree object during the solve method.

Internal Logic

The perform_split method performs the following steps:

  1. Construct Connectivity Graph: It builds a connectivity graph where nodes represent samples and edge weights represent the difference between the number of triplets separating and grouping the samples based on shared mutations.
  2. Embed Samples: Randomly embed the samples in a d-dimensional sphere.
  3. Update Embeddings: Iteratively update the embeddings based on edge weights, pulling together nodes with strong negative connections and pushing apart nodes with positive connections.
  4. Find Max Cut: Generate random hyperplanes to bisect the d-sphere and select the one that maximizes the cut weight, effectively partitioning the samples.
  5. Improve Cut: Apply a greedy hill-climbing procedure to further optimize the cut by iteratively moving nodes between partitions to maximize the cut weight.

Performance Considerations

The performance of the algorithm is influenced by the sdimension and iterations parameters. Higher values may lead to better solutions but increase computational cost.

evaluate_cut

Description

This function calculates the weight of a cut in a graph.

Inputs

NameTypeDescription
cutList[str]A list of nodes representing one side of the cut
Gnx.DiGraphThe graph on which the cut is evaluated

Outputs

NameTypeDescription
cut_scorefloatThe total weight of edges crossing the cut

Internal Logic

The function iterates through each edge in the graph and checks if it crosses the cut (i.e., if one node is in the cut list and the other is not). If an edge crosses the cut, its weight is added to the cut_score.

Side Effects

This function has no side effects.