MaxCutSolver.py
High-level description
The MaxCutSolver
class is a subclass of GreedySolver
that implements a graph-based algorithm for inferring phylogenetic trees. It aims to find an optimal partition of samples based on their shared and differing mutations, represented as a connectivity graph, by approximating the max-cut problem.
Code Structure
The MaxCutSolver
class inherits from GreedySolver
and overrides the perform_split
method. It utilizes the construct_connectivity_graph
, evaluate_cut
, and max_cut_improve_cut
functions from the graph_utilities
module to construct and manipulate the connectivity graph.
References
This code references the following symbols from other files:
GreedySolver
(cassiopeia/solver/GreedySolver.py)graph_utilities
(cassiopeia/solver/graph_utilities.py)
Symbols
MaxCutSolver
Description
This class implements the MaxCut algorithm for phylogenetic tree inference. It inherits from the GreedySolver
class and provides a concrete implementation for the perform_split
method.
Inputs
Name | Type | Description |
---|---|---|
sdimension | Optional[int] | The number of dimensions for the embedding space (default: 3) |
iterations | Optional[int] | The number of iterations for updating the embeddings (default: 50) |
prior_transformation | str | The transformation function to apply to priors (default: “negative_log”) |
Outputs
This class doesn’t directly return any output. It modifies the input CassiopeiaTree
object during the solve
method.
Internal Logic
The perform_split
method performs the following steps:
- Construct Connectivity Graph: It builds a connectivity graph where nodes represent samples and edge weights represent the difference between the number of triplets separating and grouping the samples based on shared mutations.
- Embed Samples: Randomly embed the samples in a d-dimensional sphere.
- Update Embeddings: Iteratively update the embeddings based on edge weights, pulling together nodes with strong negative connections and pushing apart nodes with positive connections.
- Find Max Cut: Generate random hyperplanes to bisect the d-sphere and select the one that maximizes the cut weight, effectively partitioning the samples.
- Improve Cut: Apply a greedy hill-climbing procedure to further optimize the cut by iteratively moving nodes between partitions to maximize the cut weight.
Performance Considerations
The performance of the algorithm is influenced by the sdimension
and iterations
parameters. Higher values may lead to better solutions but increase computational cost.
evaluate_cut
Description
This function calculates the weight of a cut in a graph.
Inputs
Name | Type | Description |
---|---|---|
cut | List[str] | A list of nodes representing one side of the cut |
G | nx.DiGraph | The graph on which the cut is evaluated |
Outputs
Name | Type | Description |
---|---|---|
cut_score | float | The total weight of edges crossing the cut |
Internal Logic
The function iterates through each edge in the graph and checks if it crosses the cut (i.e., if one node is in the cut
list and the other is not). If an edge crosses the cut, its weight is added to the cut_score
.
Side Effects
This function has no side effects.