MaxCutGreedySolver.py

High-level description

The MaxCutGreedySolver class, inheriting from GreedySolver, implements a phylogenetic tree reconstruction algorithm that combines a greedy approach with a max-cut optimization. It recursively partitions samples based on the most frequent mutation and then refines the split using a hill-climbing method to maximize the weight of edges cut in a connectivity graph built from shared mutations.

Code Structure

The MaxCutGreedySolver class overrides the perform_split method of its parent class, GreedySolver. It leverages the compute_mutation_frequencies method from the parent class and utilizes functions like construct_connectivity_graph and max_cut_improve_cut from the graph_utilities module.

References

This code references the following symbols from other files:

GreedySolver (from cassiopeia/solver/GreedySolver.py)
missing_data_methods (from cassiopeia/solver/missing_data_methods.py)
solver_utilities (from cassiopeia/solver/solver_utilities.py)
graph_utilities (from cassiopeia/solver/graph_utilities.py)

Symbols

`MaxCutGreedySolver`

Description

This class implements the MaxCut Greedy algorithm for phylogenetic tree reconstruction. It extends the basic GreedySolver by adding a max-cut optimization step to each split.

Inputs

Name	Type	Description
missing_data_classifier	Callable	Function to handle missing data during character splits (default: `missing_data_methods.assign_missing_average`).
prior_transformation	str	Transformation function for converting priors to weights (default: `"negative_log"`).

Outputs

This class doesn’t directly return any output. It populates a CassiopeiaTree object during the solve method (inherited from GreedySolver).

Internal Logic

The core logic resides in the perform_split method:

Identify Splitting Character/State: It identifies the most frequent character/state pair, considering potential weights derived from priors.
Initial Split: Samples are initially partitioned based on the chosen character/state.
Handle Missing Data: The missing_data_classifier function is used to assign samples with missing data to either side of the partition.
Construct Connectivity Graph: A connectivity graph is built where nodes represent samples and edge weights reflect the similarity based on shared mutations.
Max-Cut Optimization: The max_cut_improve_cut function refines the initial split by iteratively moving nodes between partitions to maximize the weight of edges cut, effectively grouping samples with similar mutations.
Return Refined Split: The optimized left and right partitions are returned.

`perform_split`

Description

This method partitions a set of samples based on the most frequent mutation and optimizes the split using the max-cut criterion.

Inputs

Name	Type	Description
character_matrix	pd.DataFrame	Character matrix containing character states for each sample.
samples	List[int]	List of samples to be partitioned.
weights	Optional[Dict[int, Dict[int, float]]]	Weights for each (character, state) pair, usually derived from priors.
missing_state_indicator	int	Character representing missing data (default: -1).

Outputs

Name	Type	Description
improved_left_set	List[str]	List of samples in the left partition after optimization.
improved_right_set	List[str]	List of samples in the right partition after optimization.

Internal Logic

See the ‘Internal Logic’ section of the MaxCutGreedySolver class description.

Side Effects

The perform_split method modifies the input left_set and right_set lists in-place during the missing data imputation and max-cut optimization steps.

Dependencies

This code depends on the following external libraries:

Dependency	Purpose
pandas	Handling and manipulating character matrices.
networkx	Constructing and manipulating graphs for representing relationships between samples.
numpy	Numerical operations, particularly in calculating scores and frequencies.

Error Handling

This code does not implement specific error handling beyond what is inherited from the parent class GreedySolver.

On this page

High-level description
Code Structure
References
Symbols
MaxCutGreedySolver
Description
Inputs
Outputs
Internal Logic
perform_split
Description
Inputs
Outputs
Internal Logic
Side Effects
Dependencies
Error Handling

High-level description

Code Structure

References

This code references the following symbols from other files:

GreedySolver (from cassiopeia/solver/GreedySolver.py)
missing_data_methods (from cassiopeia/solver/missing_data_methods.py)
solver_utilities (from cassiopeia/solver/solver_utilities.py)
graph_utilities (from cassiopeia/solver/graph_utilities.py)

Symbols

`MaxCutGreedySolver`

Description

This class implements the MaxCut Greedy algorithm for phylogenetic tree reconstruction. It extends the basic GreedySolver by adding a max-cut optimization step to each split.

Inputs

Name	Type	Description
missing_data_classifier	Callable	Function to handle missing data during character splits (default: `missing_data_methods.assign_missing_average`).
prior_transformation	str	Transformation function for converting priors to weights (default: `"negative_log"`).

Outputs

This class doesn’t directly return any output. It populates a CassiopeiaTree object during the solve method (inherited from GreedySolver).

Internal Logic

The core logic resides in the perform_split method:

Identify Splitting Character/State: It identifies the most frequent character/state pair, considering potential weights derived from priors.
Initial Split: Samples are initially partitioned based on the chosen character/state.
Handle Missing Data: The missing_data_classifier function is used to assign samples with missing data to either side of the partition.
Construct Connectivity Graph: A connectivity graph is built where nodes represent samples and edge weights reflect the similarity based on shared mutations.
Max-Cut Optimization: The max_cut_improve_cut function refines the initial split by iteratively moving nodes between partitions to maximize the weight of edges cut, effectively grouping samples with similar mutations.
Return Refined Split: The optimized left and right partitions are returned.

`perform_split`

Description

This method partitions a set of samples based on the most frequent mutation and optimizes the split using the max-cut criterion.

Inputs

Name	Type	Description
character_matrix	pd.DataFrame	Character matrix containing character states for each sample.
samples	List[int]	List of samples to be partitioned.
weights	Optional[Dict[int, Dict[int, float]]]	Weights for each (character, state) pair, usually derived from priors.
missing_state_indicator	int	Character representing missing data (default: -1).

Outputs

Name	Type	Description
improved_left_set	List[str]	List of samples in the left partition after optimization.
improved_right_set	List[str]	List of samples in the right partition after optimization.

Internal Logic

See the ‘Internal Logic’ section of the MaxCutGreedySolver class description.

Side Effects

The perform_split method modifies the input left_set and right_set lists in-place during the missing data imputation and max-cut optimization steps.

Dependencies

This code depends on the following external libraries:

Dependency	Purpose
pandas	Handling and manipulating character matrices.
networkx	Constructing and manipulating graphs for representing relationships between samples.
numpy	Numerical operations, particularly in calculating scores and frequencies.

Error Handling

This code does not implement specific error handling beyond what is inherited from the parent class GreedySolver.

On this page

High-level description
Code Structure
References
Symbols
MaxCutGreedySolver
Description
Inputs
Outputs
Internal Logic
perform_split
Description
Inputs
Outputs
Internal Logic
Side Effects
Dependencies
Error Handling

​High-level description

​Code Structure

​References

​Symbols

​MaxCutGreedySolver

​Description

​Inputs

​Outputs

​Internal Logic

​perform_split

​Description

​Inputs

​Outputs

​Internal Logic

​Side Effects

​Dependencies

​Error Handling

Project Root

​High-level description

​Code Structure

​References

​Symbols

​MaxCutGreedySolver

​Description

​Inputs

​Outputs

​Internal Logic

​perform_split

​Description

​Inputs

​Outputs

​Internal Logic

​Side Effects

​Dependencies

​Error Handling

High-level description

Code Structure

References

Symbols

`MaxCutGreedySolver`

Description

Inputs

Outputs

Internal Logic

`perform_split`

Description

Inputs

Outputs

Internal Logic

Side Effects

Dependencies

Error Handling

High-level description

Code Structure

References

Symbols

`MaxCutGreedySolver`

Description

Inputs

Outputs

Internal Logic

`perform_split`

Description

Inputs

Outputs

Internal Logic

Side Effects

Dependencies

Error Handling