GreedySolver.py - Stanza Demo

High-level description

The GreedySolver class in Cassiopeia is an abstract base class for implementing top-down greedy algorithms for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves. Subclasses of GreedySolver implement the specific logic for performing the split.

Code Structure

The GreedySolver class defines the overall structure of a greedy solver, including methods for performing splits, computing mutation frequencies, and adding duplicate samples to the tree. The perform_split method is an abstract method that must be implemented by subclasses to define the specific split criterion.

References

CassiopeiaSolver: The GreedySolver class inherits from the CassiopeiaSolver abstract base class.
solver_utilities: The GreedySolver class uses utility functions from the solver_utilities module, such as node_name_generator and transform_priors.
missing_data_methods: The GreedySolver class uses functions from the missing_data_methods module to handle missing data during character splits.
find_duplicate_groups: The GreedySolver class uses the find_duplicate_groups function from the cassiopeia.mixins module to identify duplicate samples.

Symbols

`GreedySolver`

Description

This class represents a greedy solver for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves.

Inputs

Name	Type	Description
prior_transformation	str	Function to use when transforming priors into weights. Defaults to “negative_log”.

Outputs

None

Internal Logic

Transform the priors into weights using the specified prior_transformation.
Recursively split the set of samples using the perform_split method: a. Create an ancestral node for each split. b. Place the resulting partitions as daughter clades.
Continue splitting until each partition contains only a single sample.
If a split cannot be performed, create a polytomy.
Add duplicate samples to the tree.
Collapse mutationless edges if specified.

`perform_split`

Description

This abstract method defines the logic for partitioning a set of samples. It must be implemented by subclasses of GreedySolver.

Inputs

Name	Type	Description
character_matrix	pd.DataFrame	Character matrix
samples	List[int]	A list of samples to partition
weights	Optional[Dict[int, Dict[int, float]]]	Weighting of each (character, state) pair. Typically a transformation of the priors.
missing_state_indicator	int	Character representing missing data.

Outputs

Name	Type	Description
left_partition	List[Union[int, str]]	The left partition group
right_partition	List[Union[int, str]]	The right partition group

`compute_mutation_frequencies`

Description

This method computes the frequencies of each character/state pair in a given set of samples.

Inputs

Name	Type	Description
samples	List[str]	The set of samples to consider
unique_character_matrix	pd.DataFrame	The character matrix
missing_state_indicator	int	Character representing missing data.

Outputs

Name	Type	Description
freq_dict	Dict[int, Dict[int, int]]	A dictionary mapping each character to a dictionary of state/sample frequency pairs

Internal Logic

Iterate over each character in the character matrix.
For each character, count the occurrences of each state for the given samples.
Store the frequencies in a dictionary:
- Keys are characters
- Values are dictionaries mapping states to their frequencies

`__add_duplicates_to_tree`

Description

This private method adds duplicate samples to the tree.

Inputs

Name	Type	Description
tree	nx.DiGraph	The tree to add duplicates to
character_matrix	pd.DataFrame	Character matrix
node_name_generator	Generator[str, None, None]	Generator for creating unique node names

Outputs

Name	Type	Description
tree	nx.DiGraph	The tree with duplicates added

Internal Logic

Identify duplicate samples using the find_duplicate_groups function.
For each group of duplicates: a. Create a new internal node. b. Connect the new node to the original sample. c. Connect the new node to all duplicates of the original sample.

Side Effects

The solve method modifies the input CassiopeiaTree object by populating its tree attribute.

Dependencies

networkx: Used for representing and manipulating the tree.
numpy: Used for numerical operations.
pandas: Used for representing the character matrix.

Project Root

​High-level description

​Code Structure

​References

​Symbols

​GreedySolver

​Description

​Inputs

​Outputs

​Internal Logic

​perform_split

​Description

​Inputs

​Outputs

​compute_mutation_frequencies

​Description

​Inputs

​Outputs

​Internal Logic

​__add_duplicates_to_tree

​Description

​Inputs

​Outputs

​Internal Logic

​Side Effects

​Dependencies

High-level description

Code Structure

References

Symbols

`GreedySolver`

Description

Inputs

Outputs

Internal Logic

`perform_split`

Description

Inputs

Outputs

`compute_mutation_frequencies`

Description

Inputs

Outputs

Internal Logic

`__add_duplicates_to_tree`

Description

Inputs

Outputs

Internal Logic

Side Effects

Dependencies