GreedySolver.py
High-level description
The GreedySolver
class in Cassiopeia is an abstract base class for implementing top-down greedy algorithms for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves. Subclasses of GreedySolver
implement the specific logic for performing the split.
Code Structure
The GreedySolver
class defines the overall structure of a greedy solver, including methods for performing splits, computing mutation frequencies, and adding duplicate samples to the tree. The perform_split
method is an abstract method that must be implemented by subclasses to define the specific split criterion.
References
CassiopeiaSolver
: TheGreedySolver
class inherits from theCassiopeiaSolver
abstract base class.solver_utilities
: TheGreedySolver
class uses utility functions from thesolver_utilities
module, such asnode_name_generator
andtransform_priors
.missing_data_methods
: TheGreedySolver
class uses functions from themissing_data_methods
module to handle missing data during character splits.find_duplicate_groups
: TheGreedySolver
class uses thefind_duplicate_groups
function from thecassiopeia.mixins
module to identify duplicate samples.
Symbols
GreedySolver
Description
This class represents a greedy solver for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves.
Inputs
Name | Type | Description |
---|---|---|
prior_transformation | str | Function to use when transforming priors into weights. Defaults to “negative_log”. |
Outputs
None
Internal Logic
- Transform the priors into weights using the specified
prior_transformation
. - Recursively split the set of samples using the
perform_split
method: a. Create an ancestral node for each split. b. Place the resulting partitions as daughter clades. - Continue splitting until each partition contains only a single sample.
- If a split cannot be performed, create a polytomy.
- Add duplicate samples to the tree.
- Collapse mutationless edges if specified.
perform_split
Description
This abstract method defines the logic for partitioning a set of samples. It must be implemented by subclasses of GreedySolver
.
Inputs
Name | Type | Description |
---|---|---|
character_matrix | pd.DataFrame | Character matrix |
samples | List[int] | A list of samples to partition |
weights | Optional[Dict[int, Dict[int, float]]] | Weighting of each (character, state) pair. Typically a transformation of the priors. |
missing_state_indicator | int | Character representing missing data. |
Outputs
Name | Type | Description |
---|---|---|
left_partition | List[Union[int, str]] | The left partition group |
right_partition | List[Union[int, str]] | The right partition group |
compute_mutation_frequencies
Description
This method computes the frequencies of each character/state pair in a given set of samples.
Inputs
Name | Type | Description |
---|---|---|
samples | List[str] | The set of samples to consider |
unique_character_matrix | pd.DataFrame | The character matrix |
missing_state_indicator | int | Character representing missing data. |
Outputs
Name | Type | Description |
---|---|---|
freq_dict | Dict[int, Dict[int, int]] | A dictionary mapping each character to a dictionary of state/sample frequency pairs |
Internal Logic
- Iterate over each character in the character matrix.
- For each character, count the occurrences of each state for the given samples.
- Store the frequencies in a dictionary:
- Keys are characters
- Values are dictionaries mapping states to their frequencies
__add_duplicates_to_tree
Description
This private method adds duplicate samples to the tree.
Inputs
Name | Type | Description |
---|---|---|
tree | nx.DiGraph | The tree to add duplicates to |
character_matrix | pd.DataFrame | Character matrix |
node_name_generator | Generator[str, None, None] | Generator for creating unique node names |
Outputs
Name | Type | Description |
---|---|---|
tree | nx.DiGraph | The tree with duplicates added |
Internal Logic
- Identify duplicate samples using the
find_duplicate_groups
function. - For each group of duplicates: a. Create a new internal node. b. Connect the new node to the original sample. c. Connect the new node to all duplicates of the original sample.
Side Effects
The solve
method modifies the input CassiopeiaTree
object by populating its tree attribute.
Dependencies
networkx
: Used for representing and manipulating the tree.numpy
: Used for numerical operations.pandas
: Used for representing the character matrix.