PercolationSolver.py - Stanza Demo

High-level description

The PercolationSolver class implements a top-down, percolation-based algorithm for reconstructing phylogenetic trees. It recursively partitions a set of samples based on their similarity in observed mutations, creating a tree structure where similar samples are grouped closer together.

Code Structure

The PercolationSolver class is initialized with a joining solver, prior transformation function, similarity function, and a threshold. The solve method orchestrates the tree reconstruction process by recursively partitioning the sample set using the percolate method. The percolate method constructs a similarity graph, removes edges based on similarity, and partitions the graph into clades. If more than two clades are formed, the joining_solver is used to cluster them. Finally, the __add_duplicates_to_tree method adds duplicate samples to the tree.

References

This code references the following symbols:

CassiopeiaSolver (from cassiopeia.solver)
dissimilarity_functions (from cassiopeia.solver)
solver_utilities (from cassiopeia.solver)
CassiopeiaTree (from cassiopeia.data)
data_utilities (from cassiopeia.data.utilities)

Symbols

`PercolationSolver`

Description

This class implements the Percolation algorithm for phylogenetic tree reconstruction. It takes a character matrix and optional priors as input and builds a tree by recursively partitioning the sample set based on their similarity in observed mutations.

Inputs

Name	Type	Description
joining_solver	CassiopeiaSolver.CassiopeiaSolver	The CassiopeiaSolver used to cluster groups of samples if percolation produces more than two groups.
prior_transformation	str	A string specifying the transformation to apply to the priors. Defaults to “negative_log”.
similarity_function	Optional[Callable[[np.array, np.array, int, Optional[Dict[int, Dict[int, float]]]], float]]	A function that calculates the similarity score between two samples. Defaults to `hamming_similarity_without_missing`.
threshold	Optional[int]	The minimum similarity threshold for adding an edge in the similarity graph. Defaults to 0.

Outputs

This class doesn’t directly return any output. It modifies the input CassiopeiaTree object to store the reconstructed tree.

Internal Logic

Initialization: The class is initialized with a joining solver, prior transformation function, similarity function, and a threshold.
Solve: The solve method orchestrates the tree reconstruction process.
- It first extracts the character matrix and transforms the priors into weights if provided.
- It then creates a NetworkX graph and recursively partitions the sample set using the percolate method.
- Finally, it adds duplicate samples to the tree and populates the input CassiopeiaTree object.
Percolate: The percolate method partitions a set of samples into two groups based on their similarity.
- It constructs a similarity graph with samples as nodes and edges weighted by their similarity.
- It then iteratively removes the minimum weighted edges until the graph is disconnected into multiple components.
- If more than two components are formed, it clusters them using the joining_solver and returns the two resulting groups.
Add Duplicates: The __add_duplicates_to_tree method adds duplicate samples to the tree by finding groups of samples with identical character states and placing them as sister nodes under a new internal node.

Side Effects

Modifies the input CassiopeiaTree object to store the reconstructed tree.

`percolate`

Description

Partitions a set of samples into two groups based on their similarity in the character matrix.

Inputs

Name	Type	Description
character_matrix	pd.DataFrame	The character matrix representing the mutation profiles of the samples.
samples	List[str]	A list of sample names to be partitioned.
priors	Optional[Dict[int, Dict[int, float]]]	A dictionary of prior probabilities for each character/state pair.
weights	Optional[Dict[int, Dict[int, float]]]	A dictionary of weights for each character/state pair, typically derived from the priors.
missing_state_indicator	int	The character representing missing data in the character matrix. Defaults to -1.

Outputs

Name	Type	Description
partition_named	Tuple[List[str], List[str]]	A tuple containing two lists of sample names, representing the left and right partitions.

Internal Logic

Similarity Graph: Constructs a weighted graph with samples as nodes and edges weighted by their similarity calculated using the provided similarity_function.
Percolation: Iteratively removes the minimum weighted edges until the graph is disconnected into multiple connected components.
Component Merging (if necessary): If more than two connected components are formed, it clusters them using the joining_solver based on the LCA of each component and returns the two resulting groups.
Return Partitions: Returns a tuple containing two lists of sample names, representing the left and right partitions.

`__add_duplicates_to_tree`

Description

Adds duplicate samples to the tree.

Inputs

Name	Type	Description
tree	nx.DiGraph	The tree to which duplicate samples need to be added.
character_matrix	pd.DataFrame	The character matrix representing the mutation profiles of the samples.
node_name_generator	Generator[str, None, None]	A generator object that produces unique node names.

Outputs

Name	Type	Description
tree	nx.DiGraph	The tree with duplicate samples added.

Internal Logic

Identify Duplicates: Identifies groups of samples with identical character states in the character matrix.
Add Duplicates: For each group of duplicates:
- Creates a new internal node as the parent of the duplicates.
- Connects the original sample and all its duplicates as children of this new internal node.
Return Tree: Returns the modified tree with the duplicate samples added.

NeighborJoiningSolver.py SharedMutationJoiningSolver.py

On this page

High-level description
Code Structure
References
Symbols
PercolationSolver
Description
Inputs
Outputs
Internal Logic
Side Effects
percolate
Description
Inputs
Outputs
Internal Logic
__add_duplicates_to_tree
Description
Inputs
Outputs
Internal Logic