High-level description
TheSpectralNeighborJoiningSolver
class in Cassiopeia implements a phylogenetic tree reconstruction algorithm based on the spectral neighbor-joining method. It iteratively joins pairs of nodes (or groups of nodes) based on the second singular value of a specific matrix derived from the pairwise similarities between nodes. This process continues until a complete tree is formed.
Code Structure
TheSpectralNeighborJoiningSolver
class inherits from the DistanceSolver
class. It overrides the find_cherry
and update_dissimilarity_map
methods to implement the spectral neighbor-joining logic. The _compute_svd2
method is a helper function used to calculate the second singular value of the relevant matrix.
References
This solver references thedissimilarity_functions
module for calculating pairwise similarities between nodes and the solver_utilities
module for helper functions like generating unique node names.
Symbols
SpectralNeighborJoiningSolver
Description
This class implements the spectral neighbor-joining algorithm for phylogenetic tree reconstruction. It uses a spectral method to identify pairs of nodes to join based on their pairwise similarities.Inputs
Name | Type | Description |
---|---|---|
similarity_function | Optional[Callable[[np.ndarray, np.ndarray, int, Dict[int, Dict[int, float]]], float]] | A function to compute the pairwise similarity between nodes. Defaults to exponential_negative_hamming_distance . |
add_root | bool | Whether to add an implicit root to the tree. Defaults to False . |
prior_transformation | str | The transformation to apply to priors when calculating weights. Defaults to "negative_log" . |
Outputs
The solver populates thetree
attribute of the input CassiopeiaTree
object with the reconstructed tree.
Internal Logic
The solver iteratively performs the following steps:- Calculate the dissimilarity map: This map stores the pairwise dissimilarities between all nodes.
- Find a cherry: Using the
find_cherry
method, the solver identifies the pair of nodes with the smallest second singular value of their corresponding RA matrix (see_compute_svd2
for details). - Update the dissimilarity map: The
update_dissimilarity_map
method updates the dissimilarity map to reflect the newly joined cherry. - Repeat steps 2-3 until only two nodes remain.
- Root the tree: If
add_root
isTrue
, the solver adds an implicit root to the tree.
get_dissimilarity_map
Description
This method obtains or generates the initial dissimilarity map used by the solver. It calls the parent class’sget_dissimilarity_map
method and then constructs the initial lambda matrix, where each subset is a singleton leaf node.
Inputs
Name | Type | Description |
---|---|---|
cassiopeia_tree | CassiopeiaTree | The CassiopeiaTree object passed into solve . |
layer | Optional[str] | Layer storing the character matrix for solving. If None, the default character matrix is used. |
Outputs
Name | Type | Description |
---|---|---|
lambda_matrix_df | pd.DataFrame | The initial lambda matrix, structured similarly to the dissimilarity_map in DistanceSolver . |
Internal Logic
- Obtain the similarity map: Calls the parent class’s
get_dissimilarity_map
method to get the pairwise similarities between nodes. - Prepare for the lambda matrix: Initializes the
_lambda_indices
attribute, which stores the subsets of nodes being considered. Initially, each subset is a singleton leaf node. - Generate the lambda matrix: Iterates through all pairs of subsets and calculates the second singular value of their corresponding RA matrix using the
_compute_svd2
method. - Convert to DataFrame: Converts the lambda matrix from a NumPy array to a Pandas DataFrame.
find_cherry
Description
This method finds the pair of nodes (or subsets of nodes) to join into a cherry based on the lambda matrix. The cherry is identified as the pair with the smallest second singular value in the lambda matrix.Inputs
Name | Type | Description |
---|---|---|
dissimilarity_map | np.ndarray | The lambda matrix. |
Outputs
Name | Type | Description |
---|---|---|
(i, j) | Tuple[int, int] | A tuple of indices representing the rows in the lambda matrix corresponding to the nodes to be joined. |
_compute_svd2
Description
This method calculates the second largest singular value of the RA matrix for a given pair of subsets of nodes. The RA matrix is constructed based on the pairwise similarities between nodes in the subsets and their complements.Inputs
Name | Type | Description |
---|---|---|
pair | Tuple[int, int] | A tuple of indices representing the rows in the lambda matrix corresponding to the two subsets of nodes. |
lambda_indices | List[List[int]] | The list of subsets of nodes being considered. |
Outputs
Name | Type | Description |
---|---|---|
svd2_val | float | The second largest singular value of the RA matrix for the given pair of subsets. |
Internal Logic
- Get the combined subset: Combines the two subsets of nodes specified by the
pair
argument. - Get the complement: Identifies the nodes that are not in the combined subset.
- Reconstruct the RA matrix: Constructs the RA matrix based on the pairwise similarities between nodes in the combined subset and its complement.
- Calculate the second largest singular value: Computes the singular value decomposition of the RA matrix and returns the second largest singular value.
update_dissimilarity_map
Description
This method updates the lambda matrix after a cherry is formed. It removes the rows and columns corresponding to the joined nodes and adds a new row and column for the newly formed cherry. The entries in the new row and column are calculated using the_compute_svd2
method.
Inputs
Name | Type | Description |
---|---|---|
similarity_map | pd.DataFrame | The lambda matrix to update. |
cherry | Tuple[str, str] | The names of the two nodes that were joined into a cherry. |
new_node | str | The name of the newly formed cherry node. |
Outputs
Name | Type | Description |
---|---|---|
lambda_matrix_df | pd.DataFrame | The updated lambda matrix. |
Internal Logic
- Identify cherry nodes: Finds the indices of the cherry nodes in the lambda matrix.
- Modify names and indices: Updates the node names and
_lambda_indices
attribute to reflect the newly formed cherry. - Extract the old lambda matrix: Removes the rows and columns corresponding to the joined nodes from the lambda matrix.
- Add new row and column: Adds a new row and column to the lambda matrix for the newly formed cherry.
- Compute new singular values: Calculates the second singular values for the new row and column using the
_compute_svd2
method. - Regenerate lambda matrix: Constructs a new DataFrame from the updated lambda matrix.
setup_root_finder
Description
This method defines the implicit rooting strategy for the solver. It adds an implicit root with all-zero character states to the character matrix and recomputes the dissimilarity map.Inputs
Name | Type | Description |
---|---|---|
cassiopeia_tree | CassiopeiaTree | The CassiopeiaTree object passed into solve . |
Internal Logic
- Add implicit root: Adds a new row to the character matrix representing the implicit root with all-zero character states.
- Recompute dissimilarity map: Computes the dissimilarity map using the updated character matrix.
root_tree
Description
This method roots the tree by adding an edge between the last two remaining nodes and converting the undirected graph to a directed graph with respect to the specified root sample.Inputs
Name | Type | Description |
---|---|---|
tree | nx.Graph | The undirected tree topology. |
root_sample | str | The name of the sample to treat as the root. |
remaining_samples | List[str] | The names of the last two unjoined nodes. |
Outputs
Name | Type | Description |
---|---|---|
rooted_tree | nx.DiGraph | The rooted tree. |
Dependencies
Dependency | Purpose |
---|---|
networkx | Representing and manipulating the tree topology. |
numpy | Numerical operations and array manipulation. |
pandas | Data manipulation and representation of the dissimilarity map. |
scipy | Singular value decomposition calculation. |
ete3 | Tree manipulation and visualization (used in _ccphylo_solve ). |
configparser | Reading configuration settings (used in _setup_ccphylo ). |
subprocess | Running external commands (used in _ccphylo_solve ). |
tempfile | Creating temporary files and directories (used in _ccphylo_solve ). |
Configuration
The_ccphylo_solve
method uses the ccphylo_path
setting from the config.ini
file to locate the CCPhylo executable.
Error Handling
TheDistanceSolverError
exception is raised for various error conditions, such as missing rooting parameters or invalid dissimilarity functions.