High-level description
TheSpectralNeighborJoiningSolver class in Cassiopeia implements a phylogenetic tree reconstruction algorithm based on the spectral neighbor-joining method. It iteratively joins pairs of nodes (or groups of nodes) based on the second singular value of a specific matrix derived from the pairwise similarities between nodes. This process continues until a complete tree is formed.
Code Structure
TheSpectralNeighborJoiningSolver class inherits from the DistanceSolver class. It overrides the find_cherry and update_dissimilarity_map methods to implement the spectral neighbor-joining logic. The _compute_svd2 method is a helper function used to calculate the second singular value of the relevant matrix.
References
This solver references thedissimilarity_functions module for calculating pairwise similarities between nodes and the solver_utilities module for helper functions like generating unique node names.
Symbols
SpectralNeighborJoiningSolver
Description
This class implements the spectral neighbor-joining algorithm for phylogenetic tree reconstruction. It uses a spectral method to identify pairs of nodes to join based on their pairwise similarities.Inputs
| Name | Type | Description | 
|---|---|---|
| similarity_function | Optional[Callable[[np.ndarray, np.ndarray, int, Dict[int, Dict[int, float]]], float]] | A function to compute the pairwise similarity between nodes. Defaults to exponential_negative_hamming_distance. | 
| add_root | bool | Whether to add an implicit root to the tree. Defaults to False. | 
| prior_transformation | str | The transformation to apply to priors when calculating weights. Defaults to "negative_log". | 
Outputs
The solver populates thetree attribute of the input CassiopeiaTree object with the reconstructed tree.
Internal Logic
The solver iteratively performs the following steps:- Calculate the dissimilarity map: This map stores the pairwise dissimilarities between all nodes.
 - Find a cherry: Using the 
find_cherrymethod, the solver identifies the pair of nodes with the smallest second singular value of their corresponding RA matrix (see_compute_svd2for details). - Update the dissimilarity map: The 
update_dissimilarity_mapmethod updates the dissimilarity map to reflect the newly joined cherry. - Repeat steps 2-3 until only two nodes remain.
 - Root the tree: If 
add_rootisTrue, the solver adds an implicit root to the tree. 
get_dissimilarity_map
Description
This method obtains or generates the initial dissimilarity map used by the solver. It calls the parent class’sget_dissimilarity_map method and then constructs the initial lambda matrix, where each subset is a singleton leaf node.
Inputs
| Name | Type | Description | 
|---|---|---|
| cassiopeia_tree | CassiopeiaTree | The CassiopeiaTree object passed into solve. | 
| layer | Optional[str] | Layer storing the character matrix for solving. If None, the default character matrix is used. | 
Outputs
| Name | Type | Description | 
|---|---|---|
| lambda_matrix_df | pd.DataFrame | The initial lambda matrix, structured similarly to the dissimilarity_map in DistanceSolver. | 
Internal Logic
- Obtain the similarity map: Calls the parent class’s 
get_dissimilarity_mapmethod to get the pairwise similarities between nodes. - Prepare for the lambda matrix: Initializes the 
_lambda_indicesattribute, which stores the subsets of nodes being considered. Initially, each subset is a singleton leaf node. - Generate the lambda matrix: Iterates through all pairs of subsets and calculates the second singular value of their corresponding RA matrix using the 
_compute_svd2method. - Convert to DataFrame: Converts the lambda matrix from a NumPy array to a Pandas DataFrame.
 
find_cherry
Description
This method finds the pair of nodes (or subsets of nodes) to join into a cherry based on the lambda matrix. The cherry is identified as the pair with the smallest second singular value in the lambda matrix.Inputs
| Name | Type | Description | 
|---|---|---|
| dissimilarity_map | np.ndarray | The lambda matrix. | 
Outputs
| Name | Type | Description | 
|---|---|---|
| (i, j) | Tuple[int, int] | A tuple of indices representing the rows in the lambda matrix corresponding to the nodes to be joined. | 
_compute_svd2
Description
This method calculates the second largest singular value of the RA matrix for a given pair of subsets of nodes. The RA matrix is constructed based on the pairwise similarities between nodes in the subsets and their complements.Inputs
| Name | Type | Description | 
|---|---|---|
| pair | Tuple[int, int] | A tuple of indices representing the rows in the lambda matrix corresponding to the two subsets of nodes. | 
| lambda_indices | List[List[int]] | The list of subsets of nodes being considered. | 
Outputs
| Name | Type | Description | 
|---|---|---|
| svd2_val | float | The second largest singular value of the RA matrix for the given pair of subsets. | 
Internal Logic
- Get the combined subset: Combines the two subsets of nodes specified by the 
pairargument. - Get the complement: Identifies the nodes that are not in the combined subset.
 - Reconstruct the RA matrix: Constructs the RA matrix based on the pairwise similarities between nodes in the combined subset and its complement.
 - Calculate the second largest singular value: Computes the singular value decomposition of the RA matrix and returns the second largest singular value.
 
update_dissimilarity_map
Description
This method updates the lambda matrix after a cherry is formed. It removes the rows and columns corresponding to the joined nodes and adds a new row and column for the newly formed cherry. The entries in the new row and column are calculated using the_compute_svd2 method.
Inputs
| Name | Type | Description | 
|---|---|---|
| similarity_map | pd.DataFrame | The lambda matrix to update. | 
| cherry | Tuple[str, str] | The names of the two nodes that were joined into a cherry. | 
| new_node | str | The name of the newly formed cherry node. | 
Outputs
| Name | Type | Description | 
|---|---|---|
| lambda_matrix_df | pd.DataFrame | The updated lambda matrix. | 
Internal Logic
- Identify cherry nodes: Finds the indices of the cherry nodes in the lambda matrix.
 - Modify names and indices: Updates the node names and 
_lambda_indicesattribute to reflect the newly formed cherry. - Extract the old lambda matrix: Removes the rows and columns corresponding to the joined nodes from the lambda matrix.
 - Add new row and column: Adds a new row and column to the lambda matrix for the newly formed cherry.
 - Compute new singular values: Calculates the second singular values for the new row and column using the 
_compute_svd2method. - Regenerate lambda matrix: Constructs a new DataFrame from the updated lambda matrix.
 
setup_root_finder
Description
This method defines the implicit rooting strategy for the solver. It adds an implicit root with all-zero character states to the character matrix and recomputes the dissimilarity map.Inputs
| Name | Type | Description | 
|---|---|---|
| cassiopeia_tree | CassiopeiaTree | The CassiopeiaTree object passed into solve. | 
Internal Logic
- Add implicit root: Adds a new row to the character matrix representing the implicit root with all-zero character states.
 - Recompute dissimilarity map: Computes the dissimilarity map using the updated character matrix.
 
root_tree
Description
This method roots the tree by adding an edge between the last two remaining nodes and converting the undirected graph to a directed graph with respect to the specified root sample.Inputs
| Name | Type | Description | 
|---|---|---|
| tree | nx.Graph | The undirected tree topology. | 
| root_sample | str | The name of the sample to treat as the root. | 
| remaining_samples | List[str] | The names of the last two unjoined nodes. | 
Outputs
| Name | Type | Description | 
|---|---|---|
| rooted_tree | nx.DiGraph | The rooted tree. | 
Dependencies
| Dependency | Purpose | 
|---|---|
| networkx | Representing and manipulating the tree topology. | 
| numpy | Numerical operations and array manipulation. | 
| pandas | Data manipulation and representation of the dissimilarity map. | 
| scipy | Singular value decomposition calculation. | 
| ete3 | Tree manipulation and visualization (used in _ccphylo_solve). | 
| configparser | Reading configuration settings (used in _setup_ccphylo). | 
| subprocess | Running external commands (used in _ccphylo_solve). | 
| tempfile | Creating temporary files and directories (used in _ccphylo_solve). | 
Configuration
The_ccphylo_solve method uses the ccphylo_path setting from the config.ini file to locate the CCPhylo executable.
Error Handling
TheDistanceSolverError exception is raised for various error conditions, such as missing rooting parameters or invalid dissimilarity functions.
