topology.py
Here’s a documentation for the target file cassiopeia/tools/topology.py
:
High-level description
This file contains utility functions to assess topological properties of phylogenetic trees, specifically focusing on balance and expansion. It provides methods to compute expansion p-values and cophenetic correlation for CassiopeiaTree objects.
Code Structure
The main functions in this file are compute_expansion_pvalues
and compute_cophenetic_correlation
. These functions operate on CassiopeiaTree objects and utilize various helper functions and statistical methods to analyze tree topology.
Symbols
compute_expansion_pvalues
Description
This function calculates expansion p-values for nodes in a phylogenetic tree, assessing the probability of observing a given subclade size under a neutral coalescent model.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The phylogenetic tree to analyze |
min_clade_size | int | Minimum number of leaves in a subtree to be considered |
min_depth | int | Minimum depth of clade to be considered |
copy | bool | Whether to return a copy of the tree or modify in-place |
Outputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree or None | The tree with added expansion p-values (if copy=True) or None |
Internal Logic
- Initializes attributes for each node in the tree
- Traverses the tree in depth-first order
- For each node, calculates the expansion p-value based on the number of leaves in its subtree
- Adds the calculated p-values as attributes to the tree nodes
Performance Considerations
The function runs in O(n log n) time for balanced trees, but can be up to O(n^3) for highly unbalanced trees.
compute_cophenetic_correlation
Description
Computes the cophenetic correlation of a lineage, which is the Pearson correlation between the phylogenetic distance and character dissimilarity.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The phylogenetic tree to analyze |
weights | pd.DataFrame (optional) | Phylogenetic weights matrix |
dissimilarity_map | pd.DataFrame (optional) | Dissimilarity matrix between samples |
dissimilarity_function | Callable | Function to compute dissimilarities between samples |
Outputs
Name | Type | Description |
---|---|---|
correlation | float | The cophenetic correlation value |
significance | float | The significance of the correlation |
Internal Logic
- Computes or uses provided phylogenetic weight matrix and dissimilarity map
- Aligns matrices to ensure they correspond to the same set of leaves
- Converts matrices to condensed distance matrices
- Calculates Pearson correlation between phylogenetic distances and dissimilarities
Performance Considerations
If weights and dissimilarity map are not precomputed, the function runs in O(mn^2 + n^2logn + n^2) time, where n is the number of leaves and m is the number of characters.
simple_coalescent_probability
Description
Computes the probability of observing a given number of samples in a lineage under a simple coalescent model.
Inputs
Name | Type | Description |
---|---|---|
n | int | Number of leaves in subtree |
b | int | Number of leaves in one lineage |
k | int | Number of lineages |
Outputs
Name | Type | Description |
---|---|---|
probability | float | Probability of observing b leaves on one lineage in a tree of n total leaves |
nCk
Description
Computes the binomial coefficient (n choose k).
Inputs
Name | Type | Description |
---|---|---|
n | int | Total number of items |
k | int | Number of items to choose |
Outputs
Name | Type | Description |
---|---|---|
coefficient | float | The number of ways to choose k items from n |
Dependencies
- numpy
- pandas
- scipy
- cassiopeia.data
- cassiopeia.mixins
- cassiopeia.solver.dissimilarity_functions
Error Handling
The code uses custom exceptions from cassiopeia.mixins
to handle errors specific to CassiopeiaTree operations.