Documentation for cassiopeia/tools/topology.py

High-level description

This file contains utility functions to assess topological properties of phylogenetic trees, specifically focusing on balance and expansion. It provides methods to compute expansion p-values and cophenetic correlation for CassiopeiaTree objects.

Code Structure

The main functions in this file are compute_expansion_pvalues and compute_cophenetic_correlation, both of which operate on CassiopeiaTree objects. The file also defines two small helpers, simple_coalescent_probability and nCk, which are documented below.

Symbols

compute_expansion_pvalues

Description

This function calculates expansion p-values for nodes in a phylogenetic tree, assessing the probability of observing a given subclade size under a neutral coalescent model.

Inputs

Name | Type | Description
tree | CassiopeiaTree | The phylogenetic tree to analyze
min_clade_size | int | Minimum number of leaves in a subtree to be considered
min_depth | int | Minimum depth of clade to be considered
copy | bool | Whether to return a copy of the tree or modify in-place

Outputs

Name | Type | Description
tree | CassiopeiaTree or None | A copy of the tree with expansion p-values added if copy=True; otherwise None, and the input tree is modified in-place

Internal Logic

  1. Initializes attributes for each node in the tree
  2. Traverses the tree in depth-first order
  3. For each node, calculates the expansion p-value based on the number of leaves in its subtree
  4. Adds the calculated p-values as attributes to the tree nodes
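
As a rough illustration of the steps above, the following self-contained sketch runs the same kind of traversal on a toy tree stored as a plain dictionary rather than a CassiopeiaTree. Several details are assumptions not stated in this document: that depth is counted in edges from the root, that the p-value of a child clade is the upper-tail sum of the simple coalescent probability (written here in its uniform-composition form), that nodes with fewer than two children are skipped, and the default thresholds. The names expansion_pvalues_sketch and coalescent_probability are hypothetical.

```python
import math


def coalescent_probability(n, b, k):
    # Assumed form: probability that one of k lineages holds exactly b of n leaves.
    return math.comb(n - b - 1, k - 2) / math.comb(n - 1, k - 1)


def expansion_pvalues_sketch(children, root, min_clade_size=10, min_depth=1):
    """children maps each internal node to a list of its child nodes."""
    # Preorder traversal; record depth as the number of edges from the root.
    order, depth, stack = [], {root: 0}, [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            stack.append(child)

    # Count leaves below every node, visiting descendants before ancestors.
    n_leaves = {}
    for node in reversed(order):
        kids = children.get(node, [])
        n_leaves[node] = sum(n_leaves[c] for c in kids) if kids else 1

    # Step 1: every node starts with a p-value of 1.0.
    pvalues = {node: 1.0 for node in order}

    # Steps 2-4: visit nodes depth-first and score each sufficiently large child clade.
    for node in order:
        kids = children.get(node, [])
        n, k = n_leaves[node], len(kids)
        if k < 2:
            continue  # the assumed formula needs at least two lineages
        for child in kids:
            b = n_leaves[child]
            if b < min_clade_size or depth[child] < min_depth:
                continue
            # One-sided p-value: probability of a clade with b or more leaves.
            pvalues[child] = sum(
                coalescent_probability(n, b2, k) for b2 in range(b, n - k + 2)
            )
    return pvalues


toy_tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"]}
toy_pvalues = expansion_pvalues_sketch(toy_tree, "root", min_clade_size=2)
```

In the toy tree, clade A holds three of the root's four leaves, so under these assumptions it is the only node whose p-value drops below 1.0.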

Performance Considerations

The function runs in O(n log n) time for balanced trees, but can be up to O(n^3) for highly unbalanced trees.

compute_cophenetic_correlation

Description

Computes the cophenetic correlation of a lineage: the Pearson correlation between the pairwise phylogenetic distances of the leaves and the pairwise dissimilarities of their character states.

Inputs

Name | Type | Description
tree | CassiopeiaTree | The phylogenetic tree to analyze
weights | pd.DataFrame (optional) | Phylogenetic weights matrix
dissimilarity_map | pd.DataFrame (optional) | Dissimilarity matrix between samples
dissimilarity_function | Callable | Function to compute dissimilarities between samples

Outputs

Name | Type | Description
correlation | float | The cophenetic correlation value
significance | float | The significance of the correlation

Internal Logic

  1. Computes or uses provided phylogenetic weight matrix and dissimilarity map
  2. Aligns matrices to ensure they correspond to the same set of leaves
  3. Converts matrices to condensed distance matrices
  4. Calculates Pearson correlation between phylogenetic distances and dissimilarities
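
The alignment, condensing, and correlation steps can be sketched with the standard library alone. This is a simplified stand-in, not the file's implementation: it represents each matrix as a nested dictionary instead of a pd.DataFrame, it does not compute the significance value, and the names condensed, pearson, and cophenetic_correlation_sketch are hypothetical.

```python
import math
from itertools import combinations


def condensed(matrix, labels):
    # Keep one entry per unordered pair of leaves, in a fixed pair order.
    return [matrix[a][b] for a, b in combinations(labels, 2)]


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cophenetic_correlation_sketch(weights, dissimilarity):
    # Align both matrices on the leaves they share, in the same order.
    labels = sorted(set(weights) & set(dissimilarity))
    return pearson(condensed(weights, labels), condensed(dissimilarity, labels))


# Toy inputs: tree distances are exactly twice the character dissimilarities.
weights = {
    "a": {"a": 0, "b": 2, "c": 4},
    "b": {"a": 2, "b": 0, "c": 4},
    "c": {"a": 4, "b": 4, "c": 0},
}
dissimilarity = {
    "c": {"a": 2, "b": 2, "c": 0, "d": 3},
    "a": {"a": 0, "b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "b": 0, "c": 2, "d": 3},
    "d": {"a": 3, "b": 3, "c": 3, "d": 0},
}
r = cophenetic_correlation_sketch(weights, dissimilarity)
```

Because the toy distances are proportional to the toy dissimilarities, the correlation is 1 up to floating-point error; the extra leaf "d" is dropped during alignment.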

Performance Considerations

If the weights and dissimilarity map are not precomputed, the function runs in O(mn^2 + n^2 log n + n^2) time, where n is the number of leaves and m is the number of characters.

simple_coalescent_probability

Description

Computes the probability of observing a given number of samples in a lineage under a simple coalescent model.

Inputs

Name | Type | Description
n | int | Number of leaves in subtree
b | int | Number of leaves in one lineage
k | int | Number of lineages

Outputs

Name | Type | Description
probability | float | Probability of observing b leaves on one lineage in a tree of n total leaves
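
This document does not give the formula, so the sketch below assumes one standard form: if every way of splitting n leaves among k non-empty lineages is equally likely, the chance that a given lineage holds exactly b leaves is C(n - b - 1, k - 2) / C(n - 1, k - 1). The function name is hypothetical, and the form requires k >= 2.

```python
import math


def coalescent_probability_sketch(n, b, k):
    # Splits that give one lineage b leaves, divided by all splits of n leaves
    # into k non-empty lineages (assumed uniform-composition form).
    return math.comb(n - b - 1, k - 2) / math.comb(n - 1, k - 1)


n, k = 10, 3
# A lineage can hold between 1 and n - k + 1 leaves.
probs = {b: coalescent_probability_sketch(n, b, k) for b in range(1, n - k + 2)}
total = sum(probs.values())
```

Under this assumption the probabilities over all feasible values of b sum to 1, and larger clades are less likely than smaller ones.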

nCk

Description

Computes the binomial coefficient (n choose k).

Inputs

Name | Type | Description
n | int | Total number of items
k | int | Number of items to choose

Outputs

Name | Type | Description
coefficient | float | The number of ways to choose k items from n
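
A minimal sketch of such a helper, built on the standard library's math.comb. The name n_choose_k is hypothetical, the ValueError is a stand-in for whatever exception the file actually raises, and the result is converted to float only to match the type listed above.

```python
import math


def n_choose_k(n, k):
    # Reject requests for more items than are available (stand-in error type).
    if k > n:
        raise ValueError("k cannot be larger than n")
    return float(math.comb(n, k))


pairs_from_five = n_choose_k(5, 2)
```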

Dependencies

  • numpy
  • pandas
  • scipy
  • cassiopeia.data
  • cassiopeia.mixins
  • cassiopeia.solver.dissimilarity_functions

Error Handling

The code uses custom exceptions from cassiopeia.mixins to handle errors specific to CassiopeiaTree operations.