High-level description
The cassiopeia/tools directory contains a collection of utility functions and classes for analyzing phylogenetic trees and performing calculations related to lineage tracing experiments. The toolset covers parameter estimation, evolutionary metrics, and statistical analyses of tree structures.
What does it do?
The tools in this directory perform several key functions:
- Autocorrelation analysis: Computes Moran’s I statistic to measure spatial autocorrelation of numerical data associated with tree leaves.
- Branch length estimation: Implements Maximum Likelihood Estimation (MLE) and Bayesian approaches for estimating branch lengths in phylogenetic trees.
- Evolutionary coupling: Calculates how closely related different categories are based on their distribution across the tree structure.
- Fitness estimation: Estimates the fitness of nodes in a phylogenetic tree using methods such as the Local Branching Index (LBI).
- Parameter estimation: Estimates mutation rates and missing data rates from tree structures and character matrices.
- Small parsimony analysis: Implements algorithms for ancestral state reconstruction and parsimony scoring.
- Topology analysis: Assesses topological properties of trees, such as balance and expansion, and computes metrics like cophenetic correlation.
- Tree metrics: Calculates various metrics on phylogenetic trees, including parsimony scores and likelihood under different evolutionary models.
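To make the autocorrelation item above concrete, here is a minimal sketch of the textbook Moran's I formula in plain Python. It is not the library's `compute_morans_i` implementation: the function name `morans_i` and the toy weight matrix `sibling_weights` are assumptions made for illustration, and how the library derives weights from a tree is not shown here.

```python
def morans_i(values, weights):
    """Textbook Moran's I for values x_i and a weight matrix w_ij with w_ii = 0.

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * num / denom


# Hypothetical toy weights for four leaves A, B, C, D on a tree ((A,B),(C,D)):
# weight 1 between sibling leaves, 0 otherwise.
sibling_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 1, 0, 0], sibling_weights))  # siblings agree    -> 1.0
print(morans_i([1, 0, 1, 0], sibling_weights))  # siblings disagree -> -1.0
```

Values that cluster with the tree structure push the statistic toward 1, while values that alternate between close relatives push it toward -1.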
Entry points
The main entry points for developers are:
- autocorrelation.py: compute_morans_i function for spatial autocorrelation analysis.
- branch_length_estimator/: IIDExponentialMLE and IIDExponentialBayesian classes for branch length estimation.
- coupling.py: compute_evolutionary_coupling function for category relationship analysis.
- fitness_estimator/: FitnessEstimator abstract base class and LBIJungle implementation.
- parameter_estimators.py: Functions for estimating mutation rates and missing data rates.
- small_parsimony.py: Functions for ancestral state reconstruction and parsimony scoring.
- topology.py: Functions for computing expansion p-values and cophenetic correlation.
- tree_metrics.py: Functions for calculating parsimony and likelihood scores on trees.
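As an illustration of what ancestral state reconstruction and parsimony scoring involve, the sketch below runs the bottom-up pass of Fitch's algorithm, a standard small-parsimony method, on a binary tree stored as a plain dictionary. It is a sketch under stated assumptions, not the code in small_parsimony.py: the function name `fitch_parsimony`, the dictionary-based tree, and the toy states are all hypothetical.

```python
def fitch_parsimony(children, root, leaf_states):
    """Bottom-up pass of Fitch's algorithm on a binary tree.

    children: dict mapping each internal node to a list of its child nodes.
    leaf_states: dict mapping each leaf to its observed state.
    Returns (state_sets, score), where state_sets holds the candidate
    ancestral states per node and score is the minimum number of changes.
    """
    state_sets = {}
    score = 0

    def visit(node):
        nonlocal score
        kids = children.get(node, [])
        if not kids:  # leaf: its state set is the observed state
            state_sets[node] = {leaf_states[node]}
            return
        for kid in kids:
            visit(kid)
        common = set.intersection(*(state_sets[k] for k in kids))
        if common:
            state_sets[node] = common
        else:
            # No shared state: take the union and count one change.
            state_sets[node] = set.union(*(state_sets[k] for k in kids))
            score += 1

    visit(root)
    return state_sets, score


# Toy tree ((A,B),(C,D)) with one leaf carrying a different state.
toy_children = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
toy_states = {"A": "x", "B": "x", "C": "x", "D": "y"}

sets, changes = fitch_parsimony(toy_children, "root", toy_states)
print(changes)       # -> 1
print(sets["root"])  # -> {'x'}
```

The recursion is fine for small examples; very deep trees would need an iterative post-order traversal instead.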
Dependencies
The tools in this directory rely on several external libraries:
- numpy: For numerical computations and array operations.
- pandas: For data manipulation and analysis.
- scipy: For scientific computing and statistical functions.
- networkx: For graph operations and tree representation.
- ete3: For tree manipulation and visualization.
- cvxpy: Used in the MLE branch length estimator.
- Cython: Used for interfacing with C++ code in the Bayesian branch length estimator.
- tqdm: For progress bars in long-running computations.
The tools also rely on the internal cassiopeia.data module for tree data structures and the cassiopeia.mixins module for custom error classes.
Configuration
Most of the tools in this directory do not require explicit configuration files. Instead, they are configured through function arguments and class constructors. Key configuration options include:
- Random seeds for reproducibility in stochastic processes.
- Thresholds and minimum values for various calculations (e.g., minimum clade size for expansion p-values).
- Options for handling missing data and ancestral state reconstruction.
- Choices between discrete and continuous models for certain calculations.
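The pattern described above, configuration passed as arguments rather than read from files, can be sketched with a small permutation test. Everything in this example is hypothetical and only illustrates the pattern: the function `permutation_p_value` and its parameters `min_group_size` and `random_seed` are not part of the cassiopeia API.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=1000,
                        min_group_size=2, random_seed=None):
    """One-sided permutation p-value for mean(group_a) > mean(group_b).

    Returns None when either group is smaller than min_group_size.
    """
    if len(group_a) < min_group_size or len(group_b) < min_group_size:
        return None  # threshold not met, so the test is skipped

    rng = random.Random(random_seed)  # local generator, no global state touched
    size_a = len(group_a)
    observed = sum(group_a) / size_a - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:size_a], pooled[size_a:]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


a = [5.1, 4.8, 6.0, 5.5]
b = [3.9, 4.1, 4.4, 3.7]
first = permutation_p_value(a, b, random_seed=42)
second = permutation_p_value(a, b, random_seed=42)
print(first == second)  # same seed, same result -> True
```

Keeping the seed and the threshold as explicit arguments means a caller can reproduce a run exactly by recording the arguments it passed.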