High-level description
This directory contains implementations of branch length estimation algorithms for phylogenetic trees in the Cassiopeia framework. It includes two main estimators: IIDExponentialMLE (Maximum Likelihood Estimator) and IIDExponentialBayesian (Bayesian estimator). These estimators are designed to infer branch lengths in phylogenetic trees under the assumption of independent and identically distributed (IID) exponential waiting times for mutations.
What does it do?
The branch length estimators in this directory perform the following tasks:
- Estimate branch lengths of a given phylogenetic tree based on observed character data.
- Implement different statistical approaches (Maximum Likelihood and Bayesian) to infer branch lengths.
- Assume an IID exponential model for mutation events in the tree.
- Handle various input parameters such as mutation rates, birth rates, and sampling probabilities.
- Provide methods to access estimated branch lengths, posterior distributions, and log-likelihood values.
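To make the IID exponential assumption concrete, the sketch below works through a single branch in isolation. If each character mutates after an exponential waiting time with rate r, the probability that it has mutated along a branch of length t is 1 - exp(-r t); with m of k characters mutated, the likelihood is maximized at t = -ln(1 - m/k) / r. The function names and the single-branch setting are illustrative assumptions; the estimators in this directory operate on whole trees and are not reproduced here.

```python
import math


def mutation_probability(rate: float, branch_length: float) -> float:
    """P(a character has mutated by the end of the branch) under an
    exponential waiting time with the given rate."""
    return 1.0 - math.exp(-rate * branch_length)


def single_branch_log_likelihood(
    branch_length: float, num_mutated: int, num_characters: int, rate: float = 1.0
) -> float:
    """Log-likelihood of seeing num_mutated out of num_characters IID
    characters mutated along one branch (binomial coefficient omitted).
    Assumes branch_length > 0."""
    p = mutation_probability(rate, branch_length)
    num_unmutated = num_characters - num_mutated
    # Each unmutated character contributes log(1 - p) = -rate * branch_length.
    return num_mutated * math.log(p) - num_unmutated * rate * branch_length


def single_branch_mle(num_mutated: int, num_characters: int, rate: float = 1.0) -> float:
    """Closed-form maximizer of the log-likelihood above."""
    if not 0 <= num_mutated < num_characters:
        raise ValueError("need 0 <= num_mutated < num_characters for a finite estimate")
    return -math.log(1.0 - num_mutated / num_characters) / rate


# Illustrative numbers only: 3 of 10 characters mutated, unit mutation rate.
t_hat = single_branch_mle(num_mutated=3, num_characters=10)
```

Because the log-likelihood is concave in the branch length, evaluating single_branch_log_likelihood slightly to either side of t_hat gives a lower value, which is a quick way to sanity-check the closed form.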
Entry points
The main entry points for this directory are:
- IIDExponentialMLE: Implements a Maximum Likelihood Estimator for branch lengths.
- IIDExponentialBayesian: Implements a Bayesian estimator for branch lengths.
Both classes inherit from BranchLengthEstimator and implement the estimate_branch_lengths method. This method takes a CassiopeiaTree object as input and modifies it in place by updating its branch lengths.
The workflow typically involves:
- Creating an instance of either IIDExponentialMLE or IIDExponentialBayesian with appropriate parameters.
- Calling the estimate_branch_lengths method on a CassiopeiaTree object.
- Accessing the estimated branch lengths and other relevant information from the modified tree or the estimator object.
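The workflow above can be sketched with self-contained stand-ins. ToyTree, ToyBranchLengthEstimator, and PerEdgeExponentialEstimator are hypothetical classes written for this illustration, not Cassiopeia's CassiopeiaTree or its estimators; only the shape of the pattern (an abstract estimate_branch_lengths method that updates the tree in place) is taken from the description above.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (parent, child)


@dataclass
class ToyTree:
    """Hypothetical stand-in for a tree object; not CassiopeiaTree."""
    # Per edge: (characters newly mutated on the edge,
    #            characters still unmutated at its start, assumed > 0).
    mutation_counts: Dict[Edge, Tuple[int, int]]
    branch_lengths: Dict[Edge, float] = field(default_factory=dict)


class ToyBranchLengthEstimator(ABC):
    """Hypothetical base class mirroring the interface described above."""

    @abstractmethod
    def estimate_branch_lengths(self, tree: ToyTree) -> None:
        """Update tree.branch_lengths in place."""


class PerEdgeExponentialEstimator(ToyBranchLengthEstimator):
    """Toy estimator that treats every edge independently (a simplification)."""

    def __init__(self, minimum_branch_length: float = 0.01) -> None:
        self.minimum_branch_length = minimum_branch_length

    def estimate_branch_lengths(self, tree: ToyTree) -> None:
        for edge, (mutated, at_risk) in tree.mutation_counts.items():
            # Cap the mutated fraction so the logarithm stays finite.
            fraction = min(mutated / at_risk, 1.0 - 1e-9)
            length = -math.log(1.0 - fraction)
            # Enforce the lower bound on every branch length.
            tree.branch_lengths[edge] = max(length, self.minimum_branch_length)


# Step 1: create an estimator. Step 2: run it on a tree. Step 3: read results.
tree = ToyTree(mutation_counts={("root", "a"): (2, 10), ("root", "b"): (0, 10)})
estimator = PerEdgeExponentialEstimator(minimum_branch_length=0.01)
estimator.estimate_branch_lengths(tree)  # returns None; the tree itself is updated
```

The estimator returns nothing, so callers read results from the tree afterwards, which matches the in-place contract described for estimate_branch_lengths.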
Key Files
- BranchLengthEstimator.py: Defines the abstract base class BranchLengthEstimator.
- IIDExponentialMLE.py: Implements the Maximum Likelihood Estimator.
- IIDExponentialBayesian.py: Implements the Bayesian estimator.
- _iid_exponential_bayesian_cpp.cpp and _iid_exponential_bayesian_cpp.h: C++ implementation of the core Bayesian inference algorithm for improved performance.
Dependencies
The main dependencies for this directory include:
- numpy: Used for numerical operations and array manipulations.
- cvxpy: Used in IIDExponentialMLE for defining and solving the convex optimization problem.
- scipy: Used in related code snippets for scientific computing.
- networkx: Used in related code snippets for graph operations.
Configuration
The estimators can be configured through various parameters:
For IIDExponentialMLE:
- minimum_branch_length: Minimum allowed branch length (default: 0.01)
- relative_mutation_rates: List of relative mutation rates for each character site (optional)
- verbose: Whether to print verbose output during optimization (default: False)
- solver: Convex optimization solver to use (default: "SCS", options: "ECOS", "SCS", "MOSEK")
For IIDExponentialBayesian:
- mutation_rate: The CRISPR/Cas9 mutation rate
- birth_rate: The phylogeny birth rate
- sampling_probability: The probability that a leaf in the ground truth tree was sampled (must be in (0, 1])
- discretization_level: Number of timesteps used to discretize time (default: 600)
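The parameters listed above can be captured in a small validation sketch. MLEConfig and BayesianConfig are hypothetical containers introduced only for illustration; they are not part of Cassiopeia, and the checks encode just the defaults and constraints stated above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Solver names as listed in the configuration notes above.
ALLOWED_SOLVERS = ("ECOS", "SCS", "MOSEK")


@dataclass(frozen=True)
class MLEConfig:
    """Hypothetical container for the IIDExponentialMLE parameters."""
    minimum_branch_length: float = 0.01
    relative_mutation_rates: Optional[List[float]] = None
    verbose: bool = False
    solver: str = "SCS"

    def __post_init__(self) -> None:
        if self.solver not in ALLOWED_SOLVERS:
            raise ValueError(
                f"solver must be one of {ALLOWED_SOLVERS}, got {self.solver!r}"
            )


@dataclass(frozen=True)
class BayesianConfig:
    """Hypothetical container for the IIDExponentialBayesian parameters."""
    mutation_rate: float
    birth_rate: float
    sampling_probability: float
    discretization_level: int = 600

    def __post_init__(self) -> None:
        # The sampling probability must lie in the half-open interval (0, 1].
        if not 0.0 < self.sampling_probability <= 1.0:
            raise ValueError("sampling_probability must be in (0, 1]")


# Illustrative values only; the rates below are not recommendations.
mle_config = MLEConfig()
bayes_config = BayesianConfig(mutation_rate=0.3, birth_rate=1.0, sampling_probability=1.0)
```

Validating at construction time surfaces an out-of-range sampling probability or an unsupported solver name before any estimation work starts.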