IIDExponentialMLE.py
High-level description
The IIDExponentialMLE
class implements a Maximum Likelihood Estimator (MLE) for inferring branch lengths of a tree under the assumption of independent and identically distributed (IID) memoryless CRISPR/Cas9 mutations with an exponential waiting time. This estimator assumes a tree depth of 1 and utilizes convex optimization to find the mutation rate and branch lengths that maximize the likelihood of the observed character data.
Code Structure
The IIDExponentialMLE
class inherits from the abstract base class BranchLengthEstimator
and implements the estimate_branch_lengths
method. This method constructs and solves a convex optimization problem to estimate the branch lengths and mutation rate.
References
This code references the following key symbols:
CassiopeiaTree
: Used to represent the tree structure and character data.IIDExponentialMLEError
: A custom exception class for errors specific to this estimator.
Symbols
IIDExponentialMLE
Description
This class implements the IID Exponential MLE algorithm for estimating branch lengths in a phylogenetic tree. It assumes that mutations occur independently and identically at each site with an exponential waiting time.
Inputs
Name | Type | Description |
---|---|---|
minimum_branch_length | float | Minimum allowed branch length (default: 0.01) |
relative_mutation_rates | Optional[List[float]] | List of relative mutation rates for each character site (default: None, meaning all sites evolve at the same rate) |
verbose | bool | Whether to print verbose output during optimization (default: False) |
solver | str | Convex optimization solver to use (default: “SCS”, options: “ECOS”, “SCS”, “MOSEK”) |
Outputs
This class doesn’t directly return any outputs. Instead, it modifies the input CassiopeiaTree
object in place, updating the branch lengths and storing the estimated mutation rate and log-likelihood as attributes.
Internal Logic
- Data Validation: Checks for degenerate cases like no mutations or complete saturation in the character matrix.
- Problem Setup:
- Creates optimization variables for each node’s time.
- Defines constraints:
- Root node has time 0.
- Minimum branch length constraint.
- Ultrametric constraint for all leaves (except the reference leaf).
- Formulates the log-likelihood function based on the IID exponential mutation model.
- Optimization:
- Defines the objective function as maximizing the log-likelihood.
- Constructs and solves the convex optimization problem using the specified solver.
- Result Extraction:
- Extracts the estimated mutation rate from the scaling factor of the reference leaf’s time.
- Retrieves the log-likelihood value.
- Tree Update:
- Computes and sets the branch lengths based on the optimized node times.
- Updates the
CassiopeiaTree
object with the estimated branch lengths.
Side Effects
- Modifies the input
CassiopeiaTree
object in place by updating its branch lengths. - Sets the
mutation_rate
andlog_likelihood
attributes of theIIDExponentialMLE
object.
Performance Considerations
- The choice of solver can impact the performance of the optimization.
- The
minimum_branch_length
parameter can influence the convergence and runtime of the solver. - Specifying
relative_mutation_rates
can increase the complexity of the optimization problem.
Dependencies
Dependency | Purpose |
---|---|
cvxpy | Used for defining and solving the convex optimization problem. |
numpy | Used for numerical operations. |
Error Handling
- Raises
ValueError
for invalid input parameters or degenerate character matrices. - Raises
IIDExponentialMLEError
if the solver fails or produces unexpected results.