High-level description

The IIDExponentialMLE class implements a Maximum Likelihood Estimator (MLE) for inferring branch lengths of a tree under the assumption of independent and identically distributed (IID) memoryless CRISPR/Cas9 mutations with an exponential waiting time. This estimator assumes a tree depth of 1 and utilizes convex optimization to find the mutation rate and branch lengths that maximize the likelihood of the observed character data.

Code Structure

The IIDExponentialMLE class inherits from the abstract base class BranchLengthEstimator and implements the estimate_branch_lengths method. This method constructs and solves a convex optimization problem to estimate the branch lengths and mutation rate.

References

This code references the following key symbols:

  • CassiopeiaTree: Used to represent the tree structure and character data.
  • IIDExponentialMLEError: A custom exception class for errors specific to this estimator.

Symbols

IIDExponentialMLE

Description

This class implements the IID Exponential MLE algorithm for estimating branch lengths in a phylogenetic tree. It assumes that mutations occur independently and identically at each site with an exponential waiting time.

Inputs

NameTypeDescription
minimum_branch_lengthfloatMinimum allowed branch length (default: 0.01)
relative_mutation_ratesOptional[List[float]]List of relative mutation rates for each character site (default: None, meaning all sites evolve at the same rate)
verboseboolWhether to print verbose output during optimization (default: False)
solverstrConvex optimization solver to use (default: “SCS”, options: “ECOS”, “SCS”, “MOSEK”)

Outputs

This class doesn’t directly return any outputs. Instead, it modifies the input CassiopeiaTree object in place, updating the branch lengths and storing the estimated mutation rate and log-likelihood as attributes.

Internal Logic

  1. Data Validation: Checks for degenerate cases like no mutations or complete saturation in the character matrix.
  2. Problem Setup:
    • Creates optimization variables for each node’s time.
    • Defines constraints:
      • Root node has time 0.
      • Minimum branch length constraint.
      • Ultrametric constraint for all leaves (except the reference leaf).
    • Formulates the log-likelihood function based on the IID exponential mutation model.
  3. Optimization:
    • Defines the objective function as maximizing the log-likelihood.
    • Constructs and solves the convex optimization problem using the specified solver.
  4. Result Extraction:
    • Extracts the estimated mutation rate from the scaling factor of the reference leaf’s time.
    • Retrieves the log-likelihood value.
  5. Tree Update:
    • Computes and sets the branch lengths based on the optimized node times.
    • Updates the CassiopeiaTree object with the estimated branch lengths.

Side Effects

  • Modifies the input CassiopeiaTree object in place by updating its branch lengths.
  • Sets the mutation_rate and log_likelihood attributes of the IIDExponentialMLE object.

Performance Considerations

  • The choice of solver can impact the performance of the optimization.
  • The minimum_branch_length parameter can influence the convergence and runtime of the solver.
  • Specifying relative_mutation_rates can increase the complexity of the optimization problem.

Dependencies

DependencyPurpose
cvxpyUsed for defining and solving the convex optimization problem.
numpyUsed for numerical operations.

Error Handling

  • Raises ValueError for invalid input parameters or degenerate character matrices.
  • Raises IIDExponentialMLEError if the solver fails or produces unexpected results.