High-level description

This file defines the top-level module for branch length estimation in the Cassiopeia library. It imports two specific estimators: IIDExponentialMLE and IIDExponentialBayesian.

Code Structure

The module contains only imports: it pulls two classes from its submodules so that both are directly accessible through the cassiopeia.tools.branch_length_estimator namespace.

Symbols

Symbol Name: IIDExponentialMLE

Description

This class implements a maximum likelihood estimator (MLE) for branch lengths in a phylogenetic tree. It assumes that CRISPR/Cas9 mutations arise independently at each site, with identically distributed exponential waiting times. It uses convex optimization to find the branch lengths that maximize the likelihood of the observed character data.
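The exponential waiting-time assumption can be made concrete on a single branch. The sketch below is illustrative only: the function names mutation_probability and single_branch_mle are hypothetical and are not part of Cassiopeia, and the mutation rate is assumed known. Under this model a site has mutated by time t with probability 1 - exp(-rate * t), and for one isolated branch the likelihood has a closed-form maximizer.

```python
import math


def mutation_probability(t: float, rate: float = 1.0) -> float:
    """Probability that one site has mutated by time t (exponential waiting time)."""
    return 1.0 - math.exp(-rate * t)


def single_branch_mle(n_sites: int, n_mutated: int, rate: float = 1.0) -> float:
    """Closed-form MLE of the length of a single, isolated branch.

    Maximizes
        n_mutated * log(1 - exp(-rate * t)) - rate * t * (n_sites - n_mutated),
    whose stationary point satisfies 1 - exp(-rate * t) = n_mutated / n_sites.
    """
    if not 0 <= n_mutated < n_sites:
        raise ValueError("need 0 <= n_mutated < n_sites for a finite estimate")
    return -math.log(1.0 - n_mutated / n_sites) / rate


# Hypothetical data: 3 of 10 sites mutated along the branch.
t_hat = single_branch_mle(n_sites=10, n_mutated=3)
```

At the estimate, the modeled mutation probability equals the observed mutated fraction. On a full tree the branches are coupled through the topology, which is why the class relies on a numerical optimizer rather than a formula like this one.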

Inputs

This file documents no inputs for IIDExponentialMLE; the class is only imported here, so its constructor and method signatures are defined in its submodule.

Outputs

This file documents no outputs for IIDExponentialMLE; return values are defined by the methods in its submodule.

Internal Logic

The class uses the cvxpy library to formulate and solve the optimization problem. It builds constraints from the tree topology, a minimum branch length, and, where applicable, ultrametricity (all leaves at the same distance from the root). The objective function is the log-likelihood of the observed character data given the branch lengths and mutation rates.
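The shape of this optimization problem can be sketched on a toy tree without any dependencies. Everything in the block is an assumption made for illustration: the three-edge tree (root -> a -> two leaves), the per-edge counts, the fixed rate of 1, the leaf depth normalized to 1, and all function names. cvxpy is also swapped out here for a hand-rolled ternary search, which is valid only because this toy problem has one free variable and a concave objective; it is not how the class solves the problem.

```python
import math


def edge_log_likelihood(length, n_unmutated, n_mutated, rate=1.0):
    """Log-likelihood of one edge under IID exponential waiting times.

    Each site that stays unmutated contributes -rate * length; each site that
    mutates on the edge contributes log(1 - exp(-rate * length)).
    """
    return (-rate * length * n_unmutated
            + n_mutated * math.log(1.0 - math.exp(-rate * length)))


def tree_log_likelihood(t_a, counts, rate=1.0):
    """Objective for the toy tree root -> a -> {l1, l2}.

    Ultrametricity is built in: both leaves sit at time 1, so the two leaf
    edges share the length 1 - t_a and t_a is the only free variable.
    """
    root_a, a_l1, a_l2 = counts
    return (edge_log_likelihood(t_a, root_a[0], root_a[1], rate)
            + edge_log_likelihood(1.0 - t_a, a_l1[0], a_l1[1], rate)
            + edge_log_likelihood(1.0 - t_a, a_l2[0], a_l2[1], rate))


def fit_internal_time(counts, min_branch_length=0.01, rate=1.0, iters=100):
    """Maximize the concave objective by ternary search.

    The minimum-branch-length constraint on all three edges reduces to
    min_branch_length <= t_a <= 1 - min_branch_length.
    """
    lo, hi = min_branch_length, 1.0 - min_branch_length
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if tree_log_likelihood(m1, counts, rate) < tree_log_likelihood(m2, counts, rate):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Hypothetical (n_unmutated, n_mutated) counts per edge for 10 sites:
# 2 sites mutate on root -> a, then 2 and 3 of the remaining 8 on the leaf edges.
counts = ((8, 2), (6, 2), (5, 3))
t_a_hat = fit_internal_time(counts)
```

With more than one internal node the feasible region is a polytope rather than an interval, which is the setting where a general convex solver becomes necessary.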

Symbol Name: IIDExponentialBayesian

Description

This class implements a Bayesian approach to estimate branch lengths in a phylogenetic tree. It assumes a subsampled birth process for the phylogeny and independent, identically distributed exponential waiting times for mutations at each site. The estimator calculates the posterior mean branch lengths conditional on the observed tree topology and character data.
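The difference between a posterior mean and a maximum likelihood estimate can be seen on a single branch. The sketch below is a deliberately simplified stand-in, not the estimator's method: it assumes one isolated branch, a known rate, and a uniform prior on (0, 1) in place of the subsampled birth process prior, and it approximates the posterior on a grid. All names are hypothetical.

```python
import math


def log_likelihood(t, n_sites, n_mutated, rate=1.0):
    """Log-likelihood of a branch of length t with n_mutated of n_sites mutated."""
    return (n_mutated * math.log(1.0 - math.exp(-rate * t))
            - rate * t * (n_sites - n_mutated))


def posterior_mean_time(n_sites, n_mutated, rate=1.0, grid_size=1000):
    """Posterior mean of t under a uniform prior on (0, 1), computed on a grid."""
    # Midpoints of equal-width cells; the uniform prior weights every cell equally,
    # so the posterior weight of a cell is proportional to its likelihood.
    times = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = [log_likelihood(t, n_sites, n_mutated, rate) for t in times]
    shift = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return sum(t * w for t, w in zip(times, weights)) / total


# Hypothetical data: 3 of 10 sites mutated along the branch.
post_mean = posterior_mean_time(n_sites=10, n_mutated=3)
mle = -math.log(1.0 - 3 / 10)  # single-branch MLE at rate 1, for comparison
```

The posterior mean averages over all plausible branch lengths instead of picking the single most likely one, so the two estimates generally differ when the posterior is skewed.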

Inputs

This file documents no inputs for IIDExponentialBayesian; the class is only imported here, so its constructor and method signatures are defined in its submodule.

Outputs

This file documents no outputs for IIDExponentialBayesian; return values are defined by the methods in its submodule.

Internal Logic

The class delegates the computation of posterior node times to a C++ implementation (_PyInferPosteriorTimes) for efficiency. Before calling it, the class preprocesses the tree: it maps nodes to integer IDs and extracts the information the C++ code needs, such as the number of mutated characters at each node. After the C++ routine finishes, the class retrieves the log-likelihood, the log joint probabilities, the posterior probabilities, and the posterior mean time for each node.
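The preprocessing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name preprocess_tree, the dictionary-based tree representation, the depth-first ID order, and the state encoding (0 for unmutated, -1 for missing) are all assumed for the example.

```python
from typing import Dict, Hashable, List, Tuple

UNMUTATED = 0   # assumed encoding of an unmutated character
MISSING = -1    # assumed encoding of missing data (not counted as a mutation)


def preprocess_tree(
    children: Dict[Hashable, List[Hashable]],
    states: Dict[Hashable, List[int]],
    root: Hashable,
) -> Tuple[Dict[Hashable, int], List[int], List[int]]:
    """Flatten a tree into integer-indexed arrays.

    Returns (node_to_id, parent_of, n_mutated), where parent_of[i] is the ID of
    node i's parent (-1 for the root) and n_mutated[i] is the number of mutated
    characters observed at node i.
    """
    # Assign IDs in depth-first, left-to-right order starting from the root.
    node_to_id: Dict[Hashable, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        node_to_id[node] = len(node_to_id)
        stack.extend(reversed(children.get(node, [])))

    parent_of = [-1] * len(node_to_id)
    for parent, kids in children.items():
        for kid in kids:
            parent_of[node_to_id[kid]] = node_to_id[parent]

    n_mutated = [0] * len(node_to_id)
    for node, idx in node_to_id.items():
        n_mutated[idx] = sum(1 for s in states[node] if s not in (UNMUTATED, MISSING))
    return node_to_id, parent_of, n_mutated


# Hypothetical five-node tree with three characters per node.
children = {"root": ["a", "l3"], "a": ["l1", "l2"]}
states = {
    "root": [0, 0, 0],
    "a": [1, 0, 0],
    "l1": [1, 2, 0],
    "l2": [1, 0, 3],
    "l3": [0, 0, -1],
}
node_to_id, parent_of, n_mutated = preprocess_tree(children, states, "root")
```

Flat integer arrays of this kind are straightforward to pass across a Python/C++ boundary, which is a plausible reason for the ID mapping.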