High-level description

The IIDExponentialBayesian class in cassiopeia/tools/branch_length_estimator/IIDExponentialBayesian.py implements a Bayesian method for estimating branch lengths in a phylogenetic tree, assuming a subsampled Birth Process for the phylogeny and independent, identically distributed (IID) exponential waiting times for mutations at each site. The estimator calculates the posterior mean branch lengths conditional on the observed tree topology and character data.

Code Structure

The IIDExponentialBayesian class inherits from the abstract base class BranchLengthEstimator and implements the estimate_branch_lengths method. The core logic resides in _populate_attributes_with_cpp_implementation, which leverages a C++ implementation (_PyInferPosteriorTimes) for efficient computation of posterior node times and related attributes.


References

  • cassiopeia.data.CassiopeiaTree: Used to represent and manipulate phylogenetic trees.
  • ._iid_exponential_bayesian._PyInferPosteriorTimes: C++ module for efficient posterior time inference.
  • .BranchLengthEstimator.BranchLengthEstimator: Abstract base class for branch length estimators.




The IIDExponentialBayesian class implements the IID Exponential Bayesian branch length estimation method. It assumes a subsampled Birth Process for the phylogeny and IID exponential waiting times for mutations.


  • mutation_rate (float): The CRISPR/Cas9 mutation rate.
  • birth_rate (float): The phylogeny birth rate.
  • sampling_probability (float): The probability that a leaf in the ground truth tree was sampled. Must be in (0, 1].
  • discretization_level (int): How many timesteps are used to discretize time. Defaults to 600.


The class doesn’t directly return outputs. Instead, it modifies the input CassiopeiaTree object in place, populating its branch lengths based on the estimated posterior mean times.

Internal Logic

  1. Input Validation: Ensures the input tree is valid (binary except for the root) and the sampling probability is within the acceptable range.
  2. Data Preprocessing: Imputes unambiguous missing states in the character matrix for easier dynamic programming.
  3. C++ Implementation: Calls the _PyInferPosteriorTimes C++ module to efficiently compute the log joint probabilities, posterior means, and posterior distributions of node times.
  4. Branch Length Population: Uses the computed posterior mean times to populate the branch lengths of the input CassiopeiaTree.
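
The validation step above can be illustrated with a minimal sketch. It is not the class's actual code: the function validate_inputs, the children dictionary, and the toy topology are hypothetical, and the sketch assumes that "binary except for the root" means every internal node other than the root has exactly two children.

```python
from typing import Dict, List


def validate_inputs(
    children: Dict[str, List[str]], root: str, sampling_probability: float
) -> None:
    """Hypothetical stand-in for the estimator's input checks."""
    # The sampling probability must lie in the half-open interval (0, 1].
    if not (0.0 < sampling_probability <= 1.0):
        raise ValueError(
            f"sampling_probability must be in (0, 1], got {sampling_probability}"
        )
    for node, kids in children.items():
        # Leaves have no children; the root is exempt from the binary check.
        if node == root or not kids:
            continue
        if len(kids) != 2:
            raise ValueError(
                f"Internal node {node!r} has {len(kids)} children; expected 2"
            )


# Toy topology (assumed): the root has one child, other internal nodes have two.
toy_children = {
    "root": ["a"],
    "a": ["b", "c"],
    "b": [],
    "c": [],
}
validate_inputs(toy_children, root="root", sampling_probability=0.5)
```

Raising ValueError mirrors the behavior listed under Error Handling; the exact messages here are illustrative only.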

Side Effects

  • Modifies the input CassiopeiaTree object in place by populating its branch lengths.

Performance Considerations

  • The computational complexity of the branch length estimation is O(discretization_level * tree.n_cell * tree.n_character).
  • The C++ implementation (_PyInferPosteriorTimes) is used for performance optimization.



The estimate_branch_lengths method estimates the branch lengths of the provided CassiopeiaTree using the IID Exponential Bayesian model.


  • tree (CassiopeiaTree): The input CassiopeiaTree object for which to estimate branch lengths.


The method doesn’t directly return outputs. Instead, it modifies the input CassiopeiaTree object in place, populating its branch lengths.

Internal Logic

  1. Input Validation: Validates the input tree topology.
  2. Data Preprocessing: Creates a deepcopy of the input tree and imputes deducible missing states.
  3. Posterior Time Inference: Calls _populate_attributes_with_cpp_implementation to infer posterior node times using the C++ implementation.
  4. Branch Length Population: Calls _populate_branch_lengths to populate the branch lengths of the original input tree using the inferred posterior means.
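
The last step above can be sketched as follows, assuming that each branch length is the posterior mean time of the child node minus that of its parent. The helper branch_lengths_from_times, the edge list, and the toy times (root at 0.0, leaves at 1.0) are hypothetical and are not taken from the Cassiopeia source.

```python
from typing import Dict, List, Tuple


def branch_lengths_from_times(
    edges: List[Tuple[str, str]], posterior_mean_time: Dict[str, float]
) -> Dict[Tuple[str, str], float]:
    """Return child time minus parent time for every (parent, child) edge."""
    return {
        (parent, child): posterior_mean_time[child] - posterior_mean_time[parent]
        for parent, child in edges
    }


# Toy tree (assumed): root -> a -> {b, c}, with leaves b and c at time 1.0.
toy_edges = [("root", "a"), ("a", "b"), ("a", "c")]
toy_mean_times = {"root": 0.0, "a": 0.25, "b": 1.0, "c": 1.0}
toy_lengths = branch_lengths_from_times(toy_edges, toy_mean_times)
```

In this toy case the edge into a gets length 0.25 and each leaf edge gets length 0.75, so every root-to-leaf path sums to 1.0.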

Side Effects

  • Modifies the input CassiopeiaTree object in place by populating its branch lengths.



This method returns the log joint probability density of the observed tree topology, state vectors, and all possible times for a given node.


  • node (str): The internal node for which to compute the log joint probabilities.


  • Returns log_joints (np.array): An array of log joint probabilities for each discretized time point.



This method returns the posterior distribution of the time for a given node, conditional on the observed character states and tree topology.


  • node (str): The internal node for which to compute the posterior time distribution.


  • Returns posterior_time (np.array): An array representing the posterior time distribution for the given node.
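
To show how log joint values relate to a posterior distribution over discretized times, here is a minimal pure-Python sketch. By Bayes' rule, the posterior at each time point is proportional to the exponentiated log joint, and the posterior mean is the probability-weighted average of the time points. The function names, the 4-point grid, and the log joint values are hypothetical; the real class works with discretization_level points and computes these quantities in C++.

```python
import math
from typing import List


def posterior_from_log_joints(log_joints: List[float]) -> List[float]:
    """Normalize log joint values into probabilities that sum to 1."""
    peak = max(log_joints)
    # Subtracting the maximum before exponentiating avoids underflow:
    # math.exp(-1000.0) alone would evaluate to 0.0.
    weights = [math.exp(value - peak) for value in log_joints]
    total = sum(weights)
    return [weight / total for weight in weights]


def posterior_mean(times: List[float], posterior: List[float]) -> float:
    """Probability-weighted average of the discretized time points."""
    return sum(t * p for t, p in zip(times, posterior))


# Toy grid of 4 time points (assumed values, symmetric around 0.5).
grid_times = [0.125, 0.375, 0.625, 0.875]
toy_log_joints = [-1050.0, -1000.0, -1000.0, -1050.0]
toy_posterior = posterior_from_log_joints(toy_log_joints)
toy_mean = posterior_mean(grid_times, toy_posterior)
```

Because the toy log joints are symmetric around the middle of the grid, the posterior mean lands at the midpoint of the time range.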


Dependencies

  • numpy
  • copy
  • typing
  • scipy (used in related code snippets)
  • networkx (used in related code snippets)
  • parameterized (used in related code snippets)
  • pytest (used in related code snippets)
  • unittest (used in related code snippets)

Error Handling

  • Raises ValueError for invalid input parameters or tree topology.

