High-level description

This code implements a Bayesian inference algorithm to estimate branch lengths in a phylogenetic tree, assuming an IID exponential model of character evolution. It uses dynamic programming to efficiently calculate the posterior probabilities of branch lengths given observed character data.

Code Structure

The code defines a class _InferPosteriorTimes that encapsulates the branch length estimation logic. The main components of this class are:

  • run(): This method orchestrates the entire inference process, taking input data and model parameters, performing calculations, and storing the results.
  • down(): This recursive function calculates the “down” probability, representing the likelihood of observing the data at and below a given node in the tree.
  • up(): This recursive function calculates the “up” probability, representing the likelihood of observing the data above a given node and the node dividing at a specific time.
  • compute_log_joint(): This function calculates the log joint probability of the data and a specific branch length.
  • populate_*_res(): These functions extract and format the calculated results for later access.

The main() function is currently empty and serves no purpose in this file.

References

This code does not reference other code symbols that require further explanation.

Symbols

_InferPosteriorTimes

Description

This class implements the Bayesian inference algorithm for estimating branch lengths in a phylogenetic tree.

Inputs

Refer to the run() method’s parameters in the _iid_exponential_bayesian_cpp.h header file for a detailed description of the input data and model parameters.

Outputs

The class provides several getter methods to access the calculated results:

  • get_posterior_means_res(): Returns a vector of pairs, where each pair contains a node index and its estimated posterior mean branch length.
  • get_posteriors_res(): Returns a vector of pairs, where each pair contains a node index and a vector of its posterior probabilities for each time point.
  • get_log_joints_res(): Returns a vector of pairs, where each pair contains a node index and a vector of its log joint probabilities for each time point.
  • get_log_likelihood_res(): Returns the overall log-likelihood of the data given the model.

Internal Logic

The core logic of the algorithm lies within the down() and up() functions, which recursively calculate the “down” and “up” probabilities using dynamic programming. These probabilities are then combined to compute the log joint probabilities and subsequently the posterior probabilities of branch lengths. The algorithm also precomputes the probability of lineages not being sampled (p_unsampled) to account for incomplete lineage information.

Performance Considerations

The code utilizes dynamic programming with memoization (down_cache and up_cache) to avoid redundant computations, significantly improving performance. However, the memory usage scales with the size of the tree, the number of discretization time points (T), and the number of characters (K).

Side Effects

This code has no notable side effects.

Dependencies

This code uses the standard C++ library.

Error Handling

The code includes basic error handling for memory allocation failures and invalid input parameters. It throws std::invalid_argument exceptions in such cases.

Logging

This code does not implement any logging mechanisms.