High-level description

This file contains unit tests for the IIDExponentialBayesian class, which is a branch length estimator in the Cassiopeia library. The tests primarily focus on comparing the estimator’s output with numerically computed values for log likelihood, log joints, and posterior time distributions.

Code Structure

The code defines several helper functions for numerical calculations and error handling, followed by a TestIIDExponentialBayesian class containing multiple test methods. Each test method focuses on a specific aspect of the IIDExponentialBayesian class, such as comparing its output to closed-form solutions or numerical approximations.

References

This code references the IIDExponentialBayesian class from cassiopeia.tools and the CassiopeiaTree class from cassiopeia.data.

Symbols

TestIIDExponentialBayesian

Description

A class containing unit tests for the IIDExponentialBayesian branch length estimator.

Inputs

This class does not take any inputs.

Outputs

This class does not return any outputs.

Internal Logic

The class contains several test methods, each designed to validate different aspects of the IIDExponentialBayesian class:

  • test_against_closed_form_solution_small: This test compares the estimator’s output to closed-form solutions for a small tree with one internal node. It tests various hyperparameter settings and validates the log likelihood, log joints, posterior time distribution, and posterior mean.

  • test_against_closed_form_solution_medium: Similar to the previous test, but uses a slightly larger tree with two internal nodes. This test is marked as slow and only runs with one set of hyperparameters.

  • test_small_discretization_level_raises_error: This test ensures that the estimator raises a ValueError when the discretization level is too small.

  • test_invalid_tree_topology_raises_error: This test checks that the estimator raises a ValueError for invalid tree topologies, such as trees that are not rooted or have internal nodes with degrees other than 3.

  • test_invalid_sampling_probability_raises_error: This test ensures that the estimator raises a ValueError for invalid sampling probabilities (outside the range (0, 1]).

Side Effects

This class does not have any side effects.


calc_exact_log_full_joint

Description

Calculates the exact log full joint probability density of the observed tree topology, state vectors, and branch lengths.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the tree topology and character states.
mutation_ratefloatThe mutation rate of the model.
birth_ratefloatThe birth rate of the model.
sampling_probabilityfloatThe sampling probability of the model.

Outputs

NameTypeDescription
llfloatThe log full joint probability density.

Internal Logic

The function iterates through the edges of the tree and calculates the likelihood contribution from both the birth process and the mutation process. It considers the sampling probability and uses binomial coefficients to account for the number of mutations along each edge.


calc_numerical_log_likelihood

Description

Calculates the numerical log likelihood of the observed data (character states and tree topology) using numerical integration.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the tree topology and character states.
mutation_ratefloatThe mutation rate of the model.
birth_ratefloatThe birth rate of the model.
sampling_probabilityfloatThe sampling probability of the model.
epsrelfloatThe relative error tolerance for numerical integration.

Outputs

NameTypeDescription
resfloatThe numerical log likelihood.

Internal Logic

The function defines an inner function f that calculates the full joint probability for a given set of internal node times. It then uses scipy.integrate.nquad to numerically integrate f over all possible internal node times, effectively marginalizing out the times and obtaining the likelihood of the observed data.


calc_numerical_log_joints

Description

Calculates the numerical log joint probability density of the observed data and all possible times for a specific node in the tree.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the tree topology and character states.
nodestrThe name of the internal node for which to calculate the log joints.
mutation_ratefloatThe mutation rate of the model.
birth_ratefloatThe birth rate of the model.
sampling_probabilityfloatThe sampling probability of the model.
discretization_levelintThe number of timesteps used to discretize time.
epsrelfloatThe relative error tolerance for numerical integration.

Outputs

NameTypeDescription
resnp.arrayAn array of log joint probabilities for each possible time of the specified node.

Internal Logic

The function iterates through all possible discretized times for the specified node. For each time, it defines an inner function f that calculates the full joint probability for a given set of times for the remaining internal nodes. It then uses scipy.integrate.nquad to numerically integrate f over all possible times for the other nodes, effectively marginalizing out their times and obtaining the joint probability of the observed data and the specified node’s time.


numerical_posterior_time

Description

Calculates the numerical posterior time distribution for a specific node in the tree.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the tree topology and character states.
nodestrThe name of the internal node for which to calculate the posterior time distribution.
mutation_ratefloatThe mutation rate of the model.
birth_ratefloatThe birth rate of the model.
sampling_probabilityfloatThe sampling probability of the model.
discretization_levelintThe number of timesteps used to discretize time.
epsrelfloatThe relative error tolerance for numerical integration.

Outputs

NameTypeDescription
numerical_posteriornp.arrayAn array representing the posterior time distribution of the specified node.

Internal Logic

The function first calculates the numerical log joints using calc_numerical_log_joints. It then normalizes the log joints to obtain the posterior time distribution.


relative_error

Description

Calculates the relative error between two positive floating-point numbers.

Inputs

NameTypeDescription
xfloatThe first number.
yfloatThe second number.

Outputs

NameTypeDescription
floatThe relative error between x and y.

Internal Logic

The function calculates the maximum of the absolute differences between the ratios of the two numbers and their inverses.


_non_root_internal_nodes

Description

Returns a list of internal nodes in the tree, excluding the root node.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object.

Outputs

NameTypeDescription
List[str]A list of internal node names, excluding the root.

Internal Logic

The function uses set operations to find the difference between the set of internal nodes and the set containing only the root node.


_get_number_of_mutated_characters_in_node

Description

Counts the number of mutated characters (excluding missing characters) in a given node.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object.
vstrThe name of the node.

Outputs

NameTypeDescription
intThe number of mutated characters in the node.

Internal Logic

The function retrieves the character states of the node and counts the number of states that are neither 0 nor the missing state indicator.