High-level description
This file provides functions for calculating tree metrics, specifically parsimony and likelihood, for a given CassiopeiaTree
. It includes functions for calculating the number of mutations on a tree, the log likelihood of a character on a tree, and wrapper functions for calculating the overall likelihood of a tree under both discrete and continuous models of lineage tracing.
References
This code references the CassiopeiaTree
class from cassiopeia.data
and the parameter_estimators
module from cassiopeia.tools
.
Symbols
calculate_parsimony
Description
Calculates the parsimony score of a tree, defined as the number of character/state mutations that occur on the edges of the tree.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree to calculate parsimony over |
infer_ancestral_characters | bool | Whether to infer the ancestral character states of the tree using Camin-Sokal Parsimony. Defaults to False. |
treat_missing_as_mutation | bool | Whether to treat transitions from a non-missing state to a missing state as mutations. Defaults to False. |
Outputs
Name | Type | Description |
---|
parsimony | int | The number of mutations that have occurred on the tree |
Internal Logic
- If
infer_ancestral_characters
is True, the ancestral character states of the tree are inferred using Camin-Sokal Parsimony.
- The function iterates over all edges in the tree.
- For each edge, it retrieves the character states of the parent and child nodes.
- It then calculates the number of mutations along the edge using the
get_mutations_along_edge
method of the CassiopeiaTree
object.
- The total number of mutations is summed over all edges and returned.
log_transition_probability
Description
Calculates the log transition probability between two given states for a specific character.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree on which the priors are stored |
character | int | The index of the character |
s | Union[int, str] | The original state |
s_ | Union[int, str] | The state being transitioned to |
t | float | The length of time that the transition can occur along |
mutation_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring a mutation within a given time |
missing_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring heritable missing data within a given time |
Outputs
Name | Type | Description |
---|
log_probability | float | The log transition probability between the states |
Internal Logic
The function calculates the log transition probability based on the following assumptions:
- State
0
represents the uncut state.
- Only state
0
can transition to non-zero states.
- Any non-missing state can mutate to the missing state, specified by
tree.missing_state_indicator
.
- The probability of acquiring a mutation is given by the time
t
and the mutation_probability_function_of_time
.
- The probability of acquiring heritable missing data is given by the time
t
and the missing_probability_function_of_time
.
- The priors on the tree (
tree.priors
) are used to determine the probability of acquiring a non-missing, non-zero state.
The function uses a series of conditional statements to handle different combinations of s
and s_
and returns the corresponding log transition probability.
log_likelihood_of_character
Description
Calculates the log likelihood of a given character on the tree using Felsenstein’s Pruning Algorithm.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree on which to calculate the likelihood |
character | int | The index of the character to calculate the likelihood of |
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False. |
mutation_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring a mutation within a given time |
missing_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring heritable missing data within a given time |
stochastic_missing_probability | float | The probability that a cell/character pair acquires stochastic missing data at the end of the lineage |
implicit_root_branch_length | float | The length of the implicit root branch. Used if the implicit root needs to be added. Defaults to 1. |
Outputs
Name | Type | Description |
---|
log_likelihood | float | The log likelihood of the tree on one character |
Internal Logic
- The function performs a depth-first traversal of the tree in postorder.
- For each node, it calculates the likelihood of each possible state at that node based on the likelihoods of its children and the transition probabilities between states.
- If
use_internal_character_states
is False, the function assumes an implicit root with the uncut state (0) at each character and marginalizes over the transition from the implicit root to its child.
- If
use_internal_character_states
is True, the function returns the likelihood of the state annotated at the root for the given character.
get_lineage_tracing_parameters
Description
Retrieves or estimates the lineage tracing parameters (mutation rate, heritable missing rate, and stochastic missing probability) from a CassiopeiaTree
.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree from which to retrieve or estimate the parameters |
continuous | bool | If the parameters are to be estimated, whether to estimate them as continuous or discrete parameters |
assume_root_implicit_branch | bool | In the tree depth/time in estimating the rate parameters, whether or not to include an implicit root |
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None. |
Outputs
Name | Type | Description |
---|
mutation_rate | float | The mutation rate |
heritable_missing_rate | float | The heritable missing rate |
stochastic_missing_probability | float | The stochastic missing probability |
Internal Logic
- The function attempts to retrieve the parameters from
tree.parameters
.
- If any of the parameters are not found, they are estimated using the
estimate_mutation_rate
and estimate_missing_data_rates
functions.
- The function checks if the estimated parameters have valid values and raises a
TreeMetricError
if they are invalid.
calculate_likelihood_discrete
Description
Calculates the log likelihood of a tree under a discrete model of lineage tracing.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree on which to calculate likelihood over |
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False. |
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None. |
Outputs
Name | Type | Description |
---|
log_likelihood | float | The log likelihood of the tree given the observed character states |
Internal Logic
- The function retrieves the lineage tracing parameters using
get_lineage_tracing_parameters
.
- It calculates the log likelihood for each character using
log_likelihood_of_character
with discrete rate functions.
- The log likelihoods of all characters are summed to obtain the overall log likelihood of the tree.
calculate_likelihood_continuous
Description
Calculates the log likelihood of a tree under a continuous model of lineage tracing.
Name | Type | Description |
---|
tree | CassiopeiaTree | The tree on which to calculate likelihood over |
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False. |
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None. |
Outputs
Name | Type | Description |
---|
log_likelihood | float | The log likelihood of the tree given the observed character states |
Internal Logic
- The function retrieves the lineage tracing parameters using
get_lineage_tracing_parameters
.
- It calculates the log likelihood for each character using
log_likelihood_of_character
with continuous rate functions.
- The log likelihoods of all characters are summed to obtain the overall log likelihood of the tree.
Error Handling
The functions in this file raise TreeMetricError
exceptions if the input tree is not initialized or if the character states are not properly initialized. They also raise TreeMetricError
exceptions if the lineage tracing parameters have invalid values.