High-level description

This file provides functions for calculating tree metrics, specifically parsimony and likelihood, for a given CassiopeiaTree. It includes functions for calculating the number of mutations on a tree, the log likelihood of a character on a tree, and wrapper functions for calculating the overall likelihood of a tree under both discrete and continuous models of lineage tracing.

References

This code references the CassiopeiaTree class from cassiopeia.data and the parameter_estimators module from cassiopeia.tools.

Symbols

calculate_parsimony

Description

Calculates the parsimony score of a tree, defined as the number of character/state mutations that occur on the edges of the tree.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree to calculate parsimony over
infer_ancestral_characters | bool | Whether to infer the ancestral character states of the tree using Camin-Sokal Parsimony. Defaults to False.
treat_missing_as_mutation | bool | Whether to treat transitions from a non-missing state to a missing state as mutations. Defaults to False.

Outputs

Name | Type | Description
--- | --- | ---
parsimony | int | The number of mutations that have occurred on the tree

Internal Logic

  1. If infer_ancestral_characters is True, the ancestral character states of the tree are inferred using Camin-Sokal Parsimony.
  2. The function iterates over all edges in the tree.
  3. For each edge, it retrieves the character states of the parent and child nodes.
  4. It then calculates the number of mutations along the edge using the get_mutations_along_edge method of the CassiopeiaTree object.
  5. The total number of mutations is summed over all edges and returned.
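
The edge-by-edge counting above can be sketched without the library. This is a minimal illustration, not the CassiopeiaTree implementation: the tree is a plain list of (parent, child) edges, `mutations_along_edge` is a hypothetical stand-in for `get_mutations_along_edge`, the missing-state indicator is assumed to be -1, and the rule that a missing parent state never yields a mutation is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

MISSING = -1  # assumed missing-state indicator for this sketch


def mutations_along_edge(
    parent_states: List[int],
    child_states: List[int],
    treat_missing_as_mutation: bool = False,
) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for an edge-level mutation lookup.

    Returns (character index, child state) for each character whose state
    changes from parent to child.
    """
    mutations = []
    for index, (parent, child) in enumerate(zip(parent_states, child_states)):
        if parent == child or parent == MISSING:
            continue  # no change, or no known parent state to compare against
        if child == MISSING and not treat_missing_as_mutation:
            continue  # transitions to missing are only counted on request
        mutations.append((index, child))
    return mutations


def parsimony_sketch(
    edges: List[Tuple[str, str]],
    states: Dict[str, List[int]],
    treat_missing_as_mutation: bool = False,
) -> int:
    """Sum the per-edge mutation counts over every (parent, child) edge."""
    return sum(
        len(mutations_along_edge(states[u], states[v], treat_missing_as_mutation))
        for u, v in edges
    )


# Toy tree: root -> {a, b}, a -> {c, d}, with three characters per node.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d")]
states = {
    "root": [0, 0, 0],
    "a": [1, 0, 0],
    "b": [0, 2, MISSING],
    "c": [1, 3, 0],
    "d": [1, 0, MISSING],
}

print(parsimony_sketch(edges, states))  # 3
print(parsimony_sketch(edges, states, treat_missing_as_mutation=True))  # 5
```

In the toy tree, three edges each carry one real mutation, and two edges additionally carry a transition to the missing state, which is why the count rises from 3 to 5 when missing data is treated as a mutation.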

log_transition_probability

Description

Calculates the log transition probability between two given states for a specific character.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree on which the priors are stored
character | int | The index of the character
s | Union[int, str] | The original state
s_ | Union[int, str] | The state being transitioned to
t | float | The length of time that the transition can occur along
mutation_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring a mutation within a given time
missing_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring heritable missing data within a given time

Outputs

Name | Type | Description
--- | --- | ---
log_probability | float | The log transition probability between the states

Internal Logic

The function calculates the log transition probability based on the following assumptions:

  1. State 0 represents the uncut state.
  2. Only state 0 can transition to non-zero states.
  3. Any non-missing state can mutate to the missing state, specified by tree.missing_state_indicator.
  4. The probability of acquiring a mutation is given by the time t and the mutation_probability_function_of_time.
  5. The probability of acquiring heritable missing data is given by the time t and the missing_probability_function_of_time.
  6. The priors on the tree (tree.priors) are used to determine the probability of acquiring a non-missing, non-zero state.

The function uses a series of conditional statements to handle different combinations of s and s_ and returns the corresponding log transition probability.
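
One way to turn the six assumptions above into conditional branches is sketched below. It is an illustration under stated assumptions rather than the file's actual code: the priors are passed in as a plain nested dict instead of being read from tree.priors, the missing-state indicator is assumed to be -1, the missing state is assumed to be absorbing, and mutation and heritable missing data are assumed to occur independently along a branch.

```python
import math
from typing import Callable, Dict, Union

State = Union[int, str]
MISSING = -1  # assumed value of tree.missing_state_indicator


def _log(p: float) -> float:
    """Natural log that returns -inf instead of raising for p == 0."""
    return math.log(p) if p > 0 else -math.inf


def log_transition_probability_sketch(
    priors: Dict[int, Dict[int, float]],  # stand-in for tree.priors
    character: int,
    s: State,
    s_: State,
    t: float,
    mutation_probability_function_of_time: Callable[[float], float],
    missing_probability_function_of_time: Callable[[float], float],
) -> float:
    p_mut = mutation_probability_function_of_time(t)
    p_miss = missing_probability_function_of_time(t)

    # Assumption of this sketch: once missing, a character stays missing.
    if s == MISSING:
        return 0.0 if s_ == MISSING else -math.inf
    # Any non-missing state can become missing.
    if s_ == MISSING:
        return _log(p_miss)
    if s == 0:
        if s_ == 0:
            # Stays uncut and does not go missing.
            return _log(1 - p_mut) + _log(1 - p_miss)
        # Uncut -> specific non-zero state, weighted by the state prior.
        return _log(p_mut) + _log(priors[character][s_]) + _log(1 - p_miss)
    # Only state 0 can mutate, so a non-zero state can only stay as it is.
    return _log(1 - p_miss) if s_ == s else -math.inf


priors = {0: {1: 0.25, 2: 0.75}}
mut = lambda t: 0.2
miss = lambda t: 0.1

# The probabilities of leaving state 0 sum to one over all target states.
total = sum(
    math.exp(log_transition_probability_sketch(priors, 0, 0, s_, 1.0, mut, miss))
    for s_ in (0, 1, 2, MISSING)
)
print(round(total, 12))  # 1.0
```

Under these assumptions each row of the transition matrix sums to one, which is a quick sanity check for any alternative branch structure.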

log_likelihood_of_character

Description

Calculates the log likelihood of a given character on the tree using Felsenstein’s Pruning Algorithm.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree on which to calculate the likelihood
character | int | The index of the character to calculate the likelihood of
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False.
mutation_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring a mutation within a given time
missing_probability_function_of_time | Callable[[float], float] | The function defining the probability of a lineage acquiring heritable missing data within a given time
stochastic_missing_probability | float | The probability that a cell/character pair acquires stochastic missing data at the end of the lineage
implicit_root_branch_length | float | The length of the implicit root branch. Used if the implicit root needs to be added. Defaults to 1.

Outputs

Name | Type | Description
--- | --- | ---
log_likelihood | float | The log likelihood of the tree on one character

Internal Logic

  1. The function performs a depth-first traversal of the tree in postorder.
  2. For each node, it calculates the likelihood of each possible state at that node based on the likelihoods of its children and the transition probabilities between states.
  3. If use_internal_character_states is False, the function assumes an implicit root with the uncut state (0) at each character and marginalizes over the transition from the implicit root to its child.
  4. If use_internal_character_states is True, the function returns the likelihood of the state annotated at the root for the given character.
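
The postorder recursion can be illustrated on a toy tree. The sketch below is a generic log-space version of Felsenstein's pruning algorithm, not the function documented here: the tree is a plain dict of children, the two-state transition model (`toy_log_transition`) is made up for the example, stochastic missing data is ignored, and all function and argument names other than `implicit_root_branch_length` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def pruning_log_likelihood(
    children: Dict[str, List[str]],
    branch_length: Dict[str, float],  # length of the edge leading into each node
    leaf_state: Dict[str, int],
    states: List[int],
    root: str,
    log_transition: Callable[[int, int, float], float],
    implicit_root_branch_length: Optional[float] = None,
) -> float:
    partial: Dict[str, Dict[int, float]] = {}  # node -> state -> log likelihood

    def visit(node: str) -> None:
        kids = children.get(node, [])
        if not kids:
            # A leaf is certain about its observed state.
            partial[node] = {
                s: 0.0 if s == leaf_state[node] else -math.inf for s in states
            }
            return
        for kid in kids:  # postorder: children before parent
            visit(kid)
        partial[node] = {
            s: sum(
                logsumexp(
                    [
                        log_transition(s, s_, branch_length[kid]) + partial[kid][s_]
                        for s_ in states
                    ]
                )
                for kid in kids
            )
            for s in states
        }

    visit(root)
    if implicit_root_branch_length is None:
        return partial[root][0]  # root assumed to be in the uncut state
    # Marginalize over the transition from an uncut implicit root to the root.
    return logsumexp(
        [
            log_transition(0, s_, implicit_root_branch_length) + partial[root][s_]
            for s_ in states
        ]
    )


def toy_log_transition(s: int, s_: int, t: float) -> float:
    """Made-up irreversible model: 0 -> 1 with probability 0.5 per edge."""
    if s == 0:
        return math.log(0.5)  # both 0 -> 0 and 0 -> 1 have probability 0.5
    return 0.0 if s_ == 1 else -math.inf


children = {"root": ["x", "c"], "x": ["a", "b"]}
branch_length = {"x": 1.0, "c": 1.0, "a": 1.0, "b": 1.0}
leaf_state = {"a": 1, "b": 1, "c": 0}

ll = pruning_log_likelihood(
    children, branch_length, leaf_state, [0, 1], "root", toy_log_transition
)
print(math.exp(ll))  # 0.3125
```

Passing `implicit_root_branch_length=1.0` adds one more uncut-to-root transition on top of the same partial likelihoods, which halves the result in this toy model.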

get_lineage_tracing_parameters

Description

Retrieves or estimates the lineage tracing parameters (mutation rate, heritable missing rate, and stochastic missing probability) from a CassiopeiaTree.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree from which to retrieve or estimate the parameters
continuous | bool | If the parameters are to be estimated, whether to estimate them as continuous or discrete parameters
assume_root_implicit_branch | bool | Whether to include an implicit root branch in the tree depth/time used when estimating the rate parameters
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None.

Outputs

Name | Type | Description
--- | --- | ---
mutation_rate | float | The mutation rate
heritable_missing_rate | float | The heritable missing rate
stochastic_missing_probability | float | The stochastic missing probability

Internal Logic

  1. The function attempts to retrieve the parameters from tree.parameters.
  2. If any of the parameters are not found, they are estimated using the estimate_mutation_rate and estimate_missing_data_rates functions.
  3. The function checks if the estimated parameters have valid values and raises a TreeMetricError if they are invalid.
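
The retrieve-then-estimate-then-validate flow can be sketched as follows. Everything here is illustrative: the stored parameters are a plain dict standing in for tree.parameters, the dictionary keys are assumed to match the output names above, the two estimator callables are hypothetical placeholders for estimate_mutation_rate and estimate_missing_data_rates (including the order of the returned pair), and the validity ranges are assumptions, since the text does not state them.

```python
import math
from typing import Callable, Dict, Tuple


class TreeMetricErrorSketch(Exception):
    """Local stand-in for the TreeMetricError named in the text."""


def resolve_parameters(
    stored: Dict[str, float],
    estimate_mutation_rate: Callable[[], float],
    estimate_missing_rates: Callable[[], Tuple[float, float]],
    continuous: bool,
) -> Tuple[float, float, float]:
    # Step 1: try the stored values first.
    mutation_rate = stored.get("mutation_rate")
    heritable = stored.get("heritable_missing_rate")
    stochastic = stored.get("stochastic_missing_probability")

    # Step 2: estimate whatever is missing.
    if mutation_rate is None:
        mutation_rate = estimate_mutation_rate()
    if heritable is None or stochastic is None:
        est_heritable, est_stochastic = estimate_missing_rates()
        heritable = est_heritable if heritable is None else heritable
        stochastic = est_stochastic if stochastic is None else stochastic

    # Step 3: validate. Assumed ranges: discrete parameters are probabilities
    # in [0, 1]; continuous rates only need to be non-negative.
    upper = math.inf if continuous else 1.0
    for name, value in (
        ("mutation_rate", mutation_rate),
        ("heritable_missing_rate", heritable),
    ):
        if not 0 <= value <= upper:
            raise TreeMetricErrorSketch(f"{name}={value} is outside [0, {upper}]")
    if not 0 <= stochastic <= 1:
        raise TreeMetricErrorSketch(
            f"stochastic_missing_probability={stochastic} is outside [0, 1]"
        )
    return mutation_rate, heritable, stochastic


params = resolve_parameters(
    {"mutation_rate": 0.3},      # only the mutation rate is stored
    lambda: 0.2,                 # hypothetical mutation-rate estimator (unused here)
    lambda: (0.05, 0.1),         # hypothetical (heritable, stochastic) estimator
    continuous=False,
)
print(params)  # (0.3, 0.05, 0.1)
```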

calculate_likelihood_discrete

Description

Calculates the log likelihood of a tree under a discrete model of lineage tracing.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree on which to calculate the likelihood
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False.
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None.

Outputs

Name | Type | Description
--- | --- | ---
log_likelihood | float | The log likelihood of the tree given the observed character states

Internal Logic

  1. The function retrieves the lineage tracing parameters using get_lineage_tracing_parameters.
  2. It calculates the log likelihood for each character using log_likelihood_of_character with discrete rate functions.
  3. The log likelihoods of all characters are summed to obtain the overall log likelihood of the tree.
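
The wrapper pattern can be sketched as below. The text only says "discrete rate functions"; this sketch assumes that means a constant probability per edge, independent of the edge length. The per-character likelihood is a deliberately simple stand-in (one lineage that stays uncut and non-missing), not log_likelihood_of_character, and all helper names are hypothetical.

```python
import math
from typing import Callable, List

RateFunction = Callable[[float], float]


def constant_probability(p: float) -> RateFunction:
    """Assumed discrete model: the same probability on every edge."""
    return lambda t: p


def toy_character_log_likelihood(
    branch_lengths: List[float],
    mutation_fn: RateFunction,
    missing_fn: RateFunction,
) -> float:
    """Log probability that one character stays uncut and non-missing
    along a single root-to-leaf path (a stand-in for the real per-character
    likelihood)."""
    return sum(
        math.log(1 - mutation_fn(t)) + math.log(1 - missing_fn(t))
        for t in branch_lengths
    )


def total_log_likelihood(
    n_characters: int,
    branch_lengths: List[float],
    mutation_fn: RateFunction,
    missing_fn: RateFunction,
) -> float:
    """Characters are treated as independent, so their log likelihoods add."""
    return sum(
        toy_character_log_likelihood(branch_lengths, mutation_fn, missing_fn)
        for _ in range(n_characters)
    )


mutation_fn = constant_probability(0.1)
missing_fn = constant_probability(0.05)
log_likelihood = total_log_likelihood(2, [1.0, 1.0, 1.0], mutation_fn, missing_fn)
print(log_likelihood)
```

Because the assumed rate functions ignore their argument, the result depends only on the number of edges, not on their lengths.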

calculate_likelihood_continuous

Description

Calculates the log likelihood of a tree under a continuous model of lineage tracing.

Inputs

Name | Type | Description
--- | --- | ---
tree | CassiopeiaTree | The tree on which to calculate the likelihood
use_internal_character_states | bool | Indicates if internal node character states should be assumed to be specified exactly. Defaults to False.
layer | Optional[str] | Layer to use for the character matrix in estimating parameters. If None, the current character_matrix variable will be used. Defaults to None.

Outputs

Name | Type | Description
--- | --- | ---
log_likelihood | float | The log likelihood of the tree given the observed character states

Internal Logic

  1. The function retrieves the lineage tracing parameters using get_lineage_tracing_parameters.
  2. It calculates the log likelihood for each character using log_likelihood_of_character with continuous rate functions.
  3. The log likelihoods of all characters are summed to obtain the overall log likelihood of the tree.
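
A common choice for a continuous rate function is an exponential waiting time, where the probability of an event within time t is 1 - exp(-rate * t). Whether this file uses exactly that form is an assumption of this sketch; the helper name is hypothetical.

```python
import math
from typing import Callable


def exponential_probability(rate: float) -> Callable[[float], float]:
    """Assumed continuous model: P(event within t) = 1 - exp(-rate * t)."""
    return lambda t: 1.0 - math.exp(-rate * t)


mutation_fn = exponential_probability(0.5)

# Under this form, splitting a branch into two pieces does not change the
# probability of staying uncut over the whole branch.
whole = 1.0 - mutation_fn(3.0)
split = (1.0 - mutation_fn(1.0)) * (1.0 - mutation_fn(2.0))
print(abs(whole - split) < 1e-12)  # True
```

This consistency under branch subdivision is what distinguishes a time-dependent rate function from a constant per-edge probability.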

Error Handling

The functions in this file raise TreeMetricError if the input tree or its character states are not properly initialized, or if the lineage tracing parameters have invalid values.