High-level description

This file provides functions for estimating parameters related to lineage tracing, specifically the mutation rate and missing data rates. These functions are used to infer the underlying biological processes from observed data in a CassiopeiaTree.

References

This code references the CassiopeiaTree class from cassiopeia.data.CassiopeiaTree and the ParameterEstimateError and ParameterEstimateWarning exceptions from cassiopeia.mixins.

Symbols

get_proportion_of_missing_data

Description

Calculates the proportion of missing entries in the character matrix of a CassiopeiaTree. The missing state is indicated by the tree.missing_state_indicator attribute.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the character matrix.
layerstr, optionalThe name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used.

Outputs

NameTypeDescription
missing_proportionfloatThe proportion of missing cell/character entries in the character matrix.

Internal Logic

The function counts the number of entries in the character matrix that are equal to the missing_state_indicator. This count is then divided by the total number of entries in the matrix to calculate the proportion of missing data.

Error Handling

Raises a ParameterEstimateError if no character matrix is found in the tree.

get_proportion_of_mutation

Description

Calculates the proportion of mutated entries in the character matrix of a CassiopeiaTree, excluding missing entries.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the character matrix.
layerstr, optionalThe name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used.

Outputs

NameTypeDescription
mutation_proportionfloatThe proportion of non-missing cell/character entries that are mutated.

Internal Logic

The function first calculates the number of missing entries. Then, it counts the number of non-missing entries that are not equal to 0 (uncut state). This count is then divided by the total number of non-missing entries to calculate the proportion of mutated entries.

Error Handling

Raises a ParameterEstimateError if no character matrix is found in the tree.

estimate_mutation_rate

Description

Estimates the mutation rate from a CassiopeiaTree, considering either a continuous or per-generation model.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the character matrix and tree topology.
continuousbool, optionalWhether to calculate a continuous mutation rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True.
assume_root_implicit_branchbool, optionalWhether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True.
layerstr, optionalThe name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used.

Outputs

NameTypeDescription
mutation_ratefloatThe estimated mutation rate.

Internal Logic

The function first retrieves the proportion of mutated entries from the tree’s parameters or calculates it using get_proportion_of_mutation. Then, it estimates the mutation rate based on the chosen model:

  • Discrete (per-generation): The rate is estimated using the formula: mutation_rate = 1 - (1 - mutation_proportion) ^ (1 / mean_depth), where mean_depth is the average depth of the tree.
  • Continuous: The rate is estimated using the formula: mutation_rate = -np.log(1 - mutation_proportion) / mean_time, where mean_time is the average time of the tree.

If assume_root_implicit_branch is True and the tree doesn’t have a single leading edge from the root, an implicit branch is added to the calculation of mean_depth or mean_time.

Error Handling

Raises a ParameterEstimateError if the mutation_proportion parameter is not between 0 and 1.

estimate_missing_data_rates

Description

Estimates both the stochastic missing probability and the heritable missing rate from a CassiopeiaTree, given one of the two parameters.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree object containing the character matrix and tree topology.
continuousbool, optionalWhether to calculate a continuous missing rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True.
assume_root_implicit_branchbool, optionalWhether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True.
stochastic_missing_probabilityfloat, optionalThe stochastic missing probability. If provided, overrides the value on the tree.
heritable_missing_ratefloat, optionalThe heritable missing rate. If provided, overrides the value on the tree.
layerstr, optionalThe name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used.

Outputs

NameTypeDescription
stochastic_missing_probabilityfloatThe stochastic missing probability.
heritable_missing_ratefloatThe heritable missing rate.

Internal Logic

The function first retrieves the total missing proportion from the tree’s parameters or calculates it using get_proportion_of_missing_data. It then checks if either the stochastic missing probability or the heritable missing rate is provided as an argument or stored in the tree’s parameters. If both or neither are provided, it raises a ParameterEstimateError.

If only one of the parameters is provided, the function estimates the other parameter based on the chosen model (continuous or discrete) and the formula: total_missing_proportion = heritable_proportion + stochastic_proportion - heritable_proportion * stochastic_proportion.

The heritable proportion is calculated using either the average depth or average time of the tree, depending on the chosen model and whether to assume an implicit root branch.

Error Handling

  • Raises a ParameterEstimateError if:
    • The total_missing_proportion, stochastic_missing_probability, or heritable_missing_rate have invalid values (e.g., outside the range of 0 to 1).
    • Both or neither of stochastic_missing_probability and heritable_missing_rate are provided.
  • Raises a ParameterEstimateWarning if the estimated parameter is negative, indicating a potential issue with the provided parameter.