parameter_estimators.py
High-level description
This file provides functions for estimating parameters related to lineage tracing, specifically the mutation rate and missing data rates. These functions are used to infer the underlying biological processes from observed data in a CassiopeiaTree.
References
This code references the CassiopeiaTree
class from cassiopeia.data.CassiopeiaTree
and the ParameterEstimateError
and ParameterEstimateWarning
exceptions from cassiopeia.mixins
.
Symbols
get_proportion_of_missing_data
Description
Calculates the proportion of missing entries in the character matrix of a CassiopeiaTree. The missing state is indicated by the tree.missing_state_indicator
attribute.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix. |
layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
Name | Type | Description |
---|---|---|
missing_proportion | float | The proportion of missing cell/character entries in the character matrix. |
Internal Logic
The function counts the number of entries in the character matrix that are equal to the missing_state_indicator
. This count is then divided by the total number of entries in the matrix to calculate the proportion of missing data.
Error Handling
Raises a ParameterEstimateError
if no character matrix is found in the tree.
get_proportion_of_mutation
Description
Calculates the proportion of mutated entries in the character matrix of a CassiopeiaTree, excluding missing entries.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix. |
layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
Name | Type | Description |
---|---|---|
mutation_proportion | float | The proportion of non-missing cell/character entries that are mutated. |
Internal Logic
The function first calculates the number of missing entries. Then, it counts the number of non-missing entries that are not equal to 0 (uncut state). This count is then divided by the total number of non-missing entries to calculate the proportion of mutated entries.
Error Handling
Raises a ParameterEstimateError
if no character matrix is found in the tree.
estimate_mutation_rate
Description
Estimates the mutation rate from a CassiopeiaTree, considering either a continuous or per-generation model.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix and tree topology. |
continuous | bool, optional | Whether to calculate a continuous mutation rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True. |
assume_root_implicit_branch | bool, optional | Whether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True. |
layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
Name | Type | Description |
---|---|---|
mutation_rate | float | The estimated mutation rate. |
Internal Logic
The function first retrieves the proportion of mutated entries from the tree’s parameters or calculates it using get_proportion_of_mutation
. Then, it estimates the mutation rate based on the chosen model:
- Discrete (per-generation): The rate is estimated using the formula:
mutation_rate = 1 - (1 - mutation_proportion) ^ (1 / mean_depth)
, wheremean_depth
is the average depth of the tree. - Continuous: The rate is estimated using the formula:
mutation_rate = -np.log(1 - mutation_proportion) / mean_time
, wheremean_time
is the average time of the tree.
If assume_root_implicit_branch
is True and the tree doesn’t have a single leading edge from the root, an implicit branch is added to the calculation of mean_depth
or mean_time
.
Error Handling
Raises a ParameterEstimateError
if the mutation_proportion
parameter is not between 0 and 1.
estimate_missing_data_rates
Description
Estimates both the stochastic missing probability and the heritable missing rate from a CassiopeiaTree, given one of the two parameters.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix and tree topology. |
continuous | bool, optional | Whether to calculate a continuous missing rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True. |
assume_root_implicit_branch | bool, optional | Whether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True. |
stochastic_missing_probability | float, optional | The stochastic missing probability. If provided, overrides the value on the tree. |
heritable_missing_rate | float, optional | The heritable missing rate. If provided, overrides the value on the tree. |
layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
Name | Type | Description |
---|---|---|
stochastic_missing_probability | float | The stochastic missing probability. |
heritable_missing_rate | float | The heritable missing rate. |
Internal Logic
The function first retrieves the total missing proportion from the tree’s parameters or calculates it using get_proportion_of_missing_data
. It then checks if either the stochastic missing probability or the heritable missing rate is provided as an argument or stored in the tree’s parameters. If both or neither are provided, it raises a ParameterEstimateError
.
If only one of the parameters is provided, the function estimates the other parameter based on the chosen model (continuous or discrete) and the formula: total_missing_proportion = heritable_proportion + stochastic_proportion - heritable_proportion * stochastic_proportion
.
The heritable proportion is calculated using either the average depth or average time of the tree, depending on the chosen model and whether to assume an implicit root branch.
Error Handling
- Raises a
ParameterEstimateError
if:- The
total_missing_proportion
,stochastic_missing_probability
, orheritable_missing_rate
have invalid values (e.g., outside the range of 0 to 1). - Both or neither of
stochastic_missing_probability
andheritable_missing_rate
are provided.
- The
- Raises a
ParameterEstimateWarning
if the estimated parameter is negative, indicating a potential issue with the provided parameter.