coupling.py
High-level description
The compute_evolutionary_coupling function calculates the evolutionary coupling between different categories of a given meta variable in a CassiopeiaTree. This coupling statistic, a Z-normalized mean distance between categories, reflects the phylogenetic relatedness of these categories within the tree.
References
This function references the compute_phylogenetic_weight_matrix, compute_inter_cluster_distances, and net_relatedness_index functions from the cassiopeia.data.utilities module.
Symbols
compute_evolutionary_coupling
Description
This function calculates the evolutionary coupling between categories of a specified meta variable in a CassiopeiaTree. It first computes the phylogenetic weight matrix or uses a precomputed one. Then, it filters categories based on a minimum proportion threshold. It calculates inter-cluster distances between categories using a specified distance function (defaulting to Net Relatedness Index). To generate a null distribution, it shuffles the meta variable assignments and recomputes the inter-cluster distances multiple times. Finally, it calculates Z-scores for the observed inter-cluster distances based on the null distribution, representing the evolutionary coupling between categories.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object containing the tree and meta data. |
meta_variable | str | The name of the column in tree.cell_meta representing the categorical variable. |
minimum_proportion | float | Minimum proportion of cells a category must appear in to be considered (default 0.05). |
number_of_shuffles | int | Number of shuffles for generating the null distribution (default 500). |
random_state | Optional[np.random.RandomState] | Numpy random state for shuffling (default None). |
dissimilarity_map | Optional[pd.DataFrame] | Precomputed dissimilarity map between leaves (default None). |
cluster_comparison_function | Callable | Function to compare mean distance between groups (default net_relatedness_index). |
**comparison_kwargs | keyword arguments | Additional arguments passed to the cluster comparison function. |
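As a rough illustration of the minimum_proportion input, the sketch below drops categories that cover too small a share of cells. It is not the library's code: the helper name filter_categories, the plain dict of cell labels, and the choice to keep categories whose share equals the threshold are all assumptions made for the example.

```python
from collections import Counter


def filter_categories(labels, minimum_proportion=0.05):
    """Keep only cells whose category covers at least minimum_proportion of all cells.

    Hypothetical helper: `labels` maps cell name -> category.
    """
    counts = Counter(labels.values())
    total = len(labels)
    # Assumption: a category exactly at the threshold is kept.
    keep = {cat for cat, n in counts.items() if n / total >= minimum_proportion}
    return {cell: cat for cell, cat in labels.items() if cat in keep}


# Toy data: 100 cells, category C covers only 2% of them.
labels = {f"cell{i}": "A" for i in range(60)}
labels.update({f"cell{i}": "B" for i in range(60, 98)})
labels.update({"cell98": "C", "cell99": "C"})

kept = filter_categories(labels, minimum_proportion=0.05)
print(sorted(set(kept.values())))  # ['A', 'B']
```

With the default threshold of 0.05, category C (2 of 100 cells) is removed before any distances are computed, so it does not appear in the resulting score matrix.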
Outputs
Name | Type | Description |
---|---|---|
Z_scores | pd.DataFrame | A K x K DataFrame containing the evolutionary coupling scores between K categories. |
Internal Logic
- Compute/Retrieve Dissimilarity: Calculate the phylogenetic weight matrix using compute_phylogenetic_weight_matrix if no dissimilarity_map is provided; otherwise use the provided map.
- Filter Categories: If minimum_proportion is greater than 0, filter out categories with frequencies below the specified proportion.
- Calculate Inter-cluster Distances: Compute the distances between categories using the compute_inter_cluster_distances function with the specified cluster_comparison_function and additional arguments.
- Generate Null Distribution: Shuffle the meta variable assignments number_of_shuffles times and recompute inter-cluster distances for each shuffle, storing the results.
- Calculate Z-scores: For each pair of categories, calculate the Z-score of the observed inter-cluster distance based on the mean and standard deviation of the corresponding distances in the null distribution.
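The shuffle-and-normalize steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Cassiopeia implementation: the helpers mean_between and coupling_z_scores are hypothetical, a plain mean of pairwise distances stands in for the cluster comparison function, and the dissimilarity map is a nested dict rather than a DataFrame.

```python
import random
import statistics
from itertools import product


def mean_between(dist, labels, a, b):
    """Mean pairwise distance between leaves labelled a and leaves labelled b.

    Stand-in for the cluster comparison function (an assumption for this sketch).
    """
    xs = [leaf for leaf, lab in labels.items() if lab == a]
    ys = [leaf for leaf, lab in labels.items() if lab == b]
    pairs = [(x, y) for x, y in product(xs, ys) if x != y]
    if not pairs:
        return float("nan")
    return sum(dist[x][y] for x, y in pairs) / len(pairs)


def coupling_z_scores(dist, labels, n_shuffles=200, seed=0):
    """Z-score each observed category-pair distance against a label-shuffle null."""
    rng = random.Random(seed)
    cats = sorted(set(labels.values()))
    leaves = list(labels)
    observed = {(a, b): mean_between(dist, labels, a, b) for a in cats for b in cats}
    null = {pair: [] for pair in observed}
    for _ in range(n_shuffles):
        values = list(labels.values())
        rng.shuffle(values)  # permute category assignments across leaves
        shuffled = dict(zip(leaves, values))
        for a, b in observed:
            null[(a, b)].append(mean_between(dist, shuffled, a, b))
    z = {}
    for pair, obs in observed.items():
        mu = statistics.mean(null[pair])
        sigma = statistics.pstdev(null[pair])
        z[pair] = (obs - mu) / sigma if sigma > 0 else float("nan")
    return z


# Toy data: two tight groups of leaves placed far apart on a line.
positions = {"a1": 0, "a2": 1, "a3": 2, "b1": 10, "b2": 11, "b3": 12}
dist = {x: {y: abs(px - py) for y, py in positions.items()} for x, px in positions.items()}
labels = {leaf: leaf[0].upper() for leaf in positions}

z = coupling_z_scores(dist, labels)
print(z[("A", "A")] < 0, z[("A", "B")] > 0)  # True True
```

In this distance-based sketch, a negative score means two categories sit closer together than shuffled labels would suggest, and a positive score means they sit farther apart. Here each category is tightly clustered, so the within-category score is negative and the between-category score is positive.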
Performance Considerations
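The computational complexity is O(n^2 log n + (B + 1) * K^2 * C), where n is the number of leaves, K is the number of categories, B is the number of shuffles, and C is the cost of one call to the distance function. Passing a precomputed dissimilarity_map skips the phylogenetic weight matrix computation, which helps most when the function is called repeatedly on the same tree.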
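To make the shuffle-dependent term concrete, the short sketch below counts how many category-pair comparisons the (B + 1) * K^2 factor implies for given settings. The helper comparison_calls is hypothetical, and the count follows the stated formula rather than a measurement of the actual code.

```python
def comparison_calls(num_categories, num_shuffles):
    """Number of category-pair comparisons implied by the (B + 1) * K^2 term."""
    return (num_shuffles + 1) * num_categories ** 2


print(comparison_calls(num_categories=4, num_shuffles=500))   # 8016
print(comparison_calls(num_categories=10, num_shuffles=500))  # 50100
```

Because the count grows with the square of K, raising minimum_proportion to drop rare categories reduces this term more than a proportional cut in number_of_shuffles would.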