coupling.py - Stanza Demo

High-level description

The compute_evolutionary_coupling function calculates the evolutionary coupling between different categories of a given meta variable in a CassiopeiaTree. This coupling statistic, a Z-normalized mean distance between categories, reflects the phylogenetic relatedness of these categories within the tree.

References

This function references the compute_phylogenetic_weight_matrix, compute_inter_cluster_distances, and net_relatedness_index functions from the cassiopeia.data.utilities module.

Symbols

`compute_evolutionary_coupling`

Description

This function calculates the evolutionary coupling between categories of a specified meta variable in a CassiopeiaTree. It first computes the phylogenetic weight matrix or uses a precomputed one. Then, it filters categories based on a minimum proportion threshold. It calculates inter-cluster distances between categories using a specified distance function (defaulting to Net Relatedness Index). To generate a null distribution, it shuffles the meta variable assignments and recomputes the inter-cluster distances multiple times. Finally, it calculates Z-scores for the observed inter-cluster distances based on the null distribution, representing the evolutionary coupling between categories.

Inputs

Name	Type	Description
tree	CassiopeiaTree	The CassiopeiaTree object containing the tree and meta data.
meta_variable	str	The name of the column in `tree.cell_meta` representing the categorical variable.
minimum_proportion	float	Minimum proportion of cells a category must appear in to be considered (default 0.05).
number_of_shuffles	int	Number of shuffles for generating the null distribution (default 500).
random_state	Optional[np.random.RandomState]	Numpy random state for shuffling (default None).
dissimilarity_map	Optional[pd.DataFrame]	Precomputed dissimilarity map between leaves (default None).
cluster_comparison_function	Callable	Function to compare mean distance between groups (default `net_relatedness_index`).
**comparison_kwargs		Additional arguments for the cluster comparison function.

Outputs

Name	Type	Description
Z_scores	pd.DataFrame	A K x K DataFrame containing the evolutionary coupling scores between K categories.

Internal Logic

Compute/Retrieve Dissimilarity: Calculate the phylogenetic weight matrix using compute_phylogenetic_weight_matrix if no dissimilarity_map is provided, otherwise use the provided map.
Filter Categories: If minimum_proportion is greater than 0, filter out categories with frequencies below the specified proportion.
Calculate Inter-cluster Distances: Compute the distances between categories using the compute_inter_cluster_distances function with the specified cluster_comparison_function and additional arguments.
Generate Null Distribution: Shuffle the meta variable assignments number_of_shuffles times and recompute inter-cluster distances for each shuffle, storing the results.
Calculate Z-scores: For each pair of categories, calculate the Z-score of the observed inter-cluster distance based on the mean and standard deviation of the corresponding distances in the null distribution.

Performance Considerations

The computational complexity is O(n^2 log n + (B+1)(K^2 * O(distance_function))), where n is the number of leaves, K is the number of categories, and B is the number of shuffles. Precomputing the dissimilarity map can significantly reduce runtime.

Project Root

​High-level description

​References

​Symbols

​compute_evolutionary_coupling

​Description

​Inputs

​Outputs

​Internal Logic

​Performance Considerations