High-level description
The `compute_evolutionary_coupling` function calculates the evolutionary coupling between the categories of a given meta variable in a CassiopeiaTree. The coupling statistic is a Z-normalized mean distance between categories, and it reflects how phylogenetically related those categories are within the tree.
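Written out, the statistic for two categories A and B looks roughly like this. The formula assumes the default comparison (Net Relatedness Index) is the mean pairwise leaf dissimilarity between two groups. The symbols are illustrative notation, not the library's: $W$ is the leaf dissimilarity matrix, and $\mu_{\mathrm{null}}$ and $\sigma_{\mathrm{null}}$ are the mean and standard deviation of $d(A, B)$ over shuffled label assignments.

```latex
d(A, B) = \frac{1}{|A|\,|B|} \sum_{i \in A} \sum_{j \in B} W_{ij},
\qquad
Z(A, B) = \frac{d(A, B) - \mu_{\mathrm{null}}(A, B)}{\sigma_{\mathrm{null}}(A, B)}
```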
References
This function uses the `compute_phylogenetic_weight_matrix`, `compute_inter_cluster_distances`, and `net_relatedness_index` functions from the `cassiopeia.data.utilities` module.
Symbols
compute_evolutionary_coupling
Description
This function calculates the evolutionary coupling between the categories of a specified meta variable in a CassiopeiaTree. It first computes the phylogenetic weight matrix, or uses a precomputed dissimilarity map if one is supplied. It then filters out categories that fall below a minimum proportion threshold and computes inter-cluster distances between the remaining categories with a specified comparison function (by default, the Net Relatedness Index). To build a null distribution, it shuffles the meta variable assignments and recomputes the inter-cluster distances many times. Finally, it converts each observed inter-cluster distance into a Z-score against the corresponding null distribution; these Z-scores are the evolutionary couplings between categories.
Inputs
| Name | Type | Description | 
|---|---|---|
| tree | CassiopeiaTree | The CassiopeiaTree object containing the tree and meta data. | 
| meta_variable | str | Name of the column in `tree.cell_meta` that holds the categorical variable. |
| minimum_proportion | float | Minimum fraction of all cells that a category must account for to be retained (default 0.05). |
| number_of_shuffles | int | Number of label shuffles used to build the null distribution (default 500). |
| random_state | Optional[np.random.RandomState] | NumPy random state used for shuffling (default None). |
| dissimilarity_map | Optional[pd.DataFrame] | Precomputed dissimilarity map between leaves; if None, the phylogenetic weight matrix is computed (default None). |
| cluster_comparison_function | Callable | Function that computes the mean distance between two groups of cells (default `net_relatedness_index`). |
| `**comparison_kwargs` | keyword arguments | Additional keyword arguments passed to `cluster_comparison_function`. |
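To make the `minimum_proportion` threshold concrete, here is a dependency-free sketch of the filtering step. It assumes the meta variable is given as a plain dict mapping leaf names to categories (a stand-in for `tree.cell_meta[meta_variable]`). The name `filter_categories` is hypothetical and not part of Cassiopeia. The sketch keeps categories at or above the threshold; whether the library's comparison is inclusive is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable


def filter_categories(
    labels: Dict[str, Hashable], minimum_proportion: float
) -> Dict[str, Hashable]:
    """Drop every cell whose category makes up less than `minimum_proportion` of all cells.

    Hypothetical helper: `labels` maps leaf name -> category and stands in for
    tree.cell_meta[meta_variable].
    """
    if minimum_proportion <= 0:
        return dict(labels)  # no filtering requested
    counts = Counter(labels.values())
    n_cells = len(labels)
    # Assumed inclusive threshold: keep categories whose share is >= the cutoff.
    keep = {cat for cat, count in counts.items() if count / n_cells >= minimum_proportion}
    return {cell: cat for cell, cat in labels.items() if cat in keep}
```

With four cells labeled A, A, B, A, a threshold of 0.3 removes category B (1 of 4 cells, or 0.25), while a threshold of 0.25 keeps it.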
Outputs
| Name | Type | Description | 
|---|---|---|
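| Z_scores | pd.DataFrame | A K x K DataFrame of evolutionary coupling scores between the K retained categories. With the default distance-based comparison, negative values mean two categories are closer in the tree than expected under shuffled labels. |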
Internal Logic
- Compute/retrieve dissimilarity: If no `dissimilarity_map` is provided, compute the phylogenetic weight matrix with `compute_phylogenetic_weight_matrix`; otherwise, use the provided map.
- Filter categories: If `minimum_proportion` is greater than 0, drop categories whose frequency is below that proportion.
- Calculate inter-cluster distances: Compute the distances between categories with `compute_inter_cluster_distances`, using the specified `cluster_comparison_function` and any additional keyword arguments.
- Generate the null distribution: Shuffle the meta variable assignments `number_of_shuffles` times, recompute the inter-cluster distances after each shuffle, and store the results.
- Calculate Z-scores: For each pair of categories, compute the Z-score of the observed inter-cluster distance from the mean and standard deviation of the corresponding shuffled distances.
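The inter-cluster distance step can be sketched without Cassiopeia or pandas. The sketch assumes the default comparison, `net_relatedness_index`, is the mean pairwise dissimilarity between the leaves of two groups, and it represents the dissimilarity map as a nested dict. `mean_pairwise_distance` and `inter_cluster_distances` are hypothetical names for this illustration, not the library's signatures.

```python
from itertools import product
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

# Nested-dict stand-in for the leaf-by-leaf dissimilarity DataFrame.
Matrix = Mapping[str, Mapping[str, float]]


def mean_pairwise_distance(W: Matrix, group_1: Sequence[str], group_2: Sequence[str]) -> float:
    """Mean of W[i][j] over every i in group_1 and j in group_2 (NRI-style statistic)."""
    total = sum(W[i][j] for i, j in product(group_1, group_2))
    return total / (len(group_1) * len(group_2))


def inter_cluster_distances(
    labels: Mapping[str, Hashable], W: Matrix
) -> Dict[Tuple[Hashable, Hashable], float]:
    """K x K table, keyed by (category, category), of mean inter-group distances.

    Diagonal entries include self-pairs (i, i), whose dissimilarity is 0.
    """
    groups: Dict[Hashable, List[str]] = {}
    for cell, cat in labels.items():
        groups.setdefault(cat, []).append(cell)
    categories = list(groups)
    return {
        (a, b): mean_pairwise_distance(W, groups[a], groups[b])
        for a, b in product(categories, categories)
    }
```

For a balanced four-leaf tree ((c1, c2), (c3, c4)) with unit branch lengths, labeling {c1, c2} as A and {c3, c4} as B gives d(A, A) = 1.0 and d(A, B) = 4.0.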
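The null-distribution and Z-scoring steps can be sketched generically: shuffle which cell carries which category, recompute the per-pair statistic, and standardize each observed value against its shuffled values. `empirical_z_scores` and its `stat_fn` callback (any function mapping a cell-to-category dict to per-pair distances, such as an inter-cluster distance calculation) are hypothetical names for this illustration. The sketch uses the population standard deviation and returns NaN when the shuffled values have zero spread, because the Z-score is undefined there. How the library itself handles that case is not stated here.

```python
import math
import random
import statistics
from typing import Callable, Dict, Hashable, List, Mapping, Optional, Tuple

Pair = Tuple[Hashable, Hashable]
StatFn = Callable[[Mapping[str, Hashable]], Dict[Pair, float]]


def empirical_z_scores(
    labels: Mapping[str, Hashable],
    stat_fn: StatFn,
    number_of_shuffles: int = 500,
    rng: Optional[random.Random] = None,
) -> Dict[Pair, float]:
    """Z-score each observed per-pair statistic against a label-shuffling null."""
    rng = rng if rng is not None else random.Random()
    observed = stat_fn(labels)

    cells = list(labels)
    values = list(labels.values())
    # Shuffling keeps category sizes fixed, so every shuffle yields the same pairs.
    background: Dict[Pair, List[float]] = {pair: [] for pair in observed}
    for _ in range(number_of_shuffles):
        rng.shuffle(values)  # reassign categories to cells at random
        shuffled = dict(zip(cells, values))
        for pair, value in stat_fn(shuffled).items():
            background[pair].append(value)

    z_scores: Dict[Pair, float] = {}
    for pair, obs in observed.items():
        mean = statistics.fmean(background[pair])
        sd = statistics.pstdev(background[pair])
        # A zero-spread null leaves the Z-score undefined; report NaN.
        z_scores[pair] = (obs - mean) / sd if sd > 0 else math.nan
    return z_scores
```

Passing a seeded `random.Random` makes the null distribution reproducible, playing the role that `random_state` plays in the documented function.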
 
