utilities.py
Here’s a high-level description of the utilities.py file in the cassiopeia/data directory:
This file contains utility functions for working with Cassiopeia datasets, particularly for manipulating and analyzing character matrices, allele tables, and phylogenetic trees. It includes functions for bootstrapping, converting between different data formats, computing dissimilarity maps, and various other operations on tree structures and genetic data.
Code Structure
The file consists of standalone functions that can be grouped into several categories:
- Bootstrapping functions
- Data conversion functions
- Tree manipulation functions
- Dissimilarity and distance computation functions
- Utility functions for working with character states and indels
These functions are used throughout the Cassiopeia package to support various data processing and analysis tasks.
Symbols
Here are some of the key functions in this file:
sample_bootstrap_character_matrices
Description
Creates bootstrap samples of character matrices by sampling characters with replacement.
Inputs
- character_matrix: The original character matrix
- prior_probabilities: Optional prior probabilities for characters
- num_bootstraps: Number of bootstrap samples to create
- random_state: Optional random state for reproducibility
Outputs
A list of tuples, each containing a bootstrapped character matrix and corresponding priors.
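The idea of sampling characters with replacement can be illustrated with a minimal, standard-library sketch. It is not the Cassiopeia implementation: the function name bootstrap_character_matrices, the dict-of-lists matrix layout, and the seed parameter are assumptions made for illustration only.

```python
import random


def bootstrap_character_matrices(character_matrix, prior_probabilities=None,
                                 num_bootstraps=10, seed=None):
    """Hypothetical sketch: resample characters (columns) with replacement.

    character_matrix: dict mapping cell name -> list of character states.
    prior_probabilities: optional dict mapping column index -> state priors.
    Returns a list of (bootstrapped matrix, re-indexed priors) tuples.
    """
    rng = random.Random(seed)
    num_characters = len(next(iter(character_matrix.values())))
    samples = []
    for _ in range(num_bootstraps):
        # Draw as many column indices as there are characters, with replacement.
        cols = [rng.randrange(num_characters) for _ in range(num_characters)]
        boot = {cell: [states[c] for c in cols]
                for cell, states in character_matrix.items()}
        boot_priors = None
        if prior_probabilities is not None:
            # Priors follow their character to its new column position.
            boot_priors = {new: prior_probabilities[old]
                           for new, old in enumerate(cols)}
        samples.append((boot, boot_priors))
    return samples


# Example: two cells, three characters, five bootstrap replicates.
example = {"cell_a": [1, 0, 2], "cell_b": [0, 0, 1]}
replicates = bootstrap_character_matrices(example, num_bootstraps=5, seed=0)
```

Every replicate keeps the same cells and the same number of characters as the input; only the choice of columns varies between replicates.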
convert_alleletable_to_character_matrix
Description
Converts an allele table to a character matrix format.
Inputs
- alleletable: The input allele table
- Various optional parameters for filtering and processing
Outputs
A tuple containing the character matrix, prior probabilities, and a mapping of states to indels.
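A rough sketch of this kind of conversion is shown below. It is a simplified illustration and not the library's code. The (cell, character, indel) row layout, the use of None for an uncut site, the state 0 for uncut, and the state -1 for missing data are all assumptions. Prior probabilities are left out entirely.

```python
def alleles_to_character_matrix(rows, missing_state=-1):
    """Hypothetical sketch: encode observed indels as integer character states.

    rows: list of (cell, character, indel) tuples; indel is None when uncut.
    Returns (matrix, state_to_indel), where matrix[cell][character] is an int
    and state_to_indel[character][state] recovers the original indel string.
    """
    cells = sorted({cell for cell, _, _ in rows})
    characters = sorted({character for _, character, _ in rows})
    indel_to_state = {character: {} for character in characters}
    state_to_indel = {character: {} for character in characters}
    # Start with every entry marked missing, then fill in observations.
    matrix = {cell: {character: missing_state for character in characters}
              for cell in cells}
    for cell, character, indel in rows:
        if indel is None:
            matrix[cell][character] = 0  # assumed encoding for "uncut"
            continue
        if indel not in indel_to_state[character]:
            # Assign the next unused positive integer for this character.
            state = len(indel_to_state[character]) + 1
            indel_to_state[character][indel] = state
            state_to_indel[character][state] = indel
        matrix[cell][character] = indel_to_state[character][indel]
    return matrix, state_to_indel


# Example rows with made-up indel labels.
rows = [("cell1", "site1", "2:3D"), ("cell1", "site2", None),
        ("cell2", "site1", "5:1I")]
matrix, state_to_indel = alleles_to_character_matrix(rows)
```

States are numbered per character in order of first appearance, so the same indel string at two different sites may receive different integers.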
compute_phylogenetic_weight_matrix
Description
Computes a phylogenetic weight matrix based on the distances between leaves in a tree.
Inputs
- tree: A CassiopeiaTree object
- inverse: Whether to compute inverse weights
- inverse_fn: Function to use for inverse computation
Outputs
A pandas DataFrame representing the phylogenetic weight matrix.
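To make the idea concrete, here is a small sketch that computes pairwise leaf-to-leaf path lengths on a toy tree and optionally applies an inverse function. It uses a plain dict instead of a CassiopeiaTree and nested dicts instead of a DataFrame. The name leaf_distance_matrix, the tree encoding, and the choice to leave the diagonal at 0 are assumptions for illustration.

```python
def leaf_distance_matrix(parents, inverse=False, inverse_fn=lambda x: 1.0 / x):
    """Hypothetical sketch: pairwise leaf distances, optionally inverted.

    parents: dict mapping node -> (parent, branch_length); the root maps to
    (None, 0.0). Branch lengths are assumed to be non-negative.
    """
    internal = {parent for parent, _ in parents.values() if parent is not None}
    leaves = sorted(node for node in parents if node not in internal)

    def distances_to_ancestors(node):
        # Map each ancestor (including the node itself) to its path length.
        dist, out = 0.0, {}
        while node is not None:
            out[node] = dist
            parent, length = parents[node]
            dist += length
            node = parent
        return out

    paths = {leaf: distances_to_ancestors(leaf) for leaf in leaves}
    weights = {a: {b: 0.0 for b in leaves} for a in leaves}
    for a in leaves:
        for b in leaves:
            if a == b:
                continue  # diagonal stays 0 in this sketch
            # The shortest route through a shared ancestor passes through the
            # lowest common ancestor.
            d = min(paths[a][n] + paths[b][n] for n in paths[a] if n in paths[b])
            weights[a][b] = inverse_fn(d) if inverse else d
    return weights


# Toy tree: root -> A, root -> X, X -> B, X -> C.
toy = {"root": (None, 0.0), "A": ("root", 1.0), "X": ("root", 0.5),
       "B": ("X", 1.0), "C": ("X", 2.0)}
weights = leaf_distance_matrix(toy)
inverse_weights = leaf_distance_matrix(toy, inverse=True)
```

With inverse weights, closely related leaves receive larger values than distant ones, which is the usual reason for offering an inverse option in a weight matrix.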
net_relatedness_index
Description
Computes the net relatedness index between two groups of indices in a dissimilarity map.
Inputs
- dissimilarity_map: A numpy array of dissimilarities
- indices_1: First group of indices
- indices_2: Second group of indices
Outputs
The computed net relatedness index as a float.
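As a minimal sketch, the computation can be written as the mean of all pairwise dissimilarities between the two groups. That definition is an assumption made here for illustration, as is the function name. The sketch uses nested lists rather than a numpy array.

```python
def mean_between_group_dissimilarity(dissimilarity_map, indices_1, indices_2):
    """Hypothetical sketch: average dissimilarity across two index groups.

    dissimilarity_map: square nested list (or any object indexable as
    d[i][j]) of pairwise dissimilarities.
    """
    if not indices_1 or not indices_2:
        raise ValueError("Both index groups must be non-empty.")
    total = 0.0
    for i in indices_1:
        for j in indices_2:
            total += dissimilarity_map[i][j]
    # Normalize by the number of cross-group pairs.
    return total / (len(indices_1) * len(indices_2))


# Example: three samples, comparing sample 0 against samples 1 and 2.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
score = mean_between_group_dissimilarity(d, [0], [1, 2])
```

A lower value indicates that the two groups are, on average, more closely related under the chosen dissimilarity.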
Dependencies
The file relies on several external libraries, including:
- numpy
- pandas
- networkx
- scipy
- ete3
It also imports from other parts of the Cassiopeia package, particularly from the mixins and preprocess modules.
This file is central to many data processing tasks in Cassiopeia, providing essential utilities for working with genetic data and phylogenetic trees.