CassiopeiaTree.py
High-level description
The CassiopeiaTree
class is the fundamental data structure in Cassiopeia, representing a phylogenetic tree of a clonal population. It stores a tree topology, a character matrix with mutation states for each cell, and metadata associated with cells and characters. The class provides methods for manipulating the tree, reconstructing ancestral characters, computing dissimilarities, and accessing various tree properties.
Code Structure
The CassiopeiaTree
class is the main symbol in the code. It holds references to other data structures like character_matrix
, cell_meta
, character_meta
, priors
, and dissimilarity_map
. The class methods operate on these data structures and the underlying tree topology represented as a Networkx DiGraph.
References
The code references several utility functions from cassiopeia.data.utilities
for tasks like converting between tree formats, computing LCA characters, and calculating dissimilarity maps. It also uses the Layers
class from cassiopeia.data.Layers
to manage multiple versions of the character matrix.
Symbols
CassiopeiaTree
Description
This class represents a phylogenetic tree of a clonal population. It stores the tree topology, character matrix, metadata, and provides methods for manipulating the tree and accessing its properties.
Inputs
Name | Type | Description |
---|---|---|
character_matrix | Optional[pd.DataFrame] | Character matrix of mutation observations. |
missing_state_indicator | int | An indicator for missing states in the character matrix. Defaults to -1. |
cell_meta | Optional[pd.DataFrame] | Per-cell metadata. |
character_meta | Optional[pd.DataFrame] | Per-character metadata. |
priors | Optional[Dict[int, Dict[int, float]]] | A dictionary storing the probability of each character mutating to a particular state. |
tree | Optional[Union[str, ete3.Tree, nx.DiGraph]] | A tree for the lineage, specified as a Networkx DiGraph, a newick string, or an ete3 Tree. |
dissimilarity_map | Optional[pd.DataFrame] | An NxN dataframe storing the pairwise dissimilarities between samples. |
parameters | Optional[Dict[str, Any]] | A dictionary storing parameters related to the tree. |
root_sample_name | Optional[str] | The name of the sample to treat as the root. |
Outputs
The class itself is the output, representing a populated CassiopeiaTree object.
Internal Logic
The class initializes its attributes and optionally populates the tree if provided. It also sets up a cache for storing computed values. The class methods provide various functionalities for manipulating the tree, reconstructing ancestral characters, computing dissimilarities, and accessing tree properties.
Side Effects
The class methods can modify the internal state of the CassiopeiaTree
object, including the tree topology, character matrix, metadata, and dissimilarity map.
Dependencies
Dependency | Purpose |
---|---|
pandas | Data manipulation and storage |
numpy | Numerical operations |
networkx | Tree representation and manipulation |
ete3 | Tree parsing and manipulation |
scipy | Scientific computing |
collections | Data structures |
copy | Deep copying |
warnings | Warning handling |
typing | Type hinting |
cassiopeia.data.utilities | Utility functions for data manipulation |
cassiopeia.data.Layers | Layers class for managing character matrix versions |
cassiopeia.mixins | Mixins for error and warning handling |
cassiopeia.solver.solver_utilities | Utility functions for solvers |
Error Handling
The code raises CassiopeiaTreeError
for various invalid inputs or operations, such as an uninitialized tree, mismatched character matrix and leaves, or negative branch lengths.
TODOs
- Add check upon initialization that input tree is valid tree.
- Add experimental meta data as arguments.
- Add utility methods to compute the colless index and the cophenetic correlation wrt to some cell meta item
- Add bulk set_states method.
- Add boolean to
get_tree_topology
which will include all attributes (e.g., node times)