High-level description

The CassiopeiaTree class is the fundamental data structure in Cassiopeia, representing a phylogenetic tree of a clonal population. It stores a tree topology, a character matrix with mutation states for each cell, and metadata associated with cells and characters. The class provides methods for manipulating the tree, reconstructing ancestral characters, computing dissimilarities, and accessing various tree properties.

Code Structure

The CassiopeiaTree class is the main symbol in the code. It holds references to other data structures: character_matrix, cell_meta, character_meta, priors, and dissimilarity_map. The class methods operate on these data structures and on the underlying tree topology, which is represented as a NetworkX DiGraph.

References

The code references several utility functions from cassiopeia.data.utilities for tasks like converting between tree formats, computing LCA characters, and calculating dissimilarity maps. It also uses the Layers class from cassiopeia.data.Layers to manage multiple versions of the character matrix.
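To make the two utility tasks named above more concrete, here is a minimal pure-Python sketch of computing LCA characters and a pairwise dissimilarity map. It is not the code in cassiopeia.data.utilities: the function names, the rule that disagreeing children yield state 0 (assumed here to mean "unmutated"), and the normalized Hamming-style dissimilarity are all assumptions chosen for illustration.

```python
from itertools import combinations

MISSING = -1  # matches the documented default for missing_state_indicator


def lca_characters(vectors, missing=MISSING):
    """Infer one ancestral state vector from the children's state vectors."""
    if not vectors:
        raise ValueError("need at least one state vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all state vectors must have the same length")
    ancestor = []
    for i in range(length):
        observed = {v[i] for v in vectors if v[i] != missing}
        if not observed:
            ancestor.append(missing)  # no child has data for this character
        elif len(observed) == 1:
            ancestor.append(observed.pop())  # all observed children agree
        else:
            ancestor.append(0)  # children disagree: assume unmutated (assumption)
    return ancestor


def hamming_dissimilarity(a, b, missing=MISSING):
    """Fraction of jointly observed characters at which a and b differ."""
    compared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def dissimilarity_map(states, dissimilarity=hamming_dissimilarity):
    """Build a symmetric nested-dict map over every pair of samples."""
    names = list(states)
    result = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = dissimilarity(states[a], states[b])
        result[a][b] = d
        result[b][a] = d
    return result


states = {"c1": [1, 0, 2], "c2": [1, 3, MISSING], "c3": [1, 3, 2]}
ancestor_of_c1_c2 = lca_characters([states["c1"], states["c2"]])
pairwise = dissimilarity_map(states)
```

The nested dict stands in for the NxN dataframe the class stores; it is symmetric with zeros on the diagonal.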

Symbols

CassiopeiaTree

Description

This class represents a phylogenetic tree of a clonal population. It stores the tree topology, character matrix, metadata, and provides methods for manipulating the tree and accessing its properties.

Inputs

Name | Type | Description
character_matrix | Optional[pd.DataFrame] | Character matrix of mutation observations.
missing_state_indicator | int | An indicator for missing states in the character matrix. Defaults to -1.
cell_meta | Optional[pd.DataFrame] | Per-cell metadata.
character_meta | Optional[pd.DataFrame] | Per-character metadata.
priors | Optional[Dict[int, Dict[int, float]]] | A dictionary storing the probability of each character mutating to a particular state.
tree | Optional[Union[str, ete3.Tree, nx.DiGraph]] | A tree for the lineage, specified as a NetworkX DiGraph, a Newick string, or an ete3 Tree.
dissimilarity_map | Optional[pd.DataFrame] | An NxN dataframe storing the pairwise dissimilarities between samples.
parameters | Optional[Dict[str, Any]] | A dictionary storing parameters related to the tree.
root_sample_name | Optional[str] | The name of the sample to treat as the root.
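As a rough illustration of how these inputs fit together, the sketch below prepares toy values in plain Python and passes them to the constructor using the keyword names listed above. The cell names, states, and probabilities are made up; the import path cassiopeia.data.CassiopeiaTree, the use of 0 as the unmutated state, and the helper observed_states are assumptions rather than details confirmed by this document. build_tree requires pandas and cassiopeia to be installed.

```python
MISSING = -1  # documented default for missing_state_indicator
NEWICK = "((cell_a,cell_b),cell_c);"  # toy topology as a Newick string

# One state vector per cell; column i holds the state of character i.
STATES = {
    "cell_a": [1, 0, 2],
    "cell_b": [1, 3, MISSING],
    "cell_c": [0, 3, 2],
}

# priors[character][state] = probability of that character mutating to state.
PRIORS = {
    0: {1: 0.5, 2: 0.5},
    1: {3: 1.0},
    2: {2: 0.25, 4: 0.75},
}


def observed_states(states, missing=MISSING):
    """Map each character index to the mutated states seen in any cell.

    Hypothetical helper; treats 0 as "unmutated", which is an assumption.
    """
    seen = {}
    for vector in states.values():
        for index, state in enumerate(vector):
            seen.setdefault(index, set())
            if state != missing and state != 0:
                seen[index].add(state)
    return seen


def build_tree():
    """Construct the tree object from the toy inputs defined above."""
    import pandas as pd
    from cassiopeia.data import CassiopeiaTree  # assumed import path

    # Rows are cells (matching the leaf names in NEWICK), columns are characters.
    character_matrix = pd.DataFrame.from_dict(STATES, orient="index")
    return CassiopeiaTree(
        character_matrix=character_matrix,
        missing_state_indicator=MISSING,
        priors=PRIORS,
        tree=NEWICK,
    )
```

Checking that every state returned by observed_states has an entry in PRIORS before calling build_tree is one way to catch a priors dictionary that does not cover the data.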

Outputs

Instantiating the class yields a CassiopeiaTree object holding the supplied inputs; there is no separate return value.

Internal Logic

On initialization, the class sets its attributes and, if a tree is provided, populates the tree topology from it. It also sets up a cache for storing computed values. The class methods then manipulate the tree, reconstruct ancestral characters, compute dissimilarities, and expose tree properties.
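The caching idea can be sketched with a toy class. This is not the CassiopeiaTree implementation: the class name CachedTree, the private _cache dict, and the policy of clearing the cache whenever the topology changes are assumptions used to show one common way such a cache behaves.

```python
class CachedTree:
    """Toy tree that memoizes derived values in a private dict (hypothetical)."""

    def __init__(self, edges=None):
        self._children = {}  # node -> list of child nodes
        self._cache = {}     # name of derived value -> computed result
        for parent, child in edges or []:
            self.add_edge(parent, child)

    def add_edge(self, parent, child):
        """Add a directed edge and drop every cached value."""
        self._children.setdefault(parent, []).append(child)
        self._children.setdefault(child, [])
        self._cache.clear()  # topology changed, so cached values are stale

    @property
    def leaves(self):
        """Nodes without children, computed once per topology."""
        if "leaves" not in self._cache:
            self._cache["leaves"] = [
                node for node, kids in self._children.items() if not kids
            ]
        return self._cache["leaves"]


tree = CachedTree([("root", "a"), ("root", "b")])
first_lookup = tree.leaves
second_lookup = tree.leaves  # served from the cache, same list object
tree.add_edge("a", "c")      # invalidates the cache
leaves_after_edit = tree.leaves
```

Clearing the whole cache on every mutation is the simplest policy that stays correct; finer-grained invalidation trades that simplicity for fewer recomputations.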

Side Effects

The class methods can modify the internal state of the CassiopeiaTree object, including the tree topology, character matrix, metadata, and dissimilarity map.

Dependencies

Dependency | Purpose
pandas | Data manipulation and storage
numpy | Numerical operations
networkx | Tree representation and manipulation
ete3 | Tree parsing and manipulation
scipy | Scientific computing
collections | Data structures
copy | Deep copying
warnings | Warning handling
typing | Type hinting
cassiopeia.data.utilities | Utility functions for data manipulation
cassiopeia.data.Layers | Layers class for managing character matrix versions
cassiopeia.mixins | Mixins for error and warning handling
cassiopeia.solver.solver_utilities | Utility functions for solvers

Error Handling

The code raises CassiopeiaTreeError for various invalid inputs or operations, such as an uninitialized tree, mismatched character matrix and leaves, or negative branch lengths.
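The following sketch shows what two of these checks might look like. The exception defined here is a local stand-in that only shares its name with the library's CassiopeiaTreeError; the helper names and the exact conditions (for example, treating a leaf without a character-matrix row as a mismatch) are assumptions.

```python
class CassiopeiaTreeError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_branch_length(parent, child, length):
    """Return the length unchanged, rejecting negative values."""
    if length < 0:
        raise CassiopeiaTreeError(
            f"Edge ({parent}, {child}) has negative length {length}."
        )
    return length


def check_matrix_matches_leaves(matrix_rows, leaves):
    """Require every leaf to have a row in the character matrix (assumed rule)."""
    unmatched = sorted(set(leaves) - set(matrix_rows))
    if unmatched:
        raise CassiopeiaTreeError(
            f"Leaves missing from the character matrix: {unmatched}"
        )


def describe_failure(check, *args):
    """Run a check and return its error message, or None if it passes."""
    try:
        check(*args)
    except CassiopeiaTreeError as error:
        return str(error)
    return None


negative_message = describe_failure(check_branch_length, "root", "a", -0.5)
mismatch_message = describe_failure(
    check_matrix_matches_leaves, ["cell_a"], ["cell_a", "cell_b"]
)
```

Raising one dedicated exception type lets callers catch tree-specific problems without also swallowing unrelated errors.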

TODOs

  • Add a check upon initialization that the input tree is a valid tree.
  • Add experimental metadata as arguments.
  • Add utility methods to compute the Colless index and the cophenetic correlation with respect to some cell meta item.
  • Add a bulk set_states method.
  • Add a boolean to get_tree_topology that includes all attributes (e.g., node times).