High-level description

The cassiopeia directory contains the Cassiopeia Python package, a set of modules for phylogenetic analysis of single-cell lineage tracing experiments. It covers data preprocessing, tree reconstruction, simulation, visualization, and analysis utilities for working with phylogenetic trees.

What does it do?

The Cassiopeia package provides the following functionality:

  1. Data Preprocessing: Converts raw sequencing data to formats suitable for phylogenetic analysis, including filtering, error correction, and allele calling.

  2. Tree Reconstruction: Implements various algorithms for reconstructing phylogenetic trees from mutation data, including greedy, distance-based, spectral, and integer linear programming methods.

  3. Simulation: Provides tools to generate synthetic phylogenetic trees and associated data, allowing researchers to test and validate analysis methods.

  4. Visualization: Renders phylogenetic trees and associated data either locally or through the cloud-based iTOL service.

  5. Analysis Tools: Includes utilities for parameter estimation, fitness calculation, evolutionary coupling analysis, and various tree metrics.

  6. Tree Comparison: Implements methods for comparing phylogenetic trees, such as the Robinson-Foulds distance and the triplets-correct accuracy measure.

Together, these components support end-to-end analysis of single-cell lineage tracing data, from raw sequencing results to reconstructed, compared, and visualized trees.
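
As a rough illustration of the greedy approach listed under Tree Reconstruction, the sketch below recursively splits cells on the most frequent mutation. It is a simplified stand-in written for this overview, not Cassiopeia's solver code. The character-matrix layout (a dict mapping each cell name to a list of states, with 0 for unmutated and -1 for missing) and the function name greedy_split are assumptions.

```python
from collections import Counter

UNMUTATED = 0   # assumed encoding for an unedited site
MISSING = -1    # assumed encoding for missing data


def greedy_split(cells, matrix, used=frozenset()):
    """Return a nested-tuple tree built by splitting on the most common mutation.

    cells  -- list of cell names to place in the tree
    matrix -- dict: cell name -> list of character states
    used   -- (site, state) pairs already used higher up in the recursion
    """
    if len(cells) == 1:
        return cells[0]

    # Count how many cells carry each (site, state) mutation not yet used.
    counts = Counter()
    for cell in cells:
        for site, state in enumerate(matrix[cell]):
            if state not in (UNMUTATED, MISSING) and (site, state) not in used:
                counts[(site, state)] += 1

    # Keep only mutations that actually separate the cells.
    informative = {m: c for m, c in counts.items() if c < len(cells)}
    if not informative:
        # Nothing left to split on: return an unresolved polytomy.
        return tuple(sorted(cells))

    # Most frequent mutation wins; ties are broken by (site, state).
    site, state = min(informative, key=lambda m: (-informative[m], m))
    # Simplification: cells with missing data at this site fall into `without`.
    with_mut = [c for c in cells if matrix[c][site] == state]
    without = [c for c in cells if matrix[c][site] != state]

    next_used = used | {(site, state)}
    return (
        greedy_split(with_mut, matrix, next_used),
        greedy_split(without, matrix, next_used),
    )


example = {
    "a": [1, 2, 0],
    "b": [1, 2, 3],
    "c": [1, 0, 0],
    "d": [0, 0, 0],
}
print(greedy_split(["a", "b", "c", "d"], example))
# prints ((('b', 'a'), 'c'), 'd')
```

Each recursive call removes at least one cell from each side of the split, so the recursion always terminates. Cells that cannot be separated by any remaining mutation are returned together as a single unresolved group.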

Entry points

The main entry points for the Cassiopeia package are:

  1. cassiopeia.preprocess: For preprocessing raw sequencing data.
  2. cassiopeia.solver: For reconstructing phylogenetic trees from processed data.
  3. cassiopeia.simulator: For generating synthetic phylogenetic data.
  4. cassiopeia.plotting: For visualizing phylogenetic trees and associated data.
  5. cassiopeia.tools: For various analytical tools and utilities.
  6. cassiopeia.critique: For comparing and analyzing phylogenetic trees.

The package's top-level __init__.py file imports these submodules and exposes their key functionality as the main interface.
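
Because some submodules depend on optional libraries, it can help to check which entry points import cleanly in a given environment. The following is a minimal sketch using only the standard library. The helper name available_submodules is hypothetical, and the submodule names are simply those listed above.

```python
import importlib

# Submodule names taken from the entry-point list above.
SUBMODULES = ["preprocess", "solver", "simulator", "plotting", "tools", "critique"]


def available_submodules(package, names):
    """Map each submodule name to True if `package.name` imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
            status[name] = True
        except ImportError:
            # Raised when the package, the submodule, or one of its
            # dependencies is not installed.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in available_submodules("cassiopeia", SUBMODULES).items():
        print(f"cassiopeia.{name}: {'ok' if ok else 'not importable'}")
```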

Key Files

  1. data/CassiopeiaTree.py: Defines the core data structure for representing phylogenetic trees with associated mutation data.
  2. solver/CassiopeiaSolver.py: Provides the base class for tree reconstruction algorithms.
  3. simulator/TreeSimulator.py: Offers the base class for tree simulation models.
  4. plotting/local.py and plotting/itol_utilities.py: Implement local and cloud-based (iTOL) tree visualization, respectively.
  5. tools/parameter_estimators.py: Contains functions for estimating key parameters from tree data.
  6. critique/compare.py: Implements tree comparison algorithms.
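
To make the two comparison measures concrete, the sketch below computes a rooted Robinson-Foulds distance and an exhaustive triplets-correct score on trees written as nested tuples. It is an independent illustration rather than the code in critique/compare.py: the tuple representation, the function names, and the choice to enumerate every triplet instead of sampling are all assumptions, and both trees are assumed to share the same leaf set of at least three leaves.

```python
from itertools import combinations


def leaves(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return leaf sets of all internal nodes below the root."""
    found = set()

    def visit(node, is_root):
        if not isinstance(node, tuple):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def robinson_foulds(tree1, tree2):
    """Rooted Robinson-Foulds distance: clades present in exactly one tree."""
    return len(clades(tree1) ^ clades(tree2))


def triplet_topology(tree, triplet):
    """Return the pair of leaves that group together, or None if unresolved."""
    node = tree
    wanted = set(triplet)
    while isinstance(node, tuple):
        for child in node:
            inside = wanted & leaves(child)
            if len(inside) == 3:
                node = child          # all three are deeper; keep descending
                break
            if len(inside) == 2:
                return frozenset(inside)
        else:
            return None               # the three leaves separate at this node
    return None


def triplets_correct(tree1, tree2):
    """Fraction of leaf triplets resolved identically in both trees."""
    triplets = list(combinations(sorted(leaves(tree1)), 3))
    same = sum(
        triplet_topology(tree1, t) == triplet_topology(tree2, t) for t in triplets
    )
    return same / len(triplets)


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(robinson_foulds(t1, t2))   # prints 2
print(triplets_correct(t1, t2))  # prints 0.75
```

In this example the trees disagree only on whether a groups with b or with c, so one of the four triplets differs and two clades are unshared.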

Dependencies

Cassiopeia relies on several external libraries:

  1. numpy and pandas: For data manipulation and numerical computations.
  2. networkx: For graph operations and tree manipulations.
  3. scipy: For various scientific computing tasks.
  4. ete3: For phylogenetic tree manipulation and visualization.
  5. matplotlib and plotly: For local plotting.
  6. pysam: For handling sequencing data formats.
  7. cvxpy and gurobipy: For optimization problems in some solvers.

Configuration

Cassiopeia uses configuration files and function parameters for customization:

  1. Preprocessing pipeline: Configured using an INI-format file specifying parameters for each preprocessing step.
  2. Solvers: Customizable through parameters passed during initialization.
  3. Simulators: Configurable via parameters set during object creation.
  4. Visualization: Customizable through function parameters for both local and cloud-based plotting.

Users can adjust these configurations to adapt Cassiopeia to their specific experimental setups and analysis requirements.
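
As an illustration of how an INI-format pipeline configuration can be read, the sketch below parses a small example with Python's standard configparser module. The section and option names are hypothetical placeholders chosen for this example, not Cassiopeia's actual configuration keys, and load_pipeline_config is likewise an assumed helper name.

```python
import configparser

# Hypothetical pipeline configuration; section and option names are
# illustrative only and are not Cassiopeia's actual keys.
EXAMPLE_CONFIG = """
[general]
output_directory = ./results
n_threads = 4

[filter]
min_umi_per_cell = 10

[call_alleles]
barcode_length = 14
"""


def load_pipeline_config(text):
    """Parse INI text into {section: {option: value}}, coercing integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    config = {}
    for section in parser.sections():
        config[section] = {}
        for key, raw in parser[section].items():
            try:
                config[section][key] = int(raw)
            except ValueError:
                # Not an integer: keep the raw string value.
                config[section][key] = raw
    return config


config = load_pipeline_config(EXAMPLE_CONFIG)
print(config["general"]["n_threads"])  # prints 4
```

Grouping options by section keeps the parameters for each preprocessing step separate, so one step can be adjusted without touching the others.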

In summary, Cassiopeia covers each step of single-cell lineage tracing analysis, from raw data processing through tree reconstruction to tree comparison and visualization.