High-level description
This directory contains the core components of the Cassiopeia project, a comprehensive framework for single-cell lineage tracing and phylogenetic analysis. The project includes tools for data preprocessing, tree reconstruction, simulation, visualization, and various analytical utilities.
What does it do?
Cassiopeia provides an end-to-end pipeline for analyzing single-cell lineage tracing experiments:
- Preprocessing: Converts raw sequencing data into formats suitable for phylogenetic analysis, including filtering, error correction, and allele calling.
- Tree Reconstruction: Implements various algorithms (greedy, distance-based, spectral, and integer linear programming) to reconstruct phylogenetic trees from mutation data.
- Simulation: Generates synthetic phylogenetic trees and associated data for testing and validating analysis methods.
- Visualization: Offers both local and cloud-based options for visualizing phylogenetic trees and associated data.
- Analysis Tools: Provides utilities for parameter estimation, fitness calculation, evolutionary coupling analysis, and various tree metrics.
- Tree Comparison: Implements methods for comparing phylogenetic trees, such as Robinson-Foulds distance and triplets correct accuracy.
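To make the tree comparison item concrete, here is a minimal sketch of a Robinson-Foulds distance for rooted trees. It is not Cassiopeia's implementation: the nested-tuple tree encoding and the names `clades` and `robinson_foulds` are assumptions made for this example, and the rooted, clade-based variant is used because it is the shortest to write down.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed encoding for this sketch: a leaf is a string, an internal node is a
# tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of leaf names found under every internal node."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def robinson_foulds(a: Tree, b: Tree) -> int:
    """Count the clades that appear in exactly one of the two rooted trees."""
    return len(clades(a) ^ clades(b))


tree_a = (("A", "B"), ("C", "D"))
tree_b = (("A", "C"), ("B", "D"))
print(robinson_foulds(tree_a, tree_a))  # identical trees share every clade
print(robinson_foulds(tree_a, tree_b))  # the two trees disagree on both cherries
```

The root clade (all leaves) is shared by any two trees on the same leaf set, so it never contributes to the distance.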
Entry points
The main entry points for the Cassiopeia package are:
- cassiopeia/preprocess: For preprocessing raw sequencing data.
- cassiopeia/solver: For reconstructing phylogenetic trees from processed data.
- cassiopeia/simulator: For generating synthetic phylogenetic data.
- cassiopeia/plotting: For visualizing phylogenetic trees and associated data.
- cassiopeia/tools: For various analytical tools and utilities.
- cassiopeia/critique: For comparing and analyzing phylogenetic trees.
The cassiopeia/__init__.py file serves as the main interface, importing and exposing key functionalities from these submodules.
Key Files
- cassiopeia/data/CassiopeiaTree.py: Defines the core data structure for representing phylogenetic trees with associated mutation data.
- cassiopeia/solver/CassiopeiaSolver.py: Provides the base class for tree reconstruction algorithms.
- cassiopeia/simulator/TreeSimulator.py: Offers the base class for tree simulation models.
- cassiopeia/plotting/local.py and cassiopeia/plotting/itol_utilities.py: Implement local and cloud-based tree visualization, respectively.
- cassiopeia/tools/parameter_estimators.py: Contains functions for estimating key parameters from tree data.
- cassiopeia/critique/compare.py: Implements tree comparison algorithms.
- build.py: Handles the building and compilation of Cython extensions for performance-critical components.
- README.md: Provides an overview of the project, installation instructions, and links to documentation and tutorials.
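The base-class pattern behind the solver file listed above can be illustrated with a small sketch. Everything here is hypothetical: `SketchSolver`, `MostFrequentMutationSolver`, the `split` and `solve` methods, the dictionary-based character matrix, and the splitting rule (divide cells on the most frequent mutation not shared by all of them) are assumptions for illustration and are not taken from Cassiopeia's API.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Tuple, Union

# Assumed types for this sketch: each cell maps to its list of character
# states, where 0 means "unmutated". Trees are nested tuples of cell names.
CharacterMatrix = Dict[str, List[int]]
Tree = Union[str, Tuple["Tree", ...]]


class SketchSolver(ABC):
    """Hypothetical base class: subclasses only decide how to split cells."""

    @abstractmethod
    def split(self, matrix: CharacterMatrix) -> Tuple[List[str], List[str]]:
        """Partition the cells into two groups, or return empty groups."""

    def solve(self, matrix: CharacterMatrix) -> Tree:
        """Build a tree top-down by splitting the cells recursively."""
        cells = sorted(matrix)
        if len(cells) == 1:
            return cells[0]
        left, right = self.split(matrix)
        if not left or not right:
            # No informative split: leave the cells as an unresolved node.
            return tuple(cells)
        return (
            self.solve({c: matrix[c] for c in left}),
            self.solve({c: matrix[c] for c in right}),
        )


class MostFrequentMutationSolver(SketchSolver):
    """Greedy rule assumed here: split on the most common unshared mutation."""

    def split(self, matrix: CharacterMatrix) -> Tuple[List[str], List[str]]:
        counts = Counter(
            (index, state)
            for states in matrix.values()
            for index, state in enumerate(states)
            if state != 0
        )
        # A mutation carried by every cell cannot separate them.
        informative = {m: n for m, n in counts.items() if n < len(matrix)}
        if not informative:
            return [], []
        # Highest count wins; ties go to the lowest character index and state.
        index, state = max(
            informative, key=lambda m: (informative[m], -m[0], -m[1])
        )
        left = sorted(c for c, s in matrix.items() if s[index] == state)
        right = sorted(c for c in matrix if c not in left)
        return left, right


matrix = {
    "c1": [1, 2, 0],
    "c2": [1, 2, 0],
    "c3": [1, 0, 3],
    "c4": [0, 0, 0],
}
tree = MostFrequentMutationSolver().solve(matrix)
print(tree)
```

Keeping the recursion in the base class means a different algorithm only has to supply its own `split` rule, which is one plausible reason to organize solvers around a shared parent class.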
Dependencies
Cassiopeia relies on several external libraries:
- numpy and pandas: For data manipulation and numerical computations.
- networkx: For graph operations and tree manipulations.
- scipy: For various scientific computing tasks.
- ete3: For phylogenetic tree manipulation and visualization.
- matplotlib and plotly: For local plotting.
- pysam: For handling sequencing data formats.
- cvxpy and gurobipy: For optimization problems in some solvers.
- Cython: For compiling performance-critical components.
Configuration
How Cassiopeia is configured depends on the component:
- Preprocessing pipeline: Configured using an INI-format file specifying parameters for each preprocessing step.
- Solvers: Customizable through parameters passed during initialization.
- Simulators: Configurable via parameters set during object creation.
- Visualization: Customizable through function parameters for both local and cloud-based plotting.
- pyproject.toml: Defines project metadata, dependencies, and build settings.
- .readthedocs.yml: Configures the documentation build process on Read the Docs.
- codecov.yml: Sets coverage requirements for the project.
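As an illustration of the INI-format preprocessing configuration mentioned above, the sketch below reads such a file with Python's standard `configparser`. The section names, option names, and values are hypothetical and chosen only to mirror the preprocessing steps named in this overview; the real pipeline's keys may differ.

```python
import configparser

# Hypothetical configuration: section and option names are illustrative only.
EXAMPLE_CONFIG = """
[general]
output_directory = ./results
n_threads = 4

[filter]
min_umi_count = 10

[error_correct]
max_hamming_distance = 2
"""


def load_pipeline_config(text: str) -> dict:
    """Parse INI text into a {section: {option: value}} dictionary of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_pipeline_config(EXAMPLE_CONFIG)

# configparser returns every value as a string, so convert explicitly.
n_threads = int(config["general"]["n_threads"])
print(list(config), n_threads)
```

Grouping options under one section per step keeps each step's parameters separate, which fits a pipeline whose steps are configured independently.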
The project also includes a comprehensive test suite in the test directory, ensuring the reliability and accuracy of the framework across its various modules and functionalities.
In summary, Cassiopeia provides a robust and flexible framework for analyzing single-cell lineage tracing data, offering tools for every step from raw data processing to advanced phylogenetic analysis and visualization.