Overview
High-level description
This directory contains the core components of the Cassiopeia project, a comprehensive framework for single-cell lineage tracing and phylogenetic analysis. The project includes tools for data preprocessing, tree reconstruction, simulation, visualization, and various analytical utilities.
What does it do?
Cassiopeia provides an end-to-end pipeline for analyzing single-cell lineage tracing experiments:
- Preprocessing: Converts raw sequencing data into formats suitable for phylogenetic analysis, including filtering, error correction, and allele calling.
- Tree Reconstruction: Implements various algorithms (greedy, distance-based, spectral, and integer linear programming) to reconstruct phylogenetic trees from mutation data.
- Simulation: Generates synthetic phylogenetic trees and associated data for testing and validating analysis methods.
- Visualization: Offers both local and cloud-based options for visualizing phylogenetic trees and associated data.
- Analysis Tools: Provides utilities for parameter estimation, fitness calculation, evolutionary coupling analysis, and various tree metrics.
- Tree Comparison: Implements methods for comparing phylogenetic trees, such as Robinson-Foulds distance and triplets correct accuracy.
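To make the tree comparison step concrete, the following is a minimal sketch of a rooted, clade-based Robinson-Foulds distance. It is an illustration only and not necessarily the variant implemented in cassiopeia/critique. The nested-tuple tree encoding and the function names `clades` and `robinson_foulds` are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed encoding for this sketch: a leaf is a string, an internal node is a
# tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of leaves found under every internal node."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def robinson_foulds(a: Tree, b: Tree) -> int:
    """Count the clades that appear in exactly one of the two rooted trees."""
    return len(clades(a) ^ clades(b))


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(robinson_foulds(t1, t2))  # 2: the clades {A, B} and {A, C} are unshared
```

A distance of 0 means the two trees contain the same clades. Larger values indicate more topological disagreement.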
Entry points
The main entry points for the Cassiopeia package are:
- cassiopeia/preprocess: For preprocessing raw sequencing data.
- cassiopeia/solver: For reconstructing phylogenetic trees from processed data.
- cassiopeia/simulator: For generating synthetic phylogenetic data.
- cassiopeia/plotting: For visualizing phylogenetic trees and associated data.
- cassiopeia/tools: For various analytical tools and utilities.
- cassiopeia/critique: For comparing and analyzing phylogenetic trees.
The cassiopeia/__init__.py file serves as the main interface, importing and exposing key functionalities from these submodules.
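As a rough illustration of the kind of synthetic data the simulator entry point is meant to produce, the sketch below grows a complete binary division tree and applies irreversible edits at each division. It does not use Cassiopeia's simulator API. The function `simulate_lineage`, its parameters, and the cell-naming scheme are hypothetical.

```python
import random
from typing import Dict, List


def simulate_lineage(
    generations: int,
    n_sites: int,
    mutation_rate: float,
    n_states: int,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Return a character matrix {leaf name: state per site}; 0 means unedited."""
    rng = random.Random(seed)
    cells = {"c": [0] * n_sites}  # single unedited founder cell
    for _ in range(generations):
        next_cells = {}
        for name, states in cells.items():
            for suffix in ("0", "1"):  # every cell divides into two daughters
                child = list(states)
                for site in range(n_sites):
                    # Edits are irreversible: only unedited sites can mutate.
                    if child[site] == 0 and rng.random() < mutation_rate:
                        child[site] = rng.randint(1, n_states)
                next_cells[name + suffix] = child
        cells = next_cells
    return cells


matrix = simulate_lineage(generations=3, n_sites=5, mutation_rate=0.3, n_states=4, seed=1)
for leaf, states in sorted(matrix.items()):
    print(leaf, states)
```

Because the leaf names record the division history, the true tree is known, which is what makes simulated data useful for validating reconstruction methods.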
Key Files
- cassiopeia/data/CassiopeiaTree.py: Defines the core data structure for representing phylogenetic trees with associated mutation data.
- cassiopeia/solver/CassiopeiaSolver.py: Provides the base class for tree reconstruction algorithms.
- cassiopeia/simulator/TreeSimulator.py: Offers the base class for tree simulation models.
- cassiopeia/plotting/local.py and cassiopeia/plotting/itol_utilities.py: Implement local and cloud-based tree visualization.
- cassiopeia/tools/parameter_estimators.py: Contains functions for estimating key parameters from tree data.
- cassiopeia/critique/compare.py: Implements tree comparison algorithms.
- build.py: Handles the building and compilation of Cython extensions for performance-critical components.
- README.md: Provides an overview of the project, installation instructions, and links to documentation and tutorials.
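The base-class pattern described for the solvers can be sketched as an abstract class with a single `solve` method, plus one concrete greedy subclass. This is a simplified illustration, not the code in CassiopeiaSolver.py. The names `Solver` and `GreedySplitSolver` are hypothetical, and the sketch ignores missing data and mutation priors.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Tuple, Union

CharacterMatrix = Dict[str, List[int]]  # cell name -> state per site (0 = unedited)
Tree = Union[str, Tuple["Tree", ...]]   # leaf name or tuple of subtrees


class Solver(ABC):
    """Hypothetical base class: subclasses turn a character matrix into a tree."""

    @abstractmethod
    def solve(self, matrix: CharacterMatrix) -> Tree:
        ...


class GreedySplitSolver(Solver):
    """Recursively split cells on the most frequent (site, state) mutation."""

    def solve(self, matrix: CharacterMatrix) -> Tree:
        return self._build(sorted(matrix), matrix)

    def _build(self, cells: List[str], matrix: CharacterMatrix) -> Tree:
        if len(cells) == 1:
            return cells[0]
        counts = Counter(
            (site, state)
            for cell in cells
            for site, state in enumerate(matrix[cell])
            if state != 0
        )
        # A mutation shared by every cell in the group cannot separate them.
        candidates = {m: n for m, n in counts.items() if n < len(cells)}
        if not candidates:
            return tuple(cells)  # leave indistinguishable cells as one clade
        # Sorting first makes tie-breaking deterministic.
        site, state = max(sorted(candidates), key=candidates.get)
        with_mut = [c for c in cells if matrix[c][site] == state]
        without = [c for c in cells if matrix[c][site] != state]
        return (self._build(with_mut, matrix), self._build(without, matrix))


example = {"A": [1, 2, 0], "B": [1, 2, 0], "C": [1, 0, 0], "D": [0, 0, 3]}
print(GreedySplitSolver().solve(example))  # ((('A', 'B'), 'C'), 'D')
```

Keeping the interface to one abstract method means other strategies, such as distance-based or integer linear programming approaches, could be swapped in without changing calling code.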
Dependencies
Cassiopeia relies on several external libraries:
- numpy and pandas: For data manipulation and numerical computations.
- networkx: For graph operations and tree manipulations.
- scipy: For various scientific computing tasks.
- ete3: For phylogenetic tree manipulation and visualization.
- matplotlib and plotly: For local plotting.
- pysam: For handling sequencing data formats.
- cvxpy and gurobipy: For optimization problems in some solvers.
- Cython: For compiling performance-critical components.
Configuration
Cassiopeia uses various configuration methods:
- Preprocessing pipeline: Configured using an INI-format file specifying parameters for each preprocessing step.
- Solvers: Customizable through parameters passed during initialization.
- Simulators: Configurable via parameters set during object creation.
- Visualization: Customizable through function parameters for both local and cloud-based plotting.
- pyproject.toml: Defines project metadata, dependencies, and build settings.
- .readthedocs.yml: Configures the documentation build process on Read the Docs.
- codecov.yml: Sets coverage requirements for the project.
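For the INI-format preprocessing configuration mentioned above, the sketch below shows how such a file could be read with Python's standard configparser module. The section and option names (`general`, `filter_step`, `min_reads`, and so on) are placeholders invented for this example and are not Cassiopeia's actual configuration keys.

```python
import configparser
from typing import Dict

# Hypothetical configuration text; real section and option names may differ.
EXAMPLE_CONFIG = """
[general]
name = example_run
output_directory = ./output

[filter_step]
min_reads = 10

[error_correct_step]
max_distance = 2
"""


def load_pipeline_config(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI text into {section: {option: raw string value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_pipeline_config(EXAMPLE_CONFIG)
for section, options in config.items():
    print(section, options)
```

configparser returns every value as a string, so numeric options need an explicit conversion such as `int(config["filter_step"]["min_reads"])`.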
The project also includes a comprehensive test suite in the test directory, covering the framework's various modules and functionalities.
In summary, Cassiopeia provides a robust and flexible framework for analyzing single-cell lineage tracing data, offering tools for every step from raw data processing to advanced phylogenetic analysis and visualization.