High-level description

The cassiopeia/simulator directory contains classes for simulating single-cell lineage tracing experiments. They generate synthetic phylogenetic trees, overlay lineage tracing and spatial data on those trees, and subsample leaves. The main components are:

  1. Tree simulators (e.g., BirthDeathFitnessSimulator, CompleteBinarySimulator)
  2. Data simulators (e.g., Cas9LineageTracingDataSimulator, BrownianSpatialDataSimulator)
  3. Leaf subsamplers (e.g., UniformLeafSubsampler, SpatialLeafSubsampler)
  4. Specialized simulators (e.g., ecDNABirthDeathSimulator)

These simulators provide a framework for generating synthetic data to test and validate lineage reconstruction algorithms and analysis methods.

What does it do?

The cassiopeia/simulator directory provides tools to:

  1. Generate synthetic phylogenetic trees with various growth models and fitness effects.
  2. Simulate lineage tracing data, including Cas9-based editing and sequential recording.
  3. Create spatial data for cells in a lineage tree.
  4. Subsample leaves from a tree to mimic experimental sampling or create supercellular states.
  5. Simulate specialized scenarios like extrachromosomal DNA (ecDNA) evolution.

Because each simulation records the ground-truth phylogeny alongside the generated data, researchers can measure how accurately a reconstruction algorithm recovers the true tree, test hypotheses about evolutionary processes, and compare experimental designs and parameter choices in silico.
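The idea of overlaying lineage tracing data on a known tree can be illustrated with a minimal, standard-library sketch. This is not Cassiopeia's implementation: the function name overlay_characters, its parameters, and the dict-based tree are all hypothetical, and the model is reduced to a single rule, namely that an unedited site (state 0) may acquire an edit on each edge and an edited site is never overwritten.

```python
import random


def overlay_characters(children, root, n_sites, mutation_prob, n_states, seed=None):
    """Assign a character vector to every node by walking down from the root.

    children: dict mapping node -> list of child nodes (hypothetical tree format).
    State 0 means "unedited"; states 1..n_states are edits and are irreversible.
    """
    rng = random.Random(seed)
    states = {root: [0] * n_sites}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            inherited = list(states[node])
            for site in range(n_sites):
                # Only unedited sites can mutate; existing edits are inherited as-is.
                if inherited[site] == 0 and rng.random() < mutation_prob:
                    inherited[site] = rng.randint(1, n_states)
            states[child] = inherited
            stack.append(child)
    return states


# Toy tree: root -> a, b; a -> a1, a2.
toy_tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
toy_states = overlay_characters(toy_tree, "root", n_sites=5, mutation_prob=0.3, n_states=4, seed=0)
```

Because edits are irreversible, every edit present in a parent also appears in all of its descendants, which is the property reconstruction algorithms exploit.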

Entry points

The main entry points for using the simulators are:

  1. TreeSimulator abstract base class: the base class for implementing new tree simulation models.
  2. DataSimulator abstract base class: the base class for implementing new data simulation models.
  3. LeafSubsampler abstract base class: the base class for implementing new leaf subsampling strategies.

Concrete implementations of these base classes, such as BirthDeathFitnessSimulator, Cas9LineageTracingDataSimulator, and UniformLeafSubsampler, provide specific simulation and subsampling functionalities.
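The base-class pattern described above can be sketched with Python's abc module. The classes below are hypothetical stand-ins, not the Cassiopeia classes: the names TreeSimulatorSketch and CompleteBinarySketch, the method name simulate_tree, and the dict-based tree are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class TreeSimulatorSketch(ABC):
    """Hypothetical stand-in for an abstract tree-simulator base class."""

    @abstractmethod
    def simulate_tree(self):
        """Return a tree as a dict mapping each node to its list of children."""


class CompleteBinarySketch(TreeSimulatorSketch):
    """Concrete subclass that builds a complete binary tree of a fixed depth."""

    def __init__(self, depth):
        if depth < 0:
            raise ValueError("depth must be non-negative")
        self.depth = depth

    def simulate_tree(self):
        # Nodes are numbered in heap order: the children of i are 2i+1 and 2i+2.
        num_internal = 2 ** self.depth - 1
        num_total = 2 ** (self.depth + 1) - 1
        tree = {}
        for node in range(num_total):
            if node < num_internal:
                tree[node] = [2 * node + 1, 2 * node + 2]
            else:
                tree[node] = []  # leaf
        return tree


def get_leaves(tree):
    """Return the nodes that have no children."""
    return [node for node, kids in tree.items() if not kids]
```

A new model only has to subclass the base class and implement the abstract method; code that consumes simulators can then treat every model uniformly. Instantiating the base class directly raises TypeError.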

The __init__.py file in this directory re-exports the simulator classes, so it is the main interface for importing them.

Key files

  1. BirthDeathFitnessSimulator.py: Implements a birth-death process with fitness variations for tree simulation.
  2. Cas9LineageTracingDataSimulator.py: Simulates Cas9-based lineage tracing data.
  3. BrownianSpatialDataSimulator.py: Generates spatial data for cells using a Brownian motion model.
  4. UniformLeafSubsampler.py: Implements uniform random subsampling of leaves from a tree.
  5. ecDNABirthDeathSimulator.py: Simulates the evolution of cell populations with extrachromosomal DNA.

These files contain the core implementations of various simulation and subsampling strategies, each addressing different aspects of single-cell lineage tracing experiments.
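As an illustration of what uniform leaf subsampling involves, the following standard-library sketch keeps a random fraction of leaves and prunes every lineage that no longer leads to a sampled leaf. It is not the code in UniformLeafSubsampler.py: the function name, the ratio parameter, and the dict-based tree (every node, including each leaf, must appear as a key) are assumptions. Unlike a full implementation might, it does not collapse the single-child nodes left behind by pruning.

```python
import random


def subsample_leaves(children, root, ratio, seed=None):
    """Keep a uniformly random fraction of leaves and prune everything else.

    children: dict mapping node -> list of child nodes; leaves map to [].
    Returns a new dict containing only nodes with a sampled leaf at or below them.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)
    leaves = sorted(node for node in children if not children[node])
    n_keep = max(1, int(len(leaves) * ratio))
    kept = set(rng.sample(leaves, n_keep))

    # Child -> parent map, used to walk from each kept leaf up to the root.
    parent = {c: p for p, kids in children.items() for c in kids}
    retained = set()
    for leaf in kept:
        node = leaf
        while node not in retained:
            retained.add(node)
            if node == root:
                break
            node = parent[node]
    return {
        node: [c for c in children[node] if c in retained]
        for node in children
        if node in retained
    }
```

Sampling without replacement gives every leaf the same chance of being kept, which mimics an experiment that captures only a fraction of the cells alive at the end.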

Dependencies

The simulator modules rely on several external libraries:

  1. networkx: Used for graph operations and tree manipulations.
  2. numpy: Used for numerical computations and random number generation.
  3. pandas: Used for handling data structures like character matrices.
  4. scipy: Used for various scientific computing tasks, including spatial algorithms.
  5. scikit-learn (imported as sklearn): Used for nearest neighbor searches in spatial simulations.

Additionally, the simulators depend on other parts of the Cassiopeia package, particularly the cassiopeia.data.CassiopeiaTree class for representing and manipulating phylogenetic trees.

Configuration

Most simulator classes accept various parameters during initialization to configure their behavior. Common configuration options include:

  1. Simulation duration or stopping conditions (e.g., experiment_time, num_extant)
  2. Mutation rates and distributions
  3. Fitness parameters
  4. Spatial simulation parameters (e.g., diffusion_coefficient)
  5. Subsampling ratios or target numbers of leaves

These parameters allow users to fine-tune the simulations to match specific experimental scenarios or test different hypotheses about evolutionary processes.
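To show how stopping conditions such as num_extant and experiment_time can shape a simulation, here is a minimal pure-birth sketch using only the standard library. It is an illustration under stated assumptions, not Cassiopeia's simulator: the BirthProcessConfig class, the birth_rate and random_seed fields, the exponential waiting times, and the rule that at least one stopping condition must be given are all choices made for this example.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class BirthProcessConfig:
    """Hypothetical configuration object; field names echo the options listed above."""

    birth_rate: float = 1.0
    num_extant: Optional[int] = None
    experiment_time: Optional[float] = None
    random_seed: Optional[int] = None

    def __post_init__(self):
        if self.num_extant is None and self.experiment_time is None:
            raise ValueError("specify num_extant, experiment_time, or both")
        if self.birth_rate <= 0:
            raise ValueError("birth_rate must be positive")


def simulate_division_times(config):
    """Return the division times of a pure-birth process, in increasing order."""
    rng = random.Random(config.random_seed)
    # Each heap entry is the time at which one living lineage will next divide.
    pending = [rng.expovariate(config.birth_rate)]
    division_times = []
    while True:
        # Stop once the number of living lineages reaches the target.
        if config.num_extant is not None and len(pending) >= config.num_extant:
            break
        t = heapq.heappop(pending)
        # Stop if the next division would happen after the experiment ends.
        if config.experiment_time is not None and t > config.experiment_time:
            break
        division_times.append(t)
        # A division replaces one lineage with two daughters.
        for _ in range(2):
            heapq.heappush(pending, t + rng.expovariate(config.birth_rate))
    return division_times
```

Starting from one lineage, each division adds one living lineage, so a run stopped by num_extant = N records exactly N - 1 divisions, while a run stopped by experiment_time records a random number of them.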

Taken together, these tools generate synthetic single-cell lineage tracing data for testing and validating analysis methods and for exploring experimental designs.