SequentialLineageTracingDataSimulator.py
High-level description
The SequentialLineageTracingDataSimulator class simulates lineage tracing data generated by sequential Cas9-based technologies, such as the DNA Typewriter. It overlays simulated edits onto a CassiopeiaTree, mimicking the sequential recording process of these technologies. The simulator considers factors like initiation and continuation rates of Cas9 recording, cassette architecture, state distributions, and silencing rates to generate realistic data.
Code Structure
The SequentialLineageTracingDataSimulator class inherits from the LineageTracingDataSimulator class. It primarily implements the overlay_data method, which simulates the sequential editing process on the provided CassiopeiaTree. The class also includes helper functions like edit_site and silence_cassettes to manage individual editing events and cassette silencing, respectively.
References
cassiopeia.data.CassiopeiaTree: The simulator operates on a CassiopeiaTree object, modifying its character states to represent the simulated lineage tracing data.
cassiopeia.simulator.LineageTracingDataSimulator: This class inherits from the LineageTracingDataSimulator class, which provides a framework for simulating lineage tracing data.
Symbols
SequentialLineageTracingDataSimulator
Description
This class simulates sequential Cas9-based lineage tracing data and overlays it onto a CassiopeiaTree. It models the sequential editing process on a “DNA tape” or “cassette” where only one site can be edited at a time.
Inputs
Name | Type | Description |
---|---|---|
number_of_cassettes | int | Number of cassettes in the system. |
size_of_cassette | int | Number of editable target sites per cassette. |
initiation_rate | float | Exponential parameter for the Cas9 initiation rate. |
continuation_rate | float | Exponential parameter for the Cas9 continuation rate. |
state_priors | Dict[int, float] | Dictionary mapping states to their prior probabilities. |
heritable_silencing_rate | float | Silencing rate for the cassettes, simulating heritable missing data events. |
stochastic_silencing_rate | float | Rate at which to randomly drop out cassettes, simulating dropout due to low sensitivity of assays. |
heritable_missing_data_state | int | Integer representing data that has gone missing due to a heritable event. |
stochastic_missing_data_state | int | Integer representing data that has gone missing due to stochastic dropout. |
random_seed | Optional[int] | Numpy random seed for deterministic simulations. |
Outputs
This class doesn’t directly return any output. Calling its overlay_data method modifies the provided CassiopeiaTree in place.
Internal Logic
The simulator initializes a character matrix representing the cassettes and their states. It then iterates through each node in the tree, simulating the editing process based on the node’s lineage and lifetime. For each cassette, it determines if it’s initiated and simulates edits based on the continuation rate. It also applies heritable and stochastic silencing to the cassettes, mimicking real-world data variability. Finally, it updates the CassiopeiaTree with the simulated character matrix.
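The per-cassette recording step can be illustrated with a minimal sketch. It is not the library's implementation. It assumes that waiting times between edits are exponentially distributed, that the first site of a cassette uses initiation_rate and every later site uses continuation_rate, and that 0 marks an unedited site. The function name record_on_cassette and the rng argument are hypothetical.

```python
import random

def record_on_cassette(cassette, lifetime, initiation_rate,
                       continuation_rate, state_priors, rng):
    """Sketch: write edits left to right until the branch lifetime runs out.

    Assumption: 0 is the unedited state, and any non-zero state
    (edited or missing) is left untouched.
    """
    cassette = list(cassette)  # work on a copy
    states = list(state_priors)
    weights = [state_priors[s] for s in states]
    elapsed = 0.0
    for site, state in enumerate(cassette):
        if state != 0:
            continue  # already written (or silenced) on an ancestor branch
        # The first site waits on initiation, later sites on continuation.
        rate = initiation_rate if site == 0 else continuation_rate
        elapsed += rng.expovariate(rate)
        if elapsed > lifetime:
            break  # no time left, remaining sites stay unedited
        cassette[site] = rng.choices(states, weights=weights, k=1)[0]
    return cassette
```

Because the loop stops at the first site that does not fit in the remaining time, edited sites always form a contiguous prefix of the cassette, which is the "one site at a time" behavior described above.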
overlay_data
Description
This method overlays the simulated Cas9-based lineage tracing data onto the provided CassiopeiaTree.
Inputs
Name | Type | Description |
---|---|---|
tree | CassiopeiaTree | The CassiopeiaTree object to overlay the simulated data onto. |
Outputs
This method doesn’t return any output. It modifies the input CassiopeiaTree in place.
Internal Logic
The method first initializes a character matrix with all sites set to an unedited state. It then traverses the tree in depth-first order. For each node, it simulates the Cas9 editing process based on the node’s lifetime and the parent’s character state. It then applies heritable and stochastic silencing to the character array. Finally, it updates the CassiopeiaTree with the simulated character matrix.
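The traversal pattern can be sketched on a toy tree. This is an illustration under stated assumptions, not the method's actual code: a plain dictionary of children and node times stands in for the CassiopeiaTree, silencing is omitted, and toy_edit is a hypothetical placeholder for the real editing step that simply writes one site per whole unit of branch time.

```python
import random

def overlay_on_toy_tree(children, times, root, n_sites, edit_branch, rng):
    """Sketch: depth-first pass where each child starts from its parent's states."""
    states = {root: [0] * n_sites}  # root is fully unedited
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            lifetime = times[child] - times[parent]
            # Copy the parent's array so edits are inherited but not shared.
            states[child] = edit_branch(list(states[parent]), lifetime, rng)
            stack.append(child)
    return states

def toy_edit(states, lifetime, rng):
    """Hypothetical stand-in for the editing step: one edit per time unit."""
    budget = int(lifetime)
    for site, state in enumerate(states):
        if budget == 0:
            break
        if state == 0:
            states[site] = rng.choice([1, 2])
            budget -= 1
    return states

# Toy tree: root -> a, b; a -> a1, a2 (node times are made up for illustration).
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
times = {"root": 0, "a": 1, "b": 2, "a1": 3, "a2": 2}
matrix = overlay_on_toy_tree(children, times, "root", 4, toy_edit, random.Random(1))
```

Processing a parent before its children is what lets every descendant inherit the edits accumulated along its lineage.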
edit_site
Description
This helper function edits a specific site in the character array based on the provided state priors.
Inputs
Name | Type | Description |
---|---|---|
character_array | List[int] | The character array representing the cassette states. |
site | int | The index of the site to edit. |
state_priors | Dict[int, float] | Dictionary mapping states to their prior probabilities. |
Outputs
Name | Type | Description |
---|---|---|
character_array | List[int] | The updated character array with the edited site. |
Internal Logic
The function randomly samples a state from the state priors based on their probabilities and updates the character array at the specified site with the chosen state.
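A minimal sketch of this sampling step, using only the Python standard library, is shown here. It assumes the priors are non-negative weights and that the function returns an updated copy; whether the real helper copies or mutates its input is not stated above. The rng parameter is a hypothetical addition for reproducibility.

```python
import random

def edit_site(character_array, site, state_priors, rng=random):
    """Sketch: draw one state according to its prior and write it at `site`."""
    states = list(state_priors)
    weights = [state_priors[s] for s in states]
    updated = list(character_array)  # assumption: return a copy
    updated[site] = rng.choices(states, weights=weights, k=1)[0]
    return updated
```

For example, edit_site([0, 0, 0], 1, {3: 0.7, 4: 0.3}) leaves sites 0 and 2 untouched and sets site 1 to 3 or 4, with 3 drawn more often.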
silence_cassettes
Description
This helper function simulates the silencing of cassettes in the character array based on the provided silencing rate.
Inputs
Name | Type | Description |
---|---|---|
character_array | List[int] | The character array representing the cassette states. |
silencing_rate | float | The probability of silencing a cassette. |
missing_state | int | The state to use for representing silenced cassettes. |
Outputs
Name | Type | Description |
---|---|---|
updated_character_array | List[int] | The updated character array with silenced cassettes. |
Internal Logic
The function iterates through each cassette and, based on the silencing rate, randomly determines whether to silence it. If a cassette is silenced, all its sites in the character array are set to the specified missing state.
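The per-cassette silencing can be sketched as follows. This is an illustration, not the library's code: it treats silencing_rate as a per-call probability, as in the table above, and takes the cassette size as an explicit, hypothetical cassette_size parameter, since the documented inputs do not show where the helper gets it. The rng parameter is likewise an assumption.

```python
import random

def silence_cassettes(character_array, silencing_rate, missing_state,
                      cassette_size, rng=random):
    """Sketch: silence whole cassettes independently with a fixed probability."""
    updated = list(character_array)
    for start in range(0, len(updated), cassette_size):
        end = min(start + cassette_size, len(updated))
        if rng.random() < silencing_rate:
            # Every site of a silenced cassette gets the missing state.
            updated[start:end] = [missing_state] * (end - start)
    return updated
```

Calling the same function once with the heritable missing state and once with the stochastic one would reproduce the two kinds of missing data described for the class, assuming each uses its own rate.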