High-level description

The ecDNABirthDeathSimulator class simulates the evolution of a cell population with extrachromosomal DNA (ecDNA) using a forward birth-death process. It extends the BirthDeathFitnessSimulator class to incorporate ecDNA-specific dynamics, including co-segregation and fitness effects based on ecDNA copy number. The simulator generates a phylogenetic tree representing the evolutionary history of the cell population, along with metadata about ecDNA copy numbers and observed ecDNA counts in each cell.

Code Structure

The ecDNABirthDeathSimulator class inherits from the BirthDeathFitnessSimulator class. It overrides several methods to incorporate ecDNA-specific logic, including initialize_tree, update_fitness, sample_lineage_event, and get_ecdna_array. The populate_tree_from_simulation method is used to create a CassiopeiaTree object from the simulated tree, including metadata about ecDNA copy numbers and observed counts.

References

  • BirthDeathFitnessSimulator: The parent class that provides the basic framework for simulating a birth-death process with fitness.
  • CassiopeiaTree: The data structure used to represent the simulated phylogenetic tree and associated metadata.

Symbols

ecDNABirthDeathSimulator

Description

This class simulates the evolution of a cell population with ecDNA using a forward birth-death process. It allows for various birth and death waiting time distributions, fitness regimes, and ecDNA co-segregation patterns.

Inputs

NameTypeDescription
birth_waiting_distributionCallable[[float], float]A function that samples waiting times from the birth distribution.
initial_birth_scalefloatThe initial scale parameter for the birth distribution.
death_waiting_distributionOptional[Callable[[], float]]A function that samples waiting times from the death distribution. Defaults to no death.
mutation_distributionOptional[Callable[[], int]]A function that samples the number of mutations occurring at a division event. Defaults to no mutations.
fitness_distributionOptional[Callable[[], float]]A function that samples the exponential that the fitness base is raised by. Determines the distribution of fitness mutation strengths.
fitness_basefloatThe base that is raised by the value given by the fitness distribution. Determines the base strength of fitness mutations. Defaults to Euler’s constant.
num_extantOptional[int]The number of extant lineages at which to stop the simulation.
experiment_timeOptional[float]The time at which to stop the simulation.
collapse_unifurcationsboolWhether to collapse unifurcations in the resulting tree. Defaults to True.
prune_dead_lineagesboolWhether to prune dead lineages from the resulting tree. Defaults to True.
random_seedintA seed for reproducibility.
initial_copy_numbernp.arrayThe initial copy number of each ecDNA species in the root cell. Defaults to [1].
cosegregation_coefficientfloatA coefficient describing the likelihood of co-segregation between ecDNA species. Defaults to 0.0.
splitting_functionCallable[[int], int]A function that describes the segregation of ecDNA at cell division. Defaults to a binomial distribution with p=0.5.
fitness_arraynp.arrayA matrix defining the fitness of a cell based on the presence or absence of each ecDNA species.
fitness_functionOptional[Callable[[int, int, float], float]]A function that calculates the fitness of a cell based on ecDNA copy numbers and the fitness array.
capture_efficiencyfloatThe probability of observing an ecDNA species in a cell. Defaults to 1.0.
initial_treeOptional[CassiopeiaTree]A tree used to initialize the simulation.

Outputs

A CassiopeiaTree object representing the simulated phylogenetic tree with ecDNA copy number and observation data.

Internal Logic

The simulator works by maintaining a priority queue of lineages, sorted by their next event time. At each step, the lineage with the earliest event time is popped from the queue. If the event is a birth, a new lineage is created with updated fitness and ecDNA copy number, and both lineages are added back to the queue. If the event is a death, the lineage is removed from the queue. The simulation continues until one of the stopping conditions is met.

initialize_tree

Description

Initializes the tree with a single root node or from a provided initial tree.

Inputs

NameTypeDescription
namesGeneratorA generator for unique node names.

Outputs

An nx.DiGraph object representing the initialized tree.

Internal Logic

If an initial tree is provided, the method copies its topology and attributes. Otherwise, it creates a new tree with a single root node and sets its initial attributes.

update_fitness

Description

Calculates the fitness of a lineage based on its ecDNA copy number.

Inputs

NameTypeDescription
ecdna_arraynp.arrayThe ecDNA copy number array of the lineage.

Outputs

The updated birth scale parameter representing the lineage’s fitness.

Internal Logic

The method calculates the fitness based on the provided fitness_array and fitness_function. If no fitness_function is provided, it uses a simple additive model based on the presence or absence of each ecDNA species.

sample_lineage_event

Description

Simulates the next event (birth or death) for a given lineage.

Inputs

NameTypeDescription
lineageDict[str, Union[int, float]]A dictionary representing the lineage.
current_lineagesPriorityQueueThe priority queue of active lineages.
treenx.DiGraphThe tree being constructed.
namesGeneratorA generator for unique node names.
observed_nodesList[str]A list of observed nodes.

Outputs

None. The method modifies the tree, current_lineages, and observed_nodes in place.

Internal Logic

The method samples birth and death waiting times for the lineage. If the birth time is less than the death time, a new lineage is created with updated fitness and ecDNA copy number, and both lineages are added to the queue. If the death time is less than the birth time, the lineage is marked as inactive and added to the queue. If both times exceed the experiment time, the lineage is cut off at the experiment time and marked as observed.

get_ecdna_array

Description

Generates the ecDNA copy number array for a child cell given its parent’s copy number and co-segregation parameters.

Inputs

NameTypeDescription
parent_idstrThe ID of the parent node in the tree.
treenx.DiGraphThe tree being constructed.

Outputs

A np.array representing the ecDNA copy number array for the child cell.

Internal Logic

The method first doubles the parent’s ecDNA copy number array. If the parent already has a child, it subtracts the child’s copy number from the doubled array. Otherwise, it uses the splitting_function and cosegregation_coefficient to simulate the segregation of ecDNA into the child cell.

Side Effects

The ecDNABirthDeathSimulator modifies the tree, current_lineages, and observed_nodes objects in place during the simulation.

Dependencies

  • networkx: Used for representing and manipulating the tree topology.
  • numpy: Used for numerical operations and random number generation.
  • pandas: Used for storing and manipulating cell metadata.
  • queue: Used for the priority queue of active lineages.

Error Handling

The ecDNABirthDeathSimulator raises TreeSimulatorError exceptions for various invalid input conditions, such as specifying no stopping conditions or a negative number of mutations. It also raises ecDNABirthDeathSimulatorError if a child’s ecDNA copy number exceeds its parent’s.