UniformLeafSubsampler.py
High-level description
The `UniformLeafSubsampler` class is a subclass of `LeafSubsampler` that performs uniform random sampling of leaves in a `CassiopeiaTree`. It creates a new `CassiopeiaTree` containing only the lineages of the sampled leaves, preserving character states, metadata, and dissimilarity maps for the sampled cells.
References
This code references the following symbols:
cassiopeia.data.CassiopeiaTree
cassiopeia.simulator.LeafSubsampler.LeafSubsampler
cassiopeia.simulator.LeafSubsampler.LeafSubsamplerError
Symbols
UniformLeafSubsampler
Description
This class implements the logic for uniformly subsampling leaves from a `CassiopeiaTree`. The sample size is specified either as a ratio of the total number of leaves or as an explicit number of leaves.
Inputs
Name | Type | Description |
---|---|---|
ratio | Optional[float] | The proportion of leaves to sample. |
number_of_leaves | Optional[int] | The exact number of leaves to sample. |
Outputs
The constructor returns nothing. The `subsample_leaves` method returns a new `CassiopeiaTree` that contains only the sampled leaves and their lineages.
Internal Logic
The `__init__` method initializes the `UniformLeafSubsampler` object and checks that exactly one of `ratio` or `number_of_leaves` is provided. If neither or both are given, it raises a `LeafSubsamplerError`.
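The argument check described above can be sketched as follows. This is a minimal illustration, not the Cassiopeia source: `UniformSubsamplerSketch` and `SubsamplerArgumentError` are hypothetical stand-ins for `UniformLeafSubsampler` and `LeafSubsamplerError`, and the error message is an assumption.

```python
from typing import Optional


class SubsamplerArgumentError(Exception):
    """Hypothetical stand-in for LeafSubsamplerError."""


class UniformSubsamplerSketch:
    """Illustrative stand-in for the constructor logic only."""

    def __init__(
        self,
        ratio: Optional[float] = None,
        number_of_leaves: Optional[int] = None,
    ):
        # Both None or both set are invalid: exactly one must be given.
        if (ratio is None) == (number_of_leaves is None):
            raise SubsamplerArgumentError(
                "Exactly one of 'ratio' and 'number_of_leaves' must be specified."
            )
        self.ratio = ratio
        self.number_of_leaves = number_of_leaves
```

Comparing the two `is None` results for equality rejects both invalid cases (neither given, both given) with a single condition.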
The `subsample_leaves` method performs the actual subsampling. It first determines the desired sample size based on the provided `ratio` or `number_of_leaves`. Then, it randomly selects leaves to remove and prunes the tree accordingly. Finally, it optionally collapses any remaining unifurcations (nodes with a single child) to maintain a valid tree structure.
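The three steps above (resolve the sample size, pick leaves to remove, prune and collapse) can be sketched on a plain dictionary tree. This is an illustration under stated assumptions, not the library's implementation: the node-to-children dictionary, the `subsample_sketch` function, its `seed` parameter, and the choice to keep the root even when it ends up with a single child are all hypothetical. The real class operates on a `CassiopeiaTree`.

```python
import random
from typing import Dict, List, Optional

Tree = Dict[str, List[str]]  # node -> list of children; leaves map to []


def subsample_sketch(
    tree: Tree,
    root: str,
    ratio: Optional[float] = None,
    number_of_leaves: Optional[int] = None,
    seed: Optional[int] = None,
) -> Tree:
    """Return a new pruned tree; assumes exactly one size option is given."""
    leaves = [node for node, kids in tree.items() if not kids]

    # Step 1: resolve the sample size from the explicit count or the ratio.
    if number_of_leaves is not None:
        n_sample = number_of_leaves
    else:
        n_sample = int(len(leaves) * ratio)
    if not 0 < n_sample <= len(leaves):
        raise ValueError("sample size must be between 1 and the leaf count")

    # Step 2: uniformly choose which leaves to remove (without replacement).
    rng = random.Random(seed)
    removed = set(rng.sample(leaves, len(leaves) - n_sample))

    # Step 3: rebuild bottom-up, dropping empty lineages and splicing out
    # non-root nodes that are left with a single child.
    out: Tree = {}

    def build(node: str) -> Optional[str]:
        if not tree[node]:  # original leaf
            if node in removed:
                return None
            out[node] = []
            return node
        kids = [k for k in (build(c) for c in tree[node]) if k is not None]
        if not kids:
            return None  # every leaf below this node was removed
        if len(kids) == 1 and node != root:
            return kids[0]  # collapse the unifurcation
        out[node] = kids
        return node

    build(root)
    return out
```

For example, calling `subsample_sketch(tree, "r", ratio=0.5)` on a four-leaf tree keeps two leaves and leaves the input dictionary untouched, because the result is built in a separate dictionary.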
Side Effects
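- Does not modify the input `CassiopeiaTree`: pruning is applied to a copy, which is returned as the new tree.
- Draws random numbers, so repeated calls return different samples unless a seed is fixed.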
Performance Considerations
Runtime depends on the size of the input tree and on the requested sample size. Random leaf selection, lineage pruning, and unifurcation collapsing all traverse parts of the tree, so the cost grows with the number of nodes and edges.