High-level description

The SpatialLeafSubsampler class is a subclass of LeafSubsampler that subsamples leaves within a specified region of interest in a CassiopeiaTree with spatial information. It supports subsampling using a bounding box or a numpy mask, and allows for downsampling by ratio or a fixed number of leaves.

Code Structure

The SpatialLeafSubsampler class has a single method, subsample_leaves, which takes a CassiopeiaTree as input and returns a new CassiopeiaTree containing the subsampled leaves. The class constructor initializes the region of interest and downsampling parameters.

References

  • CassiopeiaTree: The SpatialLeafSubsampler operates on CassiopeiaTree objects.
  • LeafSubsampler: The SpatialLeafSubsampler is a subclass of LeafSubsampler.

Symbols

SpatialLeafSubsampler

Description

This class subsamples leaves within a region of interest of a CassiopeiaTree with spatial information and produces a new CassiopeiaTree containing the sampled leaves.

Inputs

NameTypeDescription
bounding_boxOptional[List[tuple]]A list of tuples of the form (min, max) for each dimension of the bounding box.
spaceOptional[np.ndarray]Numpy array mask representing the space that cells will be sampled from.
ratioOptional[float]Specifies the number of leaves to be sampled as a ratio of the total number of leaves in the region of interest.
number_of_leavesOptional[int]Explicitly specifies the number of leaves to be sampled.
attribute_keyOptional[str]The key in the CassiopeiaTree’s node attributes that contains the spatial coordinates of the leaves. Defaults to “spatial”.
merge_cellsOptional[bool]Whether or not to merge cells contained within the same pixel in the space. Defaults to False.

Outputs

None

Internal Logic

The constructor checks for valid input parameters and stores them as instance variables.

subsample_leaves

Description

This method subsamples the leaves of a CassiopeiaTree to those within a region of interest and returns a tree pruned to contain lineages relevant to only leaves in the sample (the “induced subtree” on the sample).

Inputs

NameTypeDescription
treeCassiopeiaTreeThe CassiopeiaTree for which to subsample leaves.
keep_singular_root_edgeOptional[bool]Whether or not to collapse the single edge leading from the root in the subsample, if it exists. Defaults to True.
collapse_duplicatesboolWhether or not to collapse duplicated character states, so that only unique character states are present in each ambiguous state. Defaults to True.

Outputs

NameTypeDescription
subsampled_treeCassiopeiaTreeA new CassiopeiaTree that is the induced subtree on a sample of the leaves in the given tree.

Internal Logic

  1. Check for spatial information: Ensures the tree has spatial information associated with the leaves.
  2. Subset leaves:
    • If bounding_box is provided, keeps leaves within the specified bounding box.
    • If space is provided, keeps leaves within the specified mask.
  3. Determine number of leaves to sample:
    • If ratio is provided, calculates the number of leaves to sample based on the ratio and the number of leaves in the region of interest.
    • If number_of_leaves is provided, uses the specified number.
    • If neither is provided, keeps all leaves in the region of interest.
  4. Sample leaves: Randomly samples the specified number of leaves from the subset of leaves within the region of interest.
  5. Prune lineages: Removes leaves not in the sample and prunes lineages no longer relevant to the sampled leaves.
  6. Merge cells (optional): If merge_cells is True, merges cells within the same pixel in the space, creating a new cell with ambiguous character states.
  7. Collapse unifurcations: Collapses unifurcations in the tree, preserving node times.
  8. Copy and annotate branch lengths and times: Copies branch lengths and times from the original tree to the subsampled tree.

Side Effects

The subsample_leaves method modifies the input CassiopeiaTree object in place if copy is set to False.

Error Handling

The constructor and subsample_leaves method raise LeafSubsamplerError for invalid input parameters or if the tree does not have the required spatial information.

Dependencies

  • numpy: Used for array operations and random sampling.
  • pandas: Used for character matrix manipulation.
  • networkx: Used for tree manipulation.
  • collections: Used for creating a defaultdict.
  • copy: Used for deep copying the tree.
  • warnings: Used for issuing warnings.

Configuration

The SpatialLeafSubsampler class is configured using the parameters passed to its constructor.

Logging

The code does not implement any specific logging mechanisms.

API/Interface Reference

The SpatialLeafSubsampler class exposes a single public method, subsample_leaves, which can be used to subsample the leaves of a CassiopeiaTree object.