High-level description

This file, critique_utilities.py, provides utility functions for the critique module in the Cassiopeia library. These functions are primarily used for analyzing and comparing phylogenetic trees, particularly in the context of lineage tracing experiments. They include functions for calculating combinatorial terms, annotating tree depths, identifying outgroups in triplets, and sampling triplets from a tree.

Code Structure

The code consists of several independent functions that operate on CassiopeiaTree objects. These functions are used for tasks such as annotating tree depths, sampling triplets, and identifying outgroups.

Symbols

nCr

Description

This function calculates the binomial coefficient, which represents the number of ways to choose r items from a set of n items without replacement.

Inputs

NameTypeDescription
nintThe total number of items.
rintThe number of items to choose.

Outputs

NameTypeDescription
nCrfloatThe binomial coefficient, representing the number of combinations.

Internal Logic

The function first checks for invalid input values (r > n or n < 0 or r < 0) and returns 0 if any are found. Otherwise, it calculates the factorial of n, r, and (n-r) using the math.factorial function and returns their quotient as the binomial coefficient.

annotate_tree_depths

Description

This function annotates each node in a CassiopeiaTree with its depth from the root and the number of triplets rooted at that node.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe input tree to annotate.

Outputs

NameTypeDescription
depth_to_nodesdefaultdict(list)A dictionary mapping each depth to a list of nodes at that depth.

Internal Logic

The function performs a depth-first traversal of the tree. For each node, it calculates its depth based on its parent’s depth and stores it as a node attribute named “depth”. It also calculates the number of triplets rooted at that node by subtracting the number of triplets rooted at its children from the total number of triplets possible with the leaves under the current node. This information is stored as a node attribute named “number_of_triplets”.

get_outgroup

Description

This function infers the outgroup of a given triplet of leaves in a CassiopeiaTree.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe input tree.
tripletTuple[str, str, str]A tuple containing the names of the three leaves forming the triplet.

Outputs

NameTypeDescription
out_groupstrThe name of the outgroup leaf, or “None” if the triplet is unresolved.

Internal Logic

The function determines the outgroup by finding the pair of leaves within the triplet that share the most recent common ancestor (MRCA). The leaf that is not part of this pair is considered the outgroup. The MRCA is inferred based on the number of shared ancestors between each pair of leaves.

sample_triplet_at_depth

Description

This function samples a triplet of leaves from a CassiopeiaTree such that the most recent common ancestor (MRCA) of the triplet is at a specified depth.

Inputs

NameTypeDescription
treeCassiopeiaTreeThe input tree.
depthintThe desired depth for the MRCA of the sampled triplet.
depth_to_nodesOptional[Dict[int, List[str]]]An optional dictionary mapping depths to lists of nodes at that depth. Providing this dictionary can speed up the function.

Outputs

NameTypeDescription
(triplet, out_group)Tuple[Tuple[str, str, str], str]A tuple containing the sampled triplet (as a tuple of leaf names) and the name of the outgroup leaf.

Internal Logic

The function first identifies the nodes at the specified depth. It then samples a node from these candidates with probability proportional to the number of triplets rooted at each node. Once a node is chosen, the function samples three of its descendant clades, ensuring that at least two clades are distinct. Finally, it randomly selects one leaf from each of the chosen clades to form the triplet and identifies the outgroup based on the chosen clades.