compare.py
Documentation for the compare.py file in the cassiopeia/critique module.
High-level description
This module provides functions for comparing two phylogenetic trees. It includes implementations for calculating the triplets correct accuracy and the Robinson-Foulds distance between two trees.
Code Structure
The module contains two main functions: triplets_correct and robinson_foulds. Both operate on CassiopeiaTree objects, which represent phylogenetic trees. The triplets_correct function is the more complex of the two: it samples triplets at each depth of the tree and compares their topology across the two trees. The robinson_foulds function is a wrapper around the Ete3 library’s implementation of the Robinson-Foulds distance.
Symbols
triplets_correct
Description
This function calculates the triplets correct accuracy between two trees. It samples triplets (sets of three leaves) at each depth of the tree and compares their topology between the two input trees.
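To make "comparing the topology of a triplet" concrete, here is a minimal sketch in plain Python. It is not the module's own code: the child-to-parent dictionaries, the helper names (ancestors, lca_depth, outgroup) and the toy trees are all assumptions made for illustration. The idea is that, in a rooted tree, the two leaves of a triplet whose lowest common ancestor is deepest form the in-group, and the remaining leaf is the outgroup; two trees agree on a triplet when they pick the same outgroup.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_depth(parent, a, b):
    """Depth (edges from the root) of the lowest common ancestor of a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            # A node's depth equals the number of ancestors above it.
            return len(ancestors(parent, node)) - 1
    raise ValueError("nodes are not in the same tree")


def outgroup(parent, triplet):
    """Return the leaf that splits off first, or None if the triplet is unresolved."""
    i, j, k = triplet
    d_ij = lca_depth(parent, i, j)
    d_ik = lca_depth(parent, i, k)
    d_jk = lca_depth(parent, j, k)
    if d_ij > d_ik and d_ij > d_jk:
        return k
    if d_ik > d_ij and d_ik > d_jk:
        return j
    if d_jk > d_ij and d_jk > d_ik:
        return i
    return None  # all three pairs meet at the same node (a multifurcation)


# Hypothetical toy trees as child -> parent maps.
parent_a = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}  # ((a,b),(c,d))
parent_b = {"a": "x", "c": "x", "x": "y", "b": "y", "y": "root", "d": "root"}  # (((a,c),b),d)

triplets = list(combinations(["a", "b", "c", "d"], 3))
matches = sum(outgroup(parent_a, t) == outgroup(parent_b, t) for t in triplets)
fraction_correct = matches / len(triplets)  # 1 of the 4 triplets agrees here
```

This sketch enumerates every triplet, which only scales to small trees; sampling a fixed number of triplets, as the function described here does, keeps the cost bounded.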
Inputs
Name | Type | Description |
---|---|---|
tree1 | CassiopeiaTree | The first input tree |
tree2 | CassiopeiaTree | The second input tree to be compared |
number_of_trials | int | Number of triplets to sample at each depth (default: 1000) |
min_triplets_at_depth | int | Minimum number of triplets needed at a depth for inclusion (default: 1) |
Outputs
Name | Type | Description |
---|---|---|
all_triplets_correct | Dict[int, float] | The proportion of all triplets correct at each depth |
resolvable_triplets_correct | Dict[int, float] | The proportion of resolvable triplets correct at each depth |
unresolved_triplets_correct | Dict[int, float] | The proportion of unresolvable triplets correct at each depth |
proportion_unresolvable | Dict[int, float] | The proportion of unresolvable triplets at each depth |
Internal Logic
- Initialize dictionaries to store results
- Create copies of input trees and collapse unifurcations
- Annotate tree depths and compute number of triplets at each depth
- For each depth:
  - Sample triplets and compare their topology between the two trees
  - Calculate various statistics (all, resolvable, unresolved triplets correct)
- Return the computed statistics
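The per-depth bookkeeping in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function name summarize_depth, the use of Python None to mark an unresolved triplet, and the choice of denominators (resolvable and unresolved scores are divided by the number of triplets in their own group) are all hypothetical and should be checked against the source before relying on them.

```python
def summarize_depth(trials):
    """Summarize sampled triplets at one depth.

    `trials` is a list of (true_outgroup, reconstructed_outgroup) pairs.
    A true outgroup of None marks a triplet that is unresolved in the
    first tree (an assumption of this sketch).
    """
    resolvable = [(t, r) for t, r in trials if t is not None]
    unresolved = [(t, r) for t, r in trials if t is None]

    def fraction_correct(pairs):
        # Guard against empty groups so the sketch never divides by zero.
        return sum(t == r for t, r in pairs) / len(pairs) if pairs else 0.0

    return {
        "all_triplets_correct": fraction_correct(trials),
        "resolvable_triplets_correct": fraction_correct(resolvable),
        "unresolved_triplets_correct": fraction_correct(unresolved),
        "proportion_unresolvable": len(unresolved) / len(trials) if trials else 0.0,
    }


# Hypothetical sample of four triplets drawn at a single depth.
example_trials = [("c", "c"), ("d", "a"), (None, None), ("a", "a")]
stats_at_depth = summarize_depth(example_trials)
```

Running this once per depth and storing each result under its depth key would give dictionaries shaped like the four outputs listed above.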
robinson_foulds
Description
This function computes the Robinson-Foulds distance between two trees. It’s a wrapper around the Ete3 library’s implementation.
Inputs
Name | Type | Description |
---|---|---|
tree1 | CassiopeiaTree | The first input tree |
tree2 | CassiopeiaTree | The second input tree to be compared |
Outputs
Name | Type | Description |
---|---|---|
rf | float | The Robinson-Foulds distance between the two trees |
rf_max | float | The maximum possible Robinson-Foulds distance for these trees |
Internal Logic
- Create copies of input trees and collapse unifurcations
- Convert trees to Ete3 format
- Compute Robinson-Foulds distance using Ete3’s implementation
- Return the distance and maximum possible distance
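For readers unfamiliar with the metric, the following is a small, dependency-free sketch of a rooted, clade-based Robinson-Foulds distance. It does not reproduce Ete3's implementation, which may count partitions differently (for example when trees are treated as unrooted). The nested-tuple tree encoding and the names clades and robinson_foulds_sketch are assumptions made for this example.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are any non-tuple values, e.g. (("a", "b"), ("c", "d")).
    Returns (set of clades as frozensets of leaves, frozenset of all leaves).
    """
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    all_leaves = leaves_below(tree)
    # The root clade is shared by every tree on the same leaves, so skip it.
    found.discard(all_leaves)
    return found, all_leaves


def robinson_foulds_sketch(tree1, tree2):
    """Return (rf, rf_max) as the clade symmetric difference and its upper bound."""
    clades1, leaves1 = clades(tree1)
    clades2, leaves2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    rf = len(clades1 ^ clades2)            # clades present in exactly one tree
    rf_max = len(clades1) + len(clades2)   # reached when no clade is shared
    return rf, rf_max


# Hypothetical toy trees on the leaves a, b, c, d.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
rf, rf_max = robinson_foulds_sketch(balanced, caterpillar)
```

Because clades are stored in a set, a node with a single child contributes the same clade as that child and is counted only once. A partition-based implementation may count such nodes separately, which is one plausible reason to collapse unifurcations before the comparison. Dividing rf by rf_max gives a normalized distance between 0 and 1 whenever rf_max is nonzero.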
Dependencies
Dependency | Purpose |
---|---|
collections | Used for defaultdict |
copy | Used for copying the input trees before they are modified |
ete3 | Used for Robinson-Foulds distance calculation |
networkx | Not directly used in this file, but likely used in CassiopeiaTree |
numpy | Used for numerical operations |
typing | Used for type hinting |
cassiopeia.critique.critique_utilities | Used for utility functions |
cassiopeia.data.CassiopeiaTree | Used as the main data structure for trees |
Error Handling
The code does not implement explicit error handling. It relies on the underlying libraries and data structures to raise appropriate exceptions.
In summary, this module provides the tree-comparison functionality of the Cassiopeia project. The triplets_correct function compares tree topologies depth by depth, while the robinson_foulds function provides a standard distance metric for tree comparison. These functions are likely used for analysis and validation tasks elsewhere in the Cassiopeia framework.