High-level description

The cassiopeia/critique directory contains modules for comparing and analyzing phylogenetic trees, particularly in the context of lineage tracing experiments. It provides implementations for calculating tree similarity metrics, such as the Robinson-Foulds distance and triplets correct accuracy, as well as utility functions for tree manipulation and analysis.

What does it do?

This directory provides tools for researchers and developers working with phylogenetic trees to:

  1. Compare two phylogenetic trees using different metrics:

    • Robinson-Foulds distance: A standard measure of topological difference that counts the splits (groupings of leaves) found in one tree but not in the other.
    • Triplets correct accuracy: A finer-grained comparison. It samples triplets (sets of three leaves) whose most recent common ancestor lies at a given depth, then checks whether each triplet's topology agrees between the two trees.
  2. Perform various tree analysis tasks:

    • Annotate tree depths and calculate the number of triplets rooted at each node.
    • Identify outgroups in triplets of leaves.
    • Sample triplets from specific depths in a tree.
  3. Provide utility functions for combinatorial calculations and tree manipulations.

These tools support validating reconstructed phylogenetic trees, for example by comparing a reconstruction against a reference tree. This is a routine step in lineage tracing experiments and other applications involving evolutionary tree analysis.
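The Robinson-Foulds distance can be made concrete with a minimal from-scratch sketch of its rooted, cluster-based form. This is not Cassiopeia's code, which wraps ete3. The dict-of-children tree encoding and the function names below are hypothetical. The sketch assumes both trees share the same leaf set, and its result is not guaranteed to match ete3's output in every case (for example, unrooted comparison or normalization).

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical encoding: internal node -> list of children; leaves never appear as keys.
Tree = Dict[str, List[str]]


def leaf_sets(tree: Tree, root: str) -> Dict[str, FrozenSet[str]]:
    """Map every node to the frozenset of leaves in its subtree."""
    sets: Dict[str, FrozenSet[str]] = {}

    def visit(node: str) -> FrozenSet[str]:
        children = tree.get(node, [])
        if children:
            leaves = frozenset().union(*(visit(c) for c in children))
        else:
            leaves = frozenset([node])
        sets[node] = leaves
        return leaves

    visit(root)
    return sets


def clusters(tree: Tree, root: str) -> Set[FrozenSet[str]]:
    """Non-trivial clusters: leaf sets with more than one leaf but not all leaves."""
    sets = leaf_sets(tree, root)
    n_leaves = len(sets[root])
    return {s for s in sets.values() if 1 < len(s) < n_leaves}


def robinson_foulds(t1: Tree, r1: str, t2: Tree, r2: str) -> int:
    """Rooted RF distance: number of clusters present in exactly one of the trees."""
    return len(clusters(t1, r1) ^ clusters(t2, r2))
```

Only leaf names enter the comparison, so the two trees may label their internal nodes differently.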

Entry points

The main entry point for this directory is the __init__.py file, which exposes two key functions:

  1. robinson_foulds: Computes the Robinson-Foulds distance between two phylogenetic trees.
  2. triplets_correct: Calculates the triplets correct accuracy between two phylogenetic trees.

These functions are imported from the compare.py module, which contains their implementations. The critique_utilities.py file provides supporting functions used by the main comparison algorithms.

A typical workflow is:

  1. Importing the desired comparison function from cassiopeia.critique.
  2. Creating or loading two CassiopeiaTree objects to compare.
  3. Calling the comparison function with the two trees as arguments.
  4. Analyzing the results to determine the similarity or differences between the trees.

Key files

  1. compare.py: This file contains the main implementations of the tree comparison algorithms:

    • triplets_correct: A detailed comparison function that samples triplets at different depths and compares their topology between two trees.
    • robinson_foulds: A wrapper around the ete3 library’s Robinson-Foulds distance implementation.
  2. critique_utilities.py: This file provides utility functions used in tree analysis and comparison:

    • nCr: Calculates binomial coefficients.
    • annotate_tree_depths: Annotates each node in a tree with its depth and the number of triplets rooted at that node (triplets of leaves whose most recent common ancestor is that node).
    • get_outgroup: Infers the outgroup of a given triplet of leaves in a tree.
    • sample_triplet_at_depth: Samples a triplet of leaves whose most recent common ancestor lies at a specified depth.
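The depth annotation and outgroup rules can be sketched as a hypothetical stand-alone re-implementation on a plain dict-of-children tree. This is not the CassiopeiaTree-based code in critique_utilities.py. The sketch makes two assumptions:

- The number of triplets rooted at a node v is C(L_v, 3) minus the sum of C(L_c, 3) over v's children c, where L is a subtree's leaf count.
- A triplet whose three pairwise LCAs coincide is unresolved. Returning None in that case is this sketch's choice.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: internal node -> list of children; leaves never appear as keys.
Tree = Dict[str, List[str]]


def annotate_tree_depths(tree: Tree, root: str):
    """Return (depth per node, triplets rooted at each node, nodes grouped by depth)."""
    depth: Dict[str, int] = {root: 0}
    n_leaves: Dict[str, int] = {}
    n_triplets: Dict[str, int] = {}
    depth_to_nodes: Dict[int, List[str]] = defaultdict(list)

    def visit(node: str) -> int:
        depth_to_nodes[depth[node]].append(node)
        children = tree.get(node, [])
        for child in children:
            depth[child] = depth[node] + 1
        n_leaves[node] = sum(visit(c) for c in children) if children else 1
        # Triplets whose LCA is `node`: every triplet below it, minus those
        # lying entirely inside a single child's subtree.
        n_triplets[node] = math.comb(n_leaves[node], 3) - sum(
            math.comb(n_leaves[c], 3) for c in children
        )
        return n_leaves[node]

    visit(root)
    return depth, n_triplets, dict(depth_to_nodes)


def _lca(a: str, b: str, parent: Dict[str, str], depth: Dict[str, int]) -> str:
    """Walk both nodes up to equal depth, then up together until they meet."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def get_outgroup(tree: Tree, root: str, triplet: Tuple[str, str, str]) -> Optional[str]:
    """Return the leaf that branches off first, or None if the triplet is unresolved."""
    depth, _, _ = annotate_tree_depths(tree, root)
    parent = {c: p for p, cs in tree.items() for c in cs}
    pairs = [(depth[_lca(a, b, parent, depth)], a, b) for a, b in combinations(triplet, 2)]
    deepest = max(d for d, _, _ in pairs)
    ingroup = [(a, b) for d, a, b in pairs if d == deepest]
    if len(ingroup) > 1:
        return None  # all three pairs meet at the same node (a polytomy)
    return next(x for x in triplet if x not in ingroup[0])
```

Take the tree {"r": ["a", "x"], "x": ["b", "y"], "y": ["c", "d"]}, that is (a, (b, (c, d))). The root is the LCA of 3 triplets and x is the LCA of 1, which together account for all C(4, 3) = 4 triplets. get_outgroup(tree, "r", ("a", "c", "d")) returns "a".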

The compare.py module builds directly on critique_utilities.py. triplets_correct relies on the helpers that annotate depths and triplet counts, sample triplets at a given depth, and infer outgroups. robinson_foulds delegates the distance computation to ete3.
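Putting these pieces together, a simplified triplets correct procedure might work as follows. For every depth of the first tree, it picks a node with probability proportional to its triplet count and draws three leaves below it by rejection sampling. It then records how often the outgroup agrees in the second tree. The RootedTree class, the rejection-sampling step, and the seeded random.Random are assumptions of this sketch. Cassiopeia's triplets_correct may sample differently and may treat unresolved (polytomy) triplets separately; here, two unresolved outgroups simply count as a match.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb
from typing import Dict, List, Optional, Tuple


class RootedTree:
    """Hypothetical minimal stand-in for CassiopeiaTree, built from a parent -> children dict."""

    def __init__(self, children: Dict[str, List[str]], root: str):
        self.children, self.root = children, root
        self.parent = {c: p for p, cs in children.items() for c in cs}
        self.depth: Dict[str, int] = {}
        self.leaves: Dict[str, List[str]] = {}
        self.n_triplets: Dict[str, int] = {}
        self.depth_to_nodes: Dict[int, List[str]] = defaultdict(list)
        self._annotate(root, 0)

    def _annotate(self, node: str, d: int) -> List[str]:
        # Depths are assigned on the way down; leaf lists and triplet counts on the way up.
        self.depth[node] = d
        self.depth_to_nodes[d].append(node)
        kids = self.children.get(node, [])
        self.leaves[node] = (
            [leaf for c in kids for leaf in self._annotate(c, d + 1)] if kids else [node]
        )
        self.n_triplets[node] = comb(len(self.leaves[node]), 3) - sum(
            comb(len(self.leaves[c]), 3) for c in kids
        )
        return self.leaves[node]

    def lca(self, a: str, b: str) -> str:
        while self.depth[a] > self.depth[b]:
            a = self.parent[a]
        while self.depth[b] > self.depth[a]:
            b = self.parent[b]
        while a != b:
            a, b = self.parent[a], self.parent[b]
        return a

    def outgroup(self, triplet: Tuple[str, str, str]) -> Optional[str]:
        pairs = [(self.depth[self.lca(a, b)], a, b) for a, b in combinations(triplet, 2)]
        deepest = max(d for d, _, _ in pairs)
        ingroup = [(a, b) for d, a, b in pairs if d == deepest]
        if len(ingroup) > 1:
            return None  # unresolved: all pairs meet at one node
        return next(x for x in triplet if x not in ingroup[0])


def sample_triplet_at_depth(tree: RootedTree, depth: int, rng: random.Random):
    # Pick a node at this depth, weighted by how many triplets it roots.
    nodes = [v for v in tree.depth_to_nodes[depth] if tree.n_triplets[v] > 0]
    node = rng.choices(nodes, weights=[tree.n_triplets[v] for v in nodes])[0]
    # Rejection sampling: redraw until the leaves are not all under one child,
    # so `node` is their LCA and the triplet is uniform among those rooted at it.
    child_of = {leaf: c for c in tree.children[node] for leaf in tree.leaves[c]}
    while True:
        triplet = tuple(rng.sample(tree.leaves[node], 3))
        if len({child_of[x] for x in triplet}) > 1:
            return triplet


def triplets_correct(t1: RootedTree, t2: RootedTree, trials: int = 1000, seed: int = 0) -> Dict[int, float]:
    """Per LCA depth in t1: fraction of sampled triplets whose outgroup agrees in t2."""
    rng = random.Random(seed)
    scores: Dict[int, float] = {}
    for depth in sorted(t1.depth_to_nodes):
        if not any(t1.n_triplets[v] for v in t1.depth_to_nodes[depth]):
            continue  # no triplet has its LCA at this depth
        hits = 0
        for _ in range(trials):
            trip = sample_triplet_at_depth(t1, depth, rng)
            hits += t1.outgroup(trip) == t2.outgroup(trip)
        scores[depth] = hits / trials
    return scores
```

Comparing a tree with itself gives an accuracy of 1.0 at every depth that roots at least one triplet.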

Dependencies

The critique module relies on the Python standard library, several third-party packages, and internal Cassiopeia components:

  1. Python standard library:

    • collections: Provides the defaultdict data structure.
    • copy: Used for deep copying trees.
    • math: Used for factorial calculations in combinatorial functions.
    • typing: Used for type hints.
  2. Third-party libraries:

    • ete3: Used for Robinson-Foulds distance calculation.
    • networkx: Graph library; likely used through the CassiopeiaTree implementation.
    • numpy: Used for numerical operations.
  3. Internal dependencies:

    • cassiopeia.data.CassiopeiaTree: The main data structure representing phylogenetic trees.
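For reference, a factorial-based binomial coefficient like the nCr helper can be written as follows. This is a hypothetical re-implementation. In particular, returning 0 when r falls outside 0..n is an assumption not stated in the source. Python 3.8+ also offers math.comb, which computes the same value directly and likewise returns 0 for r > n.

```python
import math


def nCr(n: int, r: int) -> int:
    """Binomial coefficient "n choose r" computed from factorials."""
    if r < 0 or r > n:
        return 0  # assumption: no way to choose r items in this case
    f = math.factorial
    # Integer division keeps the result exact for arbitrarily large n.
    return f(n) // (f(r) * f(n - r))
```

For example, nCr(4, 3) returns 4, the number of triplets among four leaves.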

In short, collections supplies defaultdict, numpy handles numerical operations, and ete3 provides the Robinson-Foulds computation. The type hints from typing document the function signatures and make the code amenable to static type checkers.

In summary, the cassiopeia/critique directory provides two tree comparison metrics, the Robinson-Foulds distance and triplets correct accuracy, along with the utilities they depend on. These are built on the Python standard library, numpy, and ete3. The metrics support validating and interpreting reconstructed trees in lineage tracing experiments and other phylogenetic studies.