High-level description

This directory contains the unit tests for Cassiopeia, a framework for phylogenetic analysis and lineage tracing. The tests cover data handling, preprocessing, simulation, tree reconstruction (solvers), plotting, and utility functions.

What does it do?

The test suite verifies the correctness and reliability of Cassiopeia’s core functionalities:

  1. Data handling: Tests for character matrices, tree topologies, and data structures used in phylogenetic analysis.
  2. Preprocessing: Validates sequence alignment, allele calling, lineage group assignment, and UMI collapsing.
  3. Simulation: Checks the simulators that generate trees and spatial data.
  4. Solving algorithms: Tests different phylogenetic tree reconstruction methods, including Neighbor Joining, ILP, Greedy, and Spectral solvers.
  5. Plotting: Verifies 2D and 3D plotting capabilities, including integration with iTOL for online visualization.
  6. Utility functions: Tests helper functions for tasks like dissimilarity calculations and parameter estimation.
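The sketch below makes the utility-function item concrete. It shows the general shape of a unittest case for a dissimilarity helper. The function hamming_dissimilarity and the missing-data marker -1 are hypothetical stand-ins chosen for this illustration. They are not Cassiopeia's own API. The point is the pattern: small in-memory inputs checked against hand-computed values.

```python
import unittest

# Hypothetical missing-data marker, assumed for this sketch only.
MISSING = -1


def hamming_dissimilarity(a, b, missing=MISSING):
    """Fraction of mismatching characters, ignoring positions missing in either cell.

    Hypothetical helper for illustration; not Cassiopeia's dissimilarity API.
    """
    if len(a) != len(b):
        raise ValueError("character vectors must have the same length")
    # Keep only positions observed in both cells.
    compared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not compared:
        return 0.0
    mismatches = sum(1 for x, y in compared if x != y)
    return mismatches / len(compared)


class HammingDissimilarityTest(unittest.TestCase):
    def test_identical_cells(self):
        self.assertEqual(hamming_dissimilarity([1, 0, 2], [1, 0, 2]), 0.0)

    def test_missing_positions_are_skipped(self):
        # The third position is missing in the second cell, so only two
        # positions are compared and one of them differs.
        self.assertAlmostEqual(hamming_dissimilarity([1, 0, 2], [1, 3, -1]), 0.5)

    def test_length_mismatch_raises(self):
        with self.assertRaises(ValueError):
            hamming_dissimilarity([1], [1, 2])
```

Saved in a file whose name ends in _test.py, this runs under `python -m unittest` like any other test module.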

These tests help maintain the integrity of the Cassiopeia framework, ensuring that researchers can rely on its results for lineage tracing and phylogenetic analyses.

Entry points

The main entry points for developers are the individual test directories and files:

  1. critique_tests: Tests for tree comparison functions.
  2. data_tests: Tests for data handling and manipulation.
  3. mixin_tests: Tests for utility functions in mixins.
  4. plotting_tests: Tests for various plotting functionalities.
  5. preprocess_tests: Tests for the preprocessing pipeline.
  6. simulator_tests: Tests for tree and data simulation.
  7. solver_tests: Tests for phylogenetic tree reconstruction algorithms.
  8. tools_tests: Tests for various analytical tools and algorithms.

Each directory contains multiple test files focusing on specific aspects of the respective module.

Key Files

All test files matter, but the following cover the most central functionality:

  1. data_tests/cassiopeia_tree_test.py: Tests the CassiopeiaTree class, a core data structure.
  2. preprocess_tests/align_sequence_test.py: Tests sequence alignment functionality.
  3. solver_tests/neighborjoining_solver_test.py: Tests the widely used Neighbor Joining algorithm.
  4. plotting_tests/local_test.py: Tests local 2D plotting functionality.
  5. tools_tests/tree_metrics_test.py: Tests various tree-based metrics and calculations.

Dependencies

The test suite relies on the following libraries and frameworks (unittest ships with Python; the rest are third-party packages):

  1. unittest: The standard Python testing framework.
  2. networkx: For graph and tree operations.
  3. numpy and pandas: For data manipulation and analysis.
  4. scipy: For scientific computing tasks.
  5. matplotlib and plotly: For plotting backends.
  6. ete3: For phylogenetic tree manipulation.
  7. pysam: For reading and manipulating SAM/BAM files.
  8. gurobipy (optional): For ILP solving in some tests.
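Because gurobipy is optional, the ILP tests need a guard so the rest of the suite still runs without it. A common unittest pattern is to probe for the package with importlib and skip the class when the package is absent. The sketch below shows this with a hypothetical test class. It illustrates the technique and is not a copy of Cassiopeia's solver tests.

```python
import importlib.util
import unittest

# True only if gurobipy can be located in this environment. find_spec does not
# import the package, so an unlicensed install is not detected here.
GUROBI_AVAILABLE = importlib.util.find_spec("gurobipy") is not None


@unittest.skipUnless(GUROBI_AVAILABLE, "gurobipy is not installed")
class IlpSolverSmokeTest(unittest.TestCase):
    """Hypothetical test class; the real ILP tests live in solver_tests."""

    def test_gurobipy_imports(self):
        # Import inside the test so collecting this module never fails.
        import gurobipy

        self.assertTrue(hasattr(gurobipy, "Model"))
```

find_spec only checks that the package can be located. A missing or expired Gurobi license would surface later, when a model is actually solved.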

Configuration

Most tests use in-memory configurations or sample data created within the test methods. Some specific configurations include:

  1. iTOL plotting tests may require a configuration file (~/.itolconfig) for credentials.
  2. Some tests use random seeds for reproducibility.
  3. CCPhylo tests check for a config.ini file and a ccphylo_path setting.
  4. ILP solver tests may be skipped if Gurobi is not installed.
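The configuration items above can be handled with the standard library alone. The sketch below covers two of them:

- It reads a ccphylo_path from a config.ini file and skips the dependent test when the path is not set.
- It shows a fixed-seed test.

Several details are assumptions made for illustration: the section name Paths, the config.ini location (the current working directory), the test class names, and the seed value. Check the actual CCPhylo tests for their exact layout.

```python
import configparser
import os
import random
import unittest


def read_ccphylo_path(config_file, section="Paths", key="ccphylo_path"):
    """Return the configured ccphylo path, or None if the file or setting is absent.

    The section name "Paths" is a hypothetical choice for this sketch.
    """
    if not os.path.isfile(config_file):
        return None
    parser = configparser.ConfigParser()
    parser.read(config_file)
    # fallback covers both a missing section and a missing key.
    return parser.get(section, key, fallback=None)


# Hypothetical location: config.ini in the current working directory.
CCPHYLO_PATH = read_ccphylo_path("config.ini")


@unittest.skipUnless(CCPHYLO_PATH, "ccphylo_path is not configured in config.ini")
class CCPhyloConfigTest(unittest.TestCase):
    def test_executable_exists(self):
        self.assertTrue(os.path.exists(CCPHYLO_PATH))


class SeededSimulationTest(unittest.TestCase):
    def setUp(self):
        # A fixed seed makes the "random" test data identical on every run.
        self.rng = random.Random(42)

    def test_same_seed_same_draws(self):
        draws = [self.rng.random() for _ in range(5)]
        replay = random.Random(42)
        self.assertEqual(draws, [replay.random() for _ in range(5)])
```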

Taken together, these tests guard the reliability and accuracy of the Cassiopeia framework across its modules.