upgma_test.py
This document describes the test/solver_tests/upgma_test.py file.
High-level description
This file contains unit tests for the UPGMASolver class in Cassiopeia’s solver module. It tests various aspects of the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm implementation, including basic functionality, handling of lineage tracing data, and dealing with duplicates and missing data.
Code Structure
The main class TestUPGMASolver inherits from unittest.TestCase and contains multiple test methods. The setUp method initializes various test scenarios, and each test method focuses on a specific aspect of the UPGMASolver.
Symbols
TestUPGMASolver
Description
A test class for the UPGMASolver, containing multiple test methods to verify the correctness of the UPGMA algorithm implementation.
Internal Logic
The class sets up various test scenarios in the setUp method and then tests different aspects of the UPGMASolver in separate test methods.
setUp
Description
Initializes test data and solver instances for various scenarios.
Internal Logic
- Sets up a basic character matrix and dissimilarity map
- Creates a basic CassiopeiaTree
- Initializes UPGMASolver instances
- Sets up additional test scenarios for lineage tracing and handling duplicates/missing data
test_constructor
Description
Tests the initialization of the UPGMASolver.
Internal Logic
Checks that the dissimilarity function and dissimilarity map are properly set.
test_find_cherry
Description
Tests the find_cherry method of UPGMASolver.
Internal Logic
- Calls find_cherry on the basic dissimilarity map
- Verifies that the returned cherry is either (“a”, “b”) or (“b”, “a”)
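As a rough illustration of what this test checks, the sketch below picks the pair with the smallest dissimilarity, which is the standard UPGMA joining rule. The name find_cherry_sketch, the dictionary layout, and the toy values are assumptions made for illustration; they are not Cassiopeia's implementation or the data used in the test file.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def find_cherry_sketch(dissimilarity: Dict[Pair, float]) -> Pair:
    """Return the pair of samples with the smallest dissimilarity.

    Hypothetical helper: the real solver method may use a different
    input type and return type.
    """
    return min(dissimilarity, key=dissimilarity.get)


# Toy pairwise dissimilarities (made-up values).
toy_map = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "c"): 3.0}

cherry = find_cherry_sketch(toy_map)

# Like the test, accept either ordering of the pair.
assert cherry in [("a", "b"), ("b", "a")]
```

Accepting both orderings keeps the check independent of how ties or pair orientation are handled internally.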
test_update_dissimilarity_map
Description
Tests the update_dissimilarity_map method of UPGMASolver.
Internal Logic
- Finds a cherry in the basic dissimilarity map
- Updates the dissimilarity map with the found cherry
- Verifies the updated dissimilarity map against expected values
- Repeats the process for a second update
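The two-step update exercised above can be sketched with the textbook UPGMA rule, in which the distance from a merged cluster to any other cluster is the size-weighted average of the two original distances. The function merge_clusters, the frozenset-keyed dictionary, and the numbers below are hypothetical; Cassiopeia's update_dissimilarity_map may use different data structures and a different signature.

```python
from typing import Dict, FrozenSet, Tuple

Dist = Dict[FrozenSet[str], float]


def merge_clusters(
    dist: Dist, sizes: Dict[str, int], i: str, j: str, new: str
) -> Tuple[Dist, Dict[str, int]]:
    """Merge clusters i and j into `new` using the UPGMA average.

    d(new, k) = (n_i * d(i, k) + n_j * d(j, k)) / (n_i + n_j)
    """
    n_i, n_j = sizes[i], sizes[j]
    others = [k for k in sizes if k not in (i, j)]

    # Keep every distance that does not involve i or j.
    updated = {p: d for p, d in dist.items() if i not in p and j not in p}
    for k in others:
        d_ik = dist[frozenset((i, k))]
        d_jk = dist[frozenset((j, k))]
        updated[frozenset((new, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)

    new_sizes = {k: sizes[k] for k in others}
    new_sizes[new] = n_i + n_j
    return updated, new_sizes


# Toy data (made-up values), every leaf starts as a cluster of size 1.
dist = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 6.0,
    frozenset(("b", "c")): 8.0,
    frozenset(("a", "d")): 10.0,
    frozenset(("b", "d")): 12.0,
    frozenset(("c", "d")): 4.0,
}
sizes = {"a": 1, "b": 1, "c": 1, "d": 1}

# First update: merge the cherry (a, b).
dist1, sizes1 = merge_clusters(dist, sizes, "a", "b", "ab")
# Second update: merge the new cluster with c (chosen only to show the weighting).
dist2, sizes2 = merge_clusters(dist1, sizes1, "ab", "c", "abc")
```

In the second update the cluster "ab" has size 2, so its distance to "d" counts twice; the result equals the plain mean of the three leaf distances to "d", which is the property that gives UPGMA its name.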
test_basic_solver
Description
Tests the basic functionality of the UPGMASolver.
Internal Logic
- Solves the basic tree using UPGMASolver
- Checks that all leaves exist in the tree
- Verifies the number of edges in the tree
- Compares the tree structure with an expected tree structure
- Compares tree distances between leaves
test_upgma_solver_weights
Description
Tests the UPGMASolver with weighted Hamming distance and prior probabilities.
Internal Logic
- Solves the tree with priors using UPGMASolver
- Verifies the dissimilarity between specific nodes
- Compares the resulting tree structure with an expected structure
- Repeats the test with mutationless edges collapsed
test_pp_solver
Description
Tests the UPGMASolver with a lineage tracing scenario.
Internal Logic
- Solves the lineage tracing tree using UPGMASolver
- Verifies the dissimilarity between specific nodes
- Compares the resulting tree structure with an expected structure
test_duplicate
Description
Tests the UPGMASolver with a scenario containing duplicates and missing data.
Internal Logic
- Solves the tree with duplicates and missing data using UPGMASolver
- Verifies the dissimilarity between specific nodes
- Compares the resulting tree structure with an expected structure
Dependencies
- unittest
- typing
- itertools
- networkx
- numpy
- pandas
- cassiopeia.data.CassiopeiaTree
- cassiopeia.solver.UPGMASolver
- cassiopeia.solver.dissimilarity_functions
Helper Functions
find_triplet_structure
Description
Determines the structure of a triplet in a given tree.
Inputs
Name | Type | Description |
---|---|---|
triplet | tuple | A tuple of three node names |
T | networkx.DiGraph | The tree to analyze |
Outputs
Name | Type | Description |
---|---|---|
structure | str | The structure of the triplet (“ab”, “ac”, “bc”, or “-”) |
Internal Logic
- Finds the ancestors of each node in the triplet
- Compares the number of common ancestors between pairs
- Returns the structure based on which pair has the most common ancestors
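The ancestor-counting logic above can be sketched as follows. To keep the example dependency-free, the tree is represented as a child-to-parent dictionary rather than a networkx.DiGraph; that representation and the names ancestors_of and triplet_structure_sketch are assumptions for illustration, not the helper's actual code.

```python
from typing import Dict, Optional, Set, Tuple


def ancestors_of(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect all ancestors of `node` by walking up to the root."""
    found: Set[str] = set()
    while parent.get(node) is not None:
        node = parent[node]
        found.add(node)
    return found


def triplet_structure_sketch(
    triplet: Tuple[str, str, str], parent: Dict[str, Optional[str]]
) -> str:
    """Return "ab", "ac", or "bc" for the pair sharing the most ancestors,
    or "-" when no single pair shares strictly more than the others."""
    a, b, c = triplet
    anc_a, anc_b, anc_c = (ancestors_of(n, parent) for n in (a, b, c))
    ab = len(anc_a & anc_b)
    ac = len(anc_a & anc_c)
    bc = len(anc_b & anc_c)
    if ab > ac and ab > bc:
        return "ab"
    if ac > ab and ac > bc:
        return "ac"
    if bc > ab and bc > ac:
        return "bc"
    return "-"


# Toy tree: root -> {x, c}, x -> {a, b}; a and b form a cherry.
resolved = {"root": None, "x": "root", "c": "root", "a": "x", "b": "x"}
# Toy star tree: root -> {a, b, c}; the triplet is unresolved.
star = {"root": None, "a": "root", "b": "root", "c": "root"}

resolved_result = triplet_structure_sketch(("a", "b", "c"), resolved)
star_result = triplet_structure_sketch(("a", "b", "c"), star)
```

In the resolved tree, a and b share two ancestors (x and root) while every pair involving c shares only the root, so the pair "ab" is reported. In the star tree all pairs share exactly one ancestor, so the result is "-".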
Together, these tests cover the UPGMASolver's constructor, its find_cherry and update_dissimilarity_map methods, and full solves on four scenarios: a basic character matrix, weighted dissimilarity with priors, lineage tracing data, and data containing duplicates and missing values.