vanillagreedy_test.py
High-level description
This file contains unit tests for the VanillaGreedySolver class, which is part of the Cassiopeia package for phylogenetic tree reconstruction. The tests cover various aspects of the solver, including frequency dictionary computation, missing data handling, ambiguous state handling, and tree reconstruction with different input scenarios.
Code Structure
The main class, VanillaGreedySolverTest, inherits from unittest.TestCase and contains multiple test methods. Each test method focuses on a specific aspect of the VanillaGreedySolver’s functionality, such as frequency dictionary computation, missing data handling, and tree reconstruction. The tests use sample character matrices and expected tree structures to verify the solver’s behavior.
Symbols
VanillaGreedySolverTest
Description
A test class that contains unit tests for the VanillaGreedySolver class.
Internal Logic
The class contains multiple test methods, each focusing on a specific aspect of the VanillaGreedySolver’s functionality:
- Frequency dictionary computation
- Missing data handling
- Ambiguous state handling
- Tree reconstruction with various input scenarios
find_triplet_structure
Description
A helper function that determines the structure of a triplet in a given tree.
Inputs
Name | Type | Description |
---|---|---|
triplet | tuple | A tuple containing three node names |
T | networkx.DiGraph | The tree to analyze |
Outputs
Name | Type | Description |
---|---|---|
structure | str | The structure of the triplet (“-”, “ab”, “ac”, or “bc”) |
Internal Logic
- Find ancestors for each node in the triplet
- Calculate the number of common ancestors for each pair of nodes
- Determine the structure based on the number of common ancestors
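The three steps above can be sketched as follows. This is a minimal illustration that assumes the tree is a networkx.DiGraph with edges pointing from parent to child; the variable names and the tie-handling rule (return “-” unless one pair strictly dominates) are assumptions, not a copy of the helper in the test file.

```python
import networkx as nx


def find_triplet_structure(triplet, T):
    """Return which pair of the triplet groups together in T, or "-" if none."""
    a, b, c = triplet

    # Step 1: collect the ancestors of each node (nx.ancestors returns a set).
    a_ancestors = nx.ancestors(T, a)
    b_ancestors = nx.ancestors(T, b)
    c_ancestors = nx.ancestors(T, c)

    # Step 2: count the ancestors shared by each pair.
    ab_common = len(a_ancestors & b_ancestors)
    ac_common = len(a_ancestors & c_ancestors)
    bc_common = len(b_ancestors & c_ancestors)

    # Step 3: the pair with strictly more shared ancestors diverged last.
    if ab_common > ac_common and ab_common > bc_common:
        return "ab"
    if ac_common > ab_common and ac_common > bc_common:
        return "ac"
    if bc_common > ab_common and bc_common > ac_common:
        return "bc"
    return "-"


# Illustrative trees: one where a and b share an internal node, one star tree.
resolved = nx.DiGraph([("root", "x"), ("root", "c"), ("x", "a"), ("x", "b")])
star = nx.DiGraph([("root", "a"), ("root", "b"), ("root", "c")])
```

In the resolved tree, a and b share two ancestors (x and root) while every other pair shares only the root, so the pair “ab” is reported. In the star tree all pairs tie, so the result is “-”.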
test_basic_freq_dict
Description
Tests the computation of mutation frequencies for a basic character matrix.
Internal Logic
- Create a sample character matrix
- Initialize a VanillaGreedySolver
- Compute mutation frequencies
- Assert the correctness of the frequency dictionary
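To give a rough idea of what a frequency dictionary might contain, the sketch below counts how often each state occurs in each character (column) of a pandas character matrix. The helper count_state_frequencies is hypothetical and is not the Cassiopeia API; the nested layout (character index, then state, then count) and the use of -1 for missing data are assumptions made for illustration.

```python
from collections import Counter

import pandas as pd


def count_state_frequencies(character_matrix):
    """Hypothetical helper: map each column index to a {state: count} dict."""
    return {
        i: dict(Counter(character_matrix.iloc[:, i].tolist()))
        for i in range(character_matrix.shape[1])
    }


# Illustrative matrix: rows are samples, columns are characters.
cm = pd.DataFrame(
    {"x1": [5, 5, 0], "x2": [0, 3, -1]},
    index=["c1", "c2", "c3"],
)
freqs = count_state_frequencies(cm)
```

A test in this style would then compare the computed dictionary against a hand-written expected dictionary with an equality assertion.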
test_duplicate_freq_dict
Description
Tests the computation of mutation frequencies for a character matrix with duplicate rows.
Internal Logic
Similar to test_basic_freq_dict, but with a character matrix containing duplicate rows.
test_ambiguous_freq_dict
Description
Tests the computation of mutation frequencies for a character matrix with ambiguous states.
Internal Logic
Similar to test_basic_freq_dict, but with a character matrix containing ambiguous states (tuples).
test_ambiguous_duplicate_freq_dict
Description
Tests the computation of mutation frequencies for a character matrix with ambiguous states and duplicate rows.
Internal Logic
Similar to test_ambiguous_freq_dict, but with duplicate rows.
test_average_missing_data
Description
Tests the average missing data imputation method.
Internal Logic
- Create a sample character matrix with missing data
- Apply the average missing data imputation method
- Assert the correctness of the resulting partitions
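The idea behind average-based assignment can be sketched as below. This is a simplified illustration under stated assumptions, not Cassiopeia’s implementation: the function names are hypothetical, -1 is assumed to mark missing data, 0 is assumed to mark the unmutated state, ties go to the left partition, and both partitions are assumed to be non-empty.

```python
import pandas as pd

MISSING = -1  # assumed missing-state indicator
UNEDITED = 0  # assumed unmutated state


def shared_mutations(row_a, row_b):
    """Count characters where both rows carry the same mutated state."""
    return sum(
        1
        for x, y in zip(row_a, row_b)
        if x == y and x not in (MISSING, UNEDITED)
    )


def assign_by_average(cm, left, right, undecided):
    """Hypothetical helper: place each undecided sample in the partition
    whose original members it resembles more on average."""
    new_left, new_right = list(left), list(right)
    for sample in undecided:
        row = cm.loc[sample].tolist()
        avg_left = sum(
            shared_mutations(row, cm.loc[other].tolist()) for other in left
        ) / len(left)
        avg_right = sum(
            shared_mutations(row, cm.loc[other].tolist()) for other in right
        ) / len(right)
        if avg_left >= avg_right:
            new_left.append(sample)
        else:
            new_right.append(sample)
    return new_left, new_right


cm = pd.DataFrame(
    [[1, 2, 0], [1, 2, 3], [0, 4, 5], [-1, 2, 3]],
    index=["c1", "c2", "c3", "c4"],
)
partitions = assign_by_average(cm, ["c1", "c2"], ["c3"], ["c4"])
```

Here c4 is missing the first character, shares on average 1.5 mutations with the left partition and none with the right, and is therefore placed on the left.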
test_average_missing_data_priors
Description
Tests the average missing data imputation method with priors.
Internal Logic
Similar to test_average_missing_data, but includes priors for weighting.
test_all_duplicates_base_case
Description
Tests the solver’s behavior when all samples are identical.
Internal Logic
- Create a character matrix with identical rows
- Solve the tree
- Verify the resulting tree structure
test_case_1, test_case_2
Description
Test the solver’s behavior on character matrices more complex than the all-duplicates base case.
Internal Logic
- Create a complex character matrix
- Solve the tree
- Verify the resulting tree structure using triplet comparisons
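A triplet comparison checks that every set of three leaves has the same grouping in the reconstructed tree as in the expected tree, which makes the check independent of internal node names. The sketch below is one way to express this with itertools.combinations; the function names are illustrative and the compact triplet helper is an assumption, not the code from the test file.

```python
import itertools

import networkx as nx


def triplet_structure(triplet, tree):
    """Return the pair with the most shared ancestors, or "-" on a tie."""
    a, b, c = triplet
    anc = {n: nx.ancestors(tree, n) for n in triplet}
    shared = {
        "ab": len(anc[a] & anc[b]),
        "ac": len(anc[a] & anc[c]),
        "bc": len(anc[b] & anc[c]),
    }
    best = max(shared, key=shared.get)
    # Only report a pair when it is the unique maximum.
    if list(shared.values()).count(shared[best]) > 1:
        return "-"
    return best


def same_triplets(observed, expected, leaves):
    """True if every leaf triplet is resolved identically in both trees."""
    return all(
        triplet_structure(t, observed) == triplet_structure(t, expected)
        for t in itertools.combinations(leaves, 3)
    )


leaves = ["a", "b", "c", "d"]
expected = nx.DiGraph(
    [("root", "x"), ("x", "a"), ("x", "b"), ("root", "c"), ("root", "d")]
)
# Same topology, different internal node names.
relabeled = nx.DiGraph(
    [("r", "n1"), ("n1", "a"), ("n1", "b"), ("r", "c"), ("r", "d")]
)
# Different topology: a groups with c instead of b.
different = nx.DiGraph(
    [("r", "n1"), ("n1", "a"), ("n1", "c"), ("r", "b"), ("r", "d")]
)
```

Inside a unittest.TestCase, the per-triplet comparison would typically be an assertEqual call so that a failure names the offending triplet.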
test_weighted_case_trivial
Description
Tests the solver’s behavior with a weighted case (using priors).
Internal Logic
Similar to test_case_2, but includes priors for weighting.
test_priors_case
Description
Tests the solver’s behavior with more complex priors.
Internal Logic
Similar to test_weighted_case_trivial, but with more complex priors.
test_ambiguous_no_missing, test_ambiguous_with_missing, test_ambiguous_with_missing_and_duplicates
Description
Test the solver’s behavior on character matrices with ambiguous states in three scenarios: without missing data, with missing data, and with both missing data and duplicate rows.
Internal Logic
- Create character matrices with ambiguous states, missing data, and/or duplicates
- Solve the tree
- Verify the resulting tree structure using triplet comparisons
Dependencies
Dependency | Purpose |
---|---|
unittest | Provides the testing framework |
itertools | Used for generating combinations |
networkx | Used for tree representation and analysis |
pandas | Used for character matrix representation |
cassiopeia | The main package being tested |
Error Handling
The test cases use assertions to verify the correctness of the solver’s output. If any assertion fails, the test case will raise an AssertionError.