character_matrix_test.py
High-level description
This file contains unit tests for the character matrix formation functionality in the Cassiopeia preprocessing pipeline. It tests various aspects of converting allele tables to character matrices and lineage profiles, including handling of missing data, conflicts, and different input formats.
Code Structure
The code defines a TestCharacterMatrixFormation class that inherits from unittest.TestCase. This class contains multiple test methods, each focusing on a specific aspect of the character matrix formation process. The setUp method initializes test data used across multiple test cases.
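The layout described above can be sketched with a small, self-contained unittest class. This is an illustrative skeleton only: the class name, the miniature allele table, and the two checks are hypothetical and are not taken from the actual test file.

```python
import unittest

import pandas as pd


class TestToyAlleleTable(unittest.TestCase):
    """Hypothetical miniature of the setUp-plus-test-methods layout."""

    def setUp(self):
        # Runs before every test method, so each test gets a fresh table.
        # Column names and values are placeholders chosen for illustration.
        self.alleletable = pd.DataFrame(
            {
                "cellBC": ["cellA", "cellA", "cellB"],
                "intBC": ["X", "Y", "X"],
                "r1": ["None", "indel2", "indel3"],
            }
        )

    def test_one_row_per_cell_intbc_pair(self):
        pairs = self.alleletable[["cellBC", "intBC"]]
        self.assertFalse(pairs.duplicated().any())

    def test_expected_cells_present(self):
        self.assertEqual(set(self.alleletable["cellBC"]), {"cellA", "cellB"})


def run_suite():
    """Load and run the toy test case, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestToyAlleleTable)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp rebuilds the table before each method, a test that mutates its input cannot affect the others.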
Symbols
TestCharacterMatrixFormation
Description
A test class containing multiple unit tests for the character matrix formation functionality.
Internal Logic
- Sets up test data in the setUp method.
- Defines multiple test methods, each testing a specific aspect of the character matrix formation process.
- Uses assertions to verify the correctness of the output.
setUp
Description
Initializes test data used across multiple test cases.
Internal Logic
- Creates sample allele tables with various configurations.
- Sets up mutation priors for testing.
test_basic_character_matrix_formation
Description
Tests the basic functionality of converting an allele table to a character matrix.
Internal Logic
- Calls convert_alleletable_to_character_matrix with basic input.
- Verifies the shape and content of the resulting character matrix.
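To make the shape-and-content check concrete, here is a minimal pandas sketch of the general idea behind turning an allele table into a character matrix. It is a simplified stand-in, not Cassiopeia's implementation: the function name toy_character_matrix, the placeholder indel strings, and the state numbering (0 for uncut, -1 for missing, positive integers in order of first appearance) are assumptions made for illustration.

```python
import pandas as pd


def toy_character_matrix(alleletable, cut_sites=("r1", "r2")):
    """Build a cells x characters matrix of integer states (illustrative only)."""
    columns = {}
    for intbc, group in alleletable.groupby("intBC", sort=True):
        for site in cut_sites:
            # Assumption: "None" marks an uncut site and maps to state 0.
            state_of = {"None": 0}
            states = {}
            for cell, indel in zip(group["cellBC"], group[site]):
                if indel not in state_of:
                    # Number each new indel in order of first appearance.
                    state_of[indel] = len(state_of)
                states[cell] = state_of[indel]
            columns[f"{intbc}_{site}"] = states
    matrix = pd.DataFrame(columns)
    matrix = matrix.reindex(sorted(alleletable["cellBC"].unique()))
    # Cells with no observation for an intBC get the missing state -1.
    return matrix.fillna(-1).astype(int)


# Hypothetical input: one row per (cell, intBC) pair.
toy_table = pd.DataFrame(
    {
        "cellBC": ["cellA", "cellA", "cellB", "cellC"],
        "intBC": ["X", "Y", "X", "Y"],
        "r1": ["None", "indel2", "indel3", "indel2"],
        "r2": ["indel1", "None", "indel1", "indel4"],
    }
)
toy_matrix = toy_character_matrix(toy_table)
```

A test in this style would then assert on toy_matrix.shape (3 cells by 4 characters here) and on individual entries.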
test_character_matrix_formation_custom_missing_data
Description
Tests character matrix formation with custom missing data handling.
Internal Logic
- Modifies the input data to include custom missing data.
- Calls convert_alleletable_to_character_matrix with custom missing data parameters.
- Verifies the correct handling of missing data in the output.
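The custom missing data case can be pictured as encoding a designated "missing" allele string to a caller-chosen state. The sketch below is an assumption about the general idea, written without Cassiopeia; encode_column and its parameters missing_allele and missing_state are hypothetical names used only here.

```python
import pandas as pd


def encode_column(indels, missing_allele="missing", missing_state=-1):
    """Encode one column of indel strings as integer states (illustrative only)."""
    state_of = {}
    encoded = []
    for indel in indels:
        if indel == missing_allele:
            # The caller decides which string counts as missing and
            # which integer represents it in the output.
            encoded.append(missing_state)
        elif indel == "None":
            # Assumption: "None" marks an uncut site, encoded as 0.
            encoded.append(0)
        else:
            state_of.setdefault(indel, len(state_of) + 1)
            encoded.append(state_of[indel])
    return pd.Series(encoded, index=indels.index)


# Hypothetical column in which "NC" marks an uncaptured site.
toy_column = pd.Series(["None", "NC", "AAA", "NC", "BBB"])
toy_encoded = encode_column(toy_column, missing_allele="NC", missing_state=-3)
```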
test_character_matrix_formation_with_conflicts
Description
Tests character matrix formation when there are conflicting alleles.
Internal Logic
- Uses an allele table with conflicting data.
- Calls convert_alleletable_to_character_matrix.
- Verifies that conflicts are correctly represented in the output.
test_ignore_intbc
Description
Tests the ability to ignore specific integration barcodes (intBCs) during character matrix formation.
Internal Logic
- Calls convert_alleletable_to_character_matrix with specific intBCs to ignore.
- Verifies that the specified intBCs are not included in the output.
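The check in this test amounts to confirming that the excluded barcodes leave no trace in the result. A minimal pandas sketch of the filtering step, using the hypothetical helper drop_intbcs rather than any Cassiopeia call, might look like this:

```python
import pandas as pd


def drop_intbcs(alleletable, ignore_intbcs):
    """Return a copy of the table without rows for the listed intBCs."""
    keep = ~alleletable["intBC"].isin(ignore_intbcs)
    return alleletable[keep].reset_index(drop=True)


# Hypothetical allele table with three integration barcodes.
toy_table = pd.DataFrame(
    {
        "cellBC": ["cellA", "cellA", "cellB", "cellB"],
        "intBC": ["X", "Y", "X", "Z"],
        "r1": ["None", "indel2", "indel3", "indel5"],
    }
)
filtered = drop_intbcs(toy_table, ignore_intbcs=["Y", "Z"])
```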
test_filter_out_low_diversity_intbcs
Description
Tests filtering out integration barcodes with low diversity.
Internal Logic
- Calls convert_alleletable_to_character_matrix with a high allele representation threshold.
- Verifies that low-diversity intBCs are removed from the output.
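One way to picture diversity filtering is to drop any character whose most common observed state accounts for more than a given fraction of cells. The sketch below assumes that interpretation; the function drop_low_diversity, its max_fraction parameter, and the choice to also drop all-missing columns are illustrative and may not match how Cassiopeia applies its threshold.

```python
import pandas as pd


def drop_low_diversity(matrix, max_fraction=0.9, missing_state=-1):
    """Keep characters whose dominant state covers at most max_fraction of cells."""
    keep = []
    for col in matrix.columns:
        observed = matrix[col][matrix[col] != missing_state]
        if observed.empty:
            # Sketch-level choice: a character with no observations is dropped.
            continue
        # value_counts sorts in descending order, so the first entry is the mode.
        top_fraction = observed.value_counts(normalize=True).iloc[0]
        if top_fraction <= max_fraction:
            keep.append(col)
    return matrix[keep]


# Hypothetical character matrix: "a" is uniform, "b" is diverse, "c" is all missing.
toy_matrix = pd.DataFrame(
    {"a": [1, 1, 1, 1], "b": [1, 2, 0, -1], "c": [-1, -1, -1, -1]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
diverse = drop_low_diversity(toy_matrix, max_fraction=0.9)
```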
test_mutation_prior_formation
Description
Tests the formation of mutation priors during character matrix conversion.
Internal Logic
- Calls convert_alleletable_to_character_matrix with mutation priors.
- Verifies that the resulting prior probabilities are correct.
test_indel_state_mapping_formation
Description
Tests the formation of indel state mappings during character matrix conversion.
Internal Logic
- Calls convert_alleletable_to_character_matrix with mutation priors.
- Verifies that the resulting indel state mappings are correct.
test_alleletable_to_lineage_profile
Description
Tests the conversion of an allele table to a lineage profile.
Internal Logic
- Calls convert_alleletable_to_lineage_profile.
- Verifies that the resulting lineage profile is correct.
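A lineage profile can be thought of as a wide table with one row per cell and one column per (intBC, cut site) pair, holding the raw indel strings rather than integer states. The pandas sketch below illustrates that reshaping under stated assumptions: the function name toy_lineage_profile and the "intBC_site" column naming are hypothetical, and the input is assumed to have exactly one row per (cell, intBC) pair.

```python
import pandas as pd


def toy_lineage_profile(alleletable, cut_sites=("r1", "r2")):
    """Pivot a long allele table into a cells x (intBC, site) table of indels."""
    # pivot raises ValueError if a (cellBC, intBC) pair appears more than once.
    wide = alleletable.pivot(
        index="cellBC", columns="intBC", values=list(cut_sites)
    )
    # Flatten the (site, intBC) MultiIndex into "intBC_site" labels.
    wide.columns = [f"{intbc}_{site}" for site, intbc in wide.columns]
    return wide.sort_index(axis=1)


# Hypothetical input with placeholder indel strings.
toy_table = pd.DataFrame(
    {
        "cellBC": ["cellA", "cellA", "cellB", "cellC"],
        "intBC": ["X", "Y", "X", "Y"],
        "r1": ["None", "indel2", "indel3", "indel2"],
        "r2": ["indel1", "None", "indel1", "indel4"],
    }
)
toy_profile = toy_lineage_profile(toy_table)
```

Cells that lack an intBC end up with NaN in the corresponding columns, which is the information a later conversion step would treat as missing.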
test_lineage_profile_to_character_matrix_no_priors
Description
Tests the conversion of a lineage profile to a character matrix without priors.
Internal Logic
- Converts an allele table to a lineage profile.
- Calls convert_lineage_profile_to_character_matrix without priors.
- Verifies the correctness of the resulting character matrix.
test_lineage_profile_to_character_matrix_with_priors
Description
Tests the conversion of a lineage profile to a character matrix with priors.
Internal Logic
- Converts an allele table to a lineage profile.
- Calls convert_lineage_profile_to_character_matrix with priors.
- Verifies the correctness of the resulting character matrix and priors.
test_compute_empirical_indel_probabilities
Description
Tests the computation of empirical indel probabilities.
Internal Logic
- Calls compute_empirical_indel_priors.
- Verifies that the resulting probabilities are correct.
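As a rough illustration of what an empirical indel probability could mean, the sketch below estimates each indel's frequency as the fraction of intBCs in which it is observed at least once. This counting scheme is an assumption made for the example and may differ from what compute_empirical_indel_priors does; toy_indel_frequencies is a hypothetical helper.

```python
import pandas as pd


def toy_indel_frequencies(alleletable, cut_sites=("r1", "r2")):
    """Fraction of intBCs in which each indel appears (illustrative only)."""
    long = alleletable.melt(
        id_vars=["intBC"],
        value_vars=list(cut_sites),
        var_name="site",
        value_name="indel",
    )
    # Assumption: "None" marks an uncut site and is not counted as an indel.
    long = long[long["indel"] != "None"]
    n_intbcs = alleletable["intBC"].nunique()
    # Count each indel once per intBC, however many cells carry it.
    counts = long.groupby("indel")["intBC"].nunique()
    return (counts / n_intbcs).rename("freq")


# Hypothetical input with placeholder indel strings.
toy_table = pd.DataFrame(
    {
        "cellBC": ["cellA", "cellB", "cellA", "cellC"],
        "intBC": ["X", "X", "Y", "Y"],
        "r1": ["None", "indel3", "indel2", "indel2"],
        "r2": ["indel1", "indel1", "None", "indel1"],
    }
)
toy_freqs = toy_indel_frequencies(toy_table)
```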
test_noncanonical_cut_sites_allele_table_to_character_matrix
Description
Tests character matrix formation with non-canonical cut site names.
Internal Logic
- Uses an allele table with non-standard cut site column names.
- Calls convert_alleletable_to_character_matrix with custom cut site names.
- Verifies that the resulting character matrix is correct.
Dependencies
- unittest: For defining and running unit tests
- numpy: For numerical operations
- pandas: For data manipulation and analysis
- cassiopeia: The main package being tested
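Together, these tests cover the main paths through character matrix formation in Cassiopeia: basic conversion, custom missing data, conflicting alleles, intBC exclusion and diversity filtering, mutation priors, indel state mappings, lineage profile conversion, empirical indel probabilities, and non-canonical cut site names.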