High-level description

This file contains unit tests for the character matrix formation functionality in the Cassiopeia preprocessing pipeline. It tests various aspects of converting allele tables to character matrices and lineage profiles, including handling of missing data, conflicts, and different input formats.

Code Structure

The code defines a TestCharacterMatrixFormation class that inherits from unittest.TestCase. This class contains multiple test methods, each focusing on a specific aspect of the character matrix formation process. The setUp method initializes test data used across multiple test cases.

Symbols

TestCharacterMatrixFormation

Description

A test class containing multiple unit tests for the character matrix formation functionality.

Internal Logic

  1. Sets up test data in the setUp method.
  2. Defines multiple test methods, each testing a specific aspect of the character matrix formation process.
  3. Uses assertions to verify the correctness of the output.
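
The structure above can be sketched with the standard unittest module. This is an illustrative skeleton only: the class name, the row layout (cellBC, intBC, r1, r2) and the indel strings are assumptions made for the example, not the contents of the real test file.

```python
import unittest


class TestCharacterMatrixFormationSketch(unittest.TestCase):
    """Illustrative skeleton; names and data are hypothetical."""

    def setUp(self):
        # unittest calls setUp before every test method,
        # so each test starts from a fresh copy of the data.
        self.allele_table = [
            {"cellBC": "cellA", "intBC": "A", "r1": "None", "r2": "None"},
            {"cellBC": "cellB", "intBC": "A", "r1": "10:2D", "r2": "None"},
        ]

    def test_each_cell_appears_once(self):
        # A single assertion checking one property of the shared data.
        cells = [row["cellBC"] for row in self.allele_table]
        self.assertEqual(len(cells), len(set(cells)))


def run_sketch():
    """Load and run the sketch test case, returning the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        TestCharacterMatrixFormationSketch
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the fixture in setUp means a test that mutates the table cannot affect the tests that run after it.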

setUp

Description

Initializes test data used across multiple test cases.

Internal Logic

  1. Creates sample allele tables with various configurations.
  2. Sets up mutation priors for testing.

test_basic_character_matrix_formation

Description

Tests the basic functionality of converting an allele table to a character matrix.

Internal Logic

  1. Calls convert_alleletable_to_character_matrix with basic input.
  2. Verifies the shape and content of the resulting character matrix.

test_character_matrix_formation_custom_missing_data

Description

Tests character matrix formation with custom missing data handling.

Internal Logic

  1. Modifies the input data to include custom missing data.
  2. Calls convert_alleletable_to_character_matrix with custom missing data parameters.
  3. Verifies the correct handling of missing data in the output.
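
To make the missing-data behavior concrete, here is a toy stand-in for the conversion, not Cassiopeia's implementation. It assumes that an uncut site is written as the string "None" and maps to state 0, that each distinct indel at a character gets the next positive integer, and that the caller can choose both the allele string that marks missing data and the state used to encode it. The function name, parameter names and sample rows are all hypothetical.

```python
def to_character_matrix(rows, cut_sites=("r1", "r2"),
                        missing_allele=None, missing_state=-1):
    """Toy conversion: one integer state per (intBC, cut site) and cell."""
    cells = sorted({row["cellBC"] for row in rows})
    int_bcs = sorted({row["intBC"] for row in rows})
    characters = [(bc, site) for bc in int_bcs for site in cut_sites]
    observed = {
        (row["cellBC"], row["intBC"], site): row[site]
        for row in rows
        for site in cut_sites
    }
    state_ids = {character: {} for character in characters}
    matrix = {}
    for cell in cells:
        states = []
        for bc, site in characters:
            allele = observed.get((cell, bc, site))
            if allele is None or allele == missing_allele:
                # Unobserved intBC or explicit missing marker.
                states.append(missing_state)
            elif allele == "None":
                # Assumed marker for an uncut site.
                states.append(0)
            else:
                ids = state_ids[(bc, site)]
                states.append(ids.setdefault(allele, len(ids) + 1))
        matrix[cell] = states
    return matrix


rows = [
    {"cellBC": "cellA", "intBC": "A", "r1": "None", "r2": "2:1D"},
    {"cellBC": "cellB", "intBC": "A", "r1": "5:3I", "r2": "MISSING"},
    {"cellBC": "cellB", "intBC": "B", "r1": "None", "r2": "None"},
]
matrix = to_character_matrix(rows, missing_allele="MISSING", missing_state=-3)
print(matrix["cellA"])  # [0, 1, -3, -3]
print(matrix["cellB"])  # [1, -3, 0, 0]
```

A test of this kind typically checks both routes to the missing state: a cell with no row for an intBC, and a cell whose allele equals the custom missing marker.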

test_character_matrix_formation_with_conflicts

Description

Tests character matrix formation when there are conflicting alleles.

Internal Logic

  1. Uses an allele table with conflicting data.
  2. Calls convert_alleletable_to_character_matrix.
  3. Verifies that conflicts are correctly represented in the output.
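
The description does not say how conflicts are encoded in the output. One plausible encoding, shown here purely as an assumption, keeps every state observed for a (cell, character) pair and stores them as a tuple when more than one is seen. The helper name and input layout are hypothetical.

```python
from collections import defaultdict


def collapse_conflicts(observations):
    """Collapse (cell, character, state) triples into one entry per pair.

    A pair with a single observed state keeps that state; a pair with
    several distinct states is stored as a sorted tuple (assumed encoding).
    """
    seen = defaultdict(set)
    for cell, character, state in observations:
        seen[(cell, character)].add(state)
    collapsed = {}
    for key, states in seen.items():
        ordered = tuple(sorted(states))
        collapsed[key] = ordered[0] if len(ordered) == 1 else ordered
    return collapsed


observations = [
    ("cellA", "A_r1", 1),
    ("cellA", "A_r1", 2),  # conflicts with the row above
    ("cellA", "A_r2", 0),
    ("cellA", "A_r2", 0),  # duplicate, not a conflict
]
collapsed = collapse_conflicts(observations)
print(collapsed[("cellA", "A_r1")])  # (1, 2)
print(collapsed[("cellA", "A_r2")])  # 0
```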

test_ignore_intbc

Description

Tests the ability to ignore specific integration barcodes (intBCs) during character matrix formation.

Internal Logic

  1. Calls convert_alleletable_to_character_matrix with specific intBCs to ignore.
  2. Verifies that the specified intBCs are not included in the output.

test_filter_out_low_diversity_intbcs

Description

Tests filtering out integration barcodes with low diversity.

Internal Logic

  1. Calls convert_alleletable_to_character_matrix with a high allele representation threshold.
  2. Verifies that low-diversity intBCs are removed from the output.
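
A minimal sketch of a diversity filter follows. It assumes that a character is dropped when its most common non-missing state accounts for more than the threshold fraction of cells; whether the real function uses a strict comparison, and how it treats fully missing columns, is not stated here. The function name and column labels are hypothetical.

```python
from collections import Counter


def drop_low_diversity(columns, rep_thresh=1.0, missing_state=-1):
    """Keep only characters whose dominant state is not over-represented."""
    kept = {}
    for name, states in columns.items():
        observed = [s for s in states if s != missing_state]
        if observed:
            top_count = Counter(observed).most_common(1)[0][1]
            if top_count / len(observed) > rep_thresh:
                continue  # one state dominates this character
        # Columns with no observed states are kept in this sketch.
        kept[name] = states
    return kept


columns = {
    "A_r1": [1, 1, 1, 2],  # dominant state covers 75% of cells
    "A_r2": [1, 2, 3, 0],  # four distinct states
    "B_r1": [0, 0, 0, 0],  # a single state everywhere
}
print(sorted(drop_low_diversity(columns, rep_thresh=0.7)))  # ['A_r2']
```

With the default threshold of 1.0 nothing is removed in this sketch, because no fraction can exceed 1.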

test_mutation_prior_formation

Description

Tests the formation of mutation priors during character matrix conversion.

Internal Logic

  1. Calls convert_alleletable_to_character_matrix with mutation priors.
  2. Verifies that the resulting prior probabilities are correct.

test_indel_state_mapping_formation

Description

Tests the formation of indel state mappings during character matrix conversion.

Internal Logic

  1. Calls convert_alleletable_to_character_matrix with mutation priors.
  2. Verifies that the resulting indel state mappings are correct.
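
The two tests above check by-products of the same encoding step, which can be sketched as follows. This is a simplified illustration, not Cassiopeia's code: it assumes priors are supplied as a plain mapping from indel string to probability, and that both outputs are keyed by character index and then by integer state. All names and values are hypothetical.

```python
def encode_with_priors(column_alleles, indel_priors, uncut="None"):
    """Encode alleles per character and re-key priors by integer state.

    column_alleles: one list of allele strings per character.
    indel_priors: mapping from indel string to prior probability.
    """
    encoded, prior_probs, indel_to_state = [], {}, {}
    for index, alleles in enumerate(column_alleles):
        state_of = {}
        column = []
        for allele in alleles:
            if allele == uncut:
                column.append(0)  # assumed uncut state
            else:
                column.append(state_of.setdefault(allele, len(state_of) + 1))
        encoded.append(column)
        # Reverse mapping: integer state back to the indel it stands for.
        indel_to_state[index] = {s: indel for indel, s in state_of.items()}
        prior_probs[index] = {
            s: indel_priors[indel] for indel, s in state_of.items()
        }
    return encoded, prior_probs, indel_to_state


column_alleles = [["None", "2:1D", "2:1D"], ["5:3I", "None", "9:2D"]]
indel_priors = {"2:1D": 0.5, "5:3I": 0.3, "9:2D": 0.2}
encoded, prior_probs, indel_to_state = encode_with_priors(
    column_alleles, indel_priors
)
print(encoded)         # [[0, 1, 1], [1, 0, 2]]
print(prior_probs)     # {0: {1: 0.5}, 1: {1: 0.3, 2: 0.2}}
print(indel_to_state)  # {0: {1: '2:1D'}, 1: {1: '5:3I', 2: '9:2D'}}
```

Re-keying the priors by state lets downstream code look up a probability from a matrix entry without going back to the indel string.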

test_alleletable_to_lineage_profile

Description

Tests the conversion of an allele table to a lineage profile.

Internal Logic

  1. Calls convert_alleletable_to_lineage_profile.
  2. Verifies that the resulting lineage profile is correct.
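
As a rough picture of what a lineage profile holds, the sketch below pivots allele table rows into one record per cell, with one entry per intBC and cut site that keeps the raw indel string rather than an integer state. The column naming scheme (intBC and site joined by an underscore), the function name and the sample rows are assumptions for illustration.

```python
def to_lineage_profile(rows, cut_sites=("r1", "r2")):
    """Pivot allele table rows into {cell: {"<intBC>_<site>": allele}}."""
    profile = {}
    for row in rows:
        cell = profile.setdefault(row["cellBC"], {})
        for site in cut_sites:
            cell[f"{row['intBC']}_{site}"] = row[site]
    return profile


rows = [
    {"cellBC": "cellA", "intBC": "A", "r1": "None", "r2": "2:1D"},
    {"cellBC": "cellA", "intBC": "B", "r1": "7:4D", "r2": "None"},
    {"cellBC": "cellB", "intBC": "A", "r1": "None", "r2": "None"},
]
profile = to_lineage_profile(rows)
print(profile["cellA"]["B_r1"])  # 7:4D
```

In this sketch a cell that lacks an intBC simply has no entry for it; a tabular version would leave those cells empty.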

test_lineage_profile_to_character_matrix_no_priors

Description

Tests the conversion of a lineage profile to a character matrix without priors.

Internal Logic

  1. Converts an allele table to a lineage profile.
  2. Calls convert_lineage_profile_to_character_matrix without priors.
  3. Verifies the correctness of the resulting character matrix.

test_lineage_profile_to_character_matrix_with_priors

Description

Tests the conversion of a lineage profile to a character matrix with priors.

Internal Logic

  1. Converts an allele table to a lineage profile.
  2. Calls convert_lineage_profile_to_character_matrix with priors.
  3. Verifies the correctness of the resulting character matrix and priors.

test_compute_empirical_indel_probabilities

Description

Tests the computation of empirical indel probabilities.

Internal Logic

  1. Calls compute_empirical_indel_priors.
  2. Verifies that the resulting probabilities are correct.
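
One way to estimate empirical indel probabilities, sketched here under stated assumptions, is to treat each intBC as an independent trial and report the fraction of intBCs in which an indel appears at least once. The actual grouping and normalization used by compute_empirical_indel_priors may differ; the helper below is a hypothetical stand-in with made-up data.

```python
from collections import defaultdict


def empirical_indel_priors(rows, cut_sites=("r1", "r2"), uncut="None"):
    """Fraction of intBCs in which each indel is observed at least once."""
    int_bcs = {row["intBC"] for row in rows}
    seen_in = defaultdict(set)
    for row in rows:
        for site in cut_sites:
            if row[site] != uncut:
                seen_in[row[site]].add(row["intBC"])
    return {indel: len(bcs) / len(int_bcs) for indel, bcs in seen_in.items()}


rows = [
    {"cellBC": "cellA", "intBC": "A", "r1": "2:1D", "r2": "None"},
    {"cellBC": "cellB", "intBC": "A", "r1": "2:1D", "r2": "5:3I"},
    {"cellBC": "cellA", "intBC": "B", "r1": "2:1D", "r2": "None"},
    {"cellBC": "cellA", "intBC": "C", "r1": "None", "r2": "None"},
    {"cellBC": "cellB", "intBC": "D", "r1": "None", "r2": "None"},
]
priors = empirical_indel_priors(rows)
print(priors)  # {'2:1D': 0.5, '5:3I': 0.25}
```

Counting intBCs rather than cells keeps an indel that was inherited by many cells from one ancestor from being counted many times.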

test_noncanonical_cut_sites_allele_table_to_character_matrix

Description

Tests character matrix formation with non-canonical cut site names.

Internal Logic

  1. Uses an allele table with non-standard cut site column names.
  2. Calls convert_alleletable_to_character_matrix with custom cut site names.
  3. Verifies that the resulting character matrix is correct.

Dependencies

  • unittest: For defining and running unit tests
  • numpy: For numerical operations
  • pandas: For data manipulation and analysis
  • cassiopeia: The main package being tested

Together, these tests exercise the conversion paths between allele tables, lineage profiles, and character matrices in Cassiopeia, including custom missing data, conflicting alleles, intBC filtering, mutation priors, indel state mappings, and non-canonical cut site names.