align_sequence_test.py
Here’s a detailed explanation of the align_sequence_test.py file:
High-level description
This file contains unit tests for the sequence alignment functionality in the Cassiopeia preprocessing pipeline. It tests various aspects of the align_sequences function, including the structure of the output, the behavior with different alignment parameters, and the correctness of alignments for specific input sequences.
Code Structure
The main class TestAlignSequence inherits from unittest.TestCase and contains several test methods. Each method tests a different aspect of the sequence alignment functionality.
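The structure described above can be sketched with the standard library alone. This is a minimal illustration, not the actual test file: `fake_align` is a hypothetical stand-in for the alignment call (it is not Cassiopeia's API), the queries are plain dictionaries rather than a DataFrame, and the `cellBC` key is an assumed name for the cell barcode field.

```python
import unittest


def fake_align(queries, reference):
    """Hypothetical stand-in for the alignment call; NOT Cassiopeia's API.

    Returns one record per query, noting whether it equals the reference.
    """
    return [
        {"cellBC": q["cellBC"], "exact_match": q["seq"] == reference}
        for q in queries
    ]


class TestAlignSketch(unittest.TestCase):
    def setUp(self):
        # unittest calls setUp before every test method, so each test
        # starts from the same freshly built data.
        self.queries = [
            {"cellBC": "A", "seq": "ACGT"},
            {"cellBC": "B", "seq": "ACGA"},
        ]
        self.reference = "ACGT"

    def test_one_row_per_query(self):
        result = fake_align(self.queries, self.reference)
        self.assertEqual(len(result), len(self.queries))

    def test_all_barcodes_present(self):
        result = fake_align(self.queries, self.reference)
        self.assertEqual(
            {r["cellBC"] for r in result},
            {q["cellBC"] for q in self.queries},
        )


def run_sketch():
    """Load and run the sketch tests, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAlignSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_sketch()` runs both test methods; in a real test module the same class would normally be discovered by `python -m unittest` instead.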
Symbols
TestAlignSequence
Description
This class contains all the unit tests for the sequence alignment functionality.
Internal Logic
- Sets up test data in the setUp method.
- Defines several test methods, each focusing on a specific aspect of the alignment function.
setUp
Description
Initializes the test data used across all test methods.
Internal Logic
- Creates a DataFrame self.queries with sample sequence data.
- Sets a reference sequence self.reference.
test_alignment_dataframe_structure
Description
Tests the structure of the output DataFrame from the align_sequences function.
Internal Logic
- Calls align_sequences with test data.
- Checks if the output DataFrame has the correct number of rows and expected columns.
- Verifies that all cell barcodes from the input are present in the output.
test_extremely_large_gap_open_penalty
Description
Tests the alignment behavior when using an extremely large gap open penalty.
Internal Logic
- Calls align_sequences with a very high gap open penalty (255).
- Checks that no gaps (insertions or deletions) are present in the resulting alignments.
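One way to express the "no gaps" check is to inspect each alignment's CIGAR string for insertion (`I`) or deletion (`D`) operations. The sketch below assumes the alignments are reported as standard CIGAR strings, which the other tests in this file compare against; `parse_cigar` and `has_gaps` are hypothetical helper names, not functions from Cassiopeia.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '10M2D5M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string from the parsed pairs; any difference means the
    # input contained characters the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def has_gaps(cigar):
    """Return True if the alignment contains an insertion or deletion."""
    return any(op in "ID" for _, op in parse_cigar(cigar))
```

A test in this style would then assert `not has_gaps(cigar)` for every alignment produced with the large penalty.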
test_default_alignment_works
Description
Tests the correctness of alignments using default parameters.
Internal Logic
- Calls align_sequences with default parameters.
- Compares the resulting CIGAR strings and alignment scores with expected values for each input sequence.
test_global_alignment
Description
Tests the global alignment mode of the align_sequences function.
Internal Logic
- Calls align_sequences with the method parameter set to “global”.
- Compares the resulting CIGAR strings and alignment scores with expected values for global alignment.
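To see why a global alignment can yield different scores from a local one on the same input, consider this score-only dynamic programming sketch. It is an illustration under simplifying assumptions, not the library's implementation: it uses a single linear gap penalty rather than separate gap open and gap extend penalties, and the function name and scoring values are made up for the example.

```python
def alignment_score(query, ref, method="local", match=1, mismatch=-1, gap=-2):
    """Score-only alignment with a linear gap penalty (illustrative only)."""
    if method not in ("local", "global"):
        raise ValueError("method must be 'local' or 'global'")
    local = method == "local"
    # prev is the DP row for the previous query prefix. A global alignment
    # pays for leading gaps; a local alignment starts every cell at zero.
    prev = [0 if local else j * gap for j in range(len(ref) + 1)]
    best = 0
    for i, q in enumerate(query, start=1):
        curr = [0 if local else i * gap]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart here
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of aligning both full sequences end to end.
    # Local: best score of any pair of substrings.
    return best if local else prev[-1]
```

With these example values, aligning `"TTACGT"` against `"ACGT"` scores 4 locally (the shared `ACGT` run) but 0 globally, because the global mode must also pay for the two unmatched leading bases.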
Dependencies
The test file depends on the following modules:
- unittest: For creating and running unit tests.
- numpy: For numerical operations.
- pandas: For handling DataFrames.
- cassiopeia: The main package being tested.
Error Handling
The tests use unittest assertions to check for expected outcomes. If any assertion fails, it raises an AssertionError, which unittest records as a failure of that test.
Performance Considerations
These tests are not specifically designed to test performance, but rather the correctness of the alignment function. However, the use of different parameters (like gap penalties and alignment methods) can affect the performance of the alignment algorithm.
In conclusion, this test file covers the sequence alignment functionality in the Cassiopeia preprocessing pipeline across four cases: the structure of the output, an extremely large gap open penalty, default parameters, and global alignment mode. Together these check that the alignment function behaves correctly under different input parameters.