Overview
High-level description
This directory contains unit tests for the preprocessing module of the Cassiopeia library. The tests cover various aspects of the preprocessing pipeline, including sequence alignment, allele calling, lineage group assignment, character matrix formation, UMI collapsing, and configuration parsing.
What does it do?
The tests in this directory verify the correctness of different preprocessing steps in the Cassiopeia pipeline. These steps include:
- Aligning sequences to a reference
- Calling alleles from aligned sequences
- Assigning lineage groups to cells
- Converting allele tables to character matrices
- Collapsing UMIs (Unique Molecular Identifiers)
- Parsing configuration files for the pipeline
- Converting FASTQ files to unmapped BAM files
- Error-correcting cell barcodes and integration barcodes
- Filtering BAM files and molecule tables
- Resolving UMI sequences
Each test file focuses on a specific aspect of the preprocessing pipeline, ensuring that the functions and methods perform as expected under various input conditions and edge cases.
Entry points
The main entry points for developers working on the preprocessing module tests are:
- align_sequence_test.py: Tests for sequence alignment functionality
- call_alleles_test.py: Tests for allele calling from aligned sequences
- call_lineage_groups_test.py: Tests for lineage group assignment
- character_matrix_test.py: Tests for converting allele tables to character matrices
- collapse_umi_test.py: Tests for UMI collapsing functionality
- config_parser_test.py: Tests for parsing configuration files
These files contain the core tests for the main preprocessing steps. Other test files focus on more specific functionalities or edge cases within the preprocessing pipeline.
Key Files
- align_sequence_test.py: Tests sequence alignment with different parameters and methods.
- call_alleles_test.py: Verifies correct allele calling from CIGAR strings and aligned sequences.
- call_lineage_groups_test.py: Checks lineage group assignment, including handling of doublets and reassignment.
- character_matrix_test.py: Tests conversion of allele tables to character matrices and lineage profiles.
- collapse_umi_test.py: Verifies UMI collapsing for different sequencing chemistries.
- config_parser_test.py: Ensures correct parsing of configuration files and pipeline setup.
- convert_fastqs_to_unmapped_bam_test.py: Tests conversion of FASTQ files to unmapped BAM files.
- error_correct_cellbcs_to_whitelist_test.py: Checks error correction of cell barcodes against a whitelist.
- error_correct_intbcs_to_whitelist_test.py: Verifies error correction of integration barcodes.
- error_correct_umi_test.py: Tests UMI error correction functionality.
- filter_bam_test.py: Checks filtering of BAM files based on read quality.
- filter_molecule_table_test.py: Tests filtering of molecule tables based on various criteria.
- resolve_umi_sequence_test.py: Verifies UMI sequence resolution and cell filtering.
Dependencies
The test files rely on the following main dependencies:
- unittest: Python’s built-in unit testing framework
- numpy: For numerical operations
- pandas: For data manipulation and analysis
- pysam: For reading and manipulating SAM/BAM files
- cassiopeia: The main package being tested
Additional dependencies include:
- os, shutil, tempfile: For file and directory operations
- pathlib: For handling file paths
- ngs_tools: For FASTQ file handling
Configuration
Many test files use configuration parameters to set up test scenarios. These configurations are typically defined within the test methods or in the setUp methods of test classes. Some tests also read configuration files to verify the correct parsing of pipeline settings.
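A minimal sketch of this pattern follows: setUp writes a small configuration file into a temporary directory, and the test methods read it back. Everything specific here is an assumption made for illustration. The section and parameter names (general, filter, n_threads, min_umi_per_cell) are hypothetical, and the standard-library configparser stands in for Cassiopeia's own configuration parsing, which the real tests exercise instead.

```python
import configparser
import shutil
import tempfile
import unittest
from pathlib import Path

# Hypothetical settings, for illustration only; not Cassiopeia's real schema.
EXAMPLE_CONFIG = """\
[general]
name = demo_run
n_threads = 2

[filter]
min_umi_per_cell = 10
"""


class ExampleConfigTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh scratch directory holding its own config file.
        self.workdir = Path(tempfile.mkdtemp())
        self.config_path = self.workdir / "demo.cfg"
        self.config_path.write_text(EXAMPLE_CONFIG)

    def tearDown(self):
        # Remove the scratch directory so tests do not leak files.
        shutil.rmtree(self.workdir)

    def test_values_are_read_with_expected_types(self):
        parser = configparser.ConfigParser()
        parser.read(self.config_path)
        self.assertEqual(parser.get("general", "name"), "demo_run")
        self.assertEqual(parser.getint("general", "n_threads"), 2)
        self.assertEqual(parser.getint("filter", "min_umi_per_cell"), 10)

    def test_missing_section_raises(self):
        parser = configparser.ConfigParser()
        parser.read(self.config_path)
        # Asking for a section that was never written should fail loudly.
        with self.assertRaises(configparser.NoSectionError):
            parser.get("align", "gap_open_penalty")
```

Keeping the fixture in setUp and the cleanup in tearDown means each test method starts from the same known state, regardless of execution order.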
The tests cover various configuration scenarios, including:
- Different sequencing chemistries (e.g., 10x Genomics v2/v3, Drop-seq, inDrops v3, Slide-seq v2)
- Various alignment parameters and methods
- Different filtering thresholds for UMIs, cell barcodes, and read counts
- Error correction settings for cell barcodes, integration barcodes, and UMIs
By testing these different configurations, the test suite ensures that the Cassiopeia preprocessing pipeline can handle a wide range of input data and user-defined settings.
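One common way to cover several settings in a single test method is unittest's subTest context manager, which reports each scenario separately on failure. The sketch below applies it to a filtering threshold. The function keep_cells and the example counts are toy stand-ins invented for this illustration; they are not Cassiopeia functions or real data, and the actual tests may organize their scenarios differently.

```python
import unittest


def keep_cells(umi_counts, min_umis):
    """Toy stand-in: keep cell barcodes that have at least min_umis UMIs."""
    return {cell for cell, n in umi_counts.items() if n >= min_umis}


class ThresholdScenariosTest(unittest.TestCase):
    def setUp(self):
        # Made-up UMI counts per cell barcode.
        self.umi_counts = {"cellA": 1, "cellB": 5, "cellC": 12}

    def test_thresholds(self):
        # Each pair is (threshold, cells expected to survive the filter).
        scenarios = [
            (1, {"cellA", "cellB", "cellC"}),
            (5, {"cellB", "cellC"}),
            (13, set()),
        ]
        for min_umis, expected in scenarios:
            with self.subTest(min_umis=min_umis):
                self.assertEqual(keep_cells(self.umi_counts, min_umis), expected)
```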