Overview - Stanza Demo

High-level description

This directory contains unit tests for the preprocessing module of the Cassiopeia library. The tests cover various aspects of the preprocessing pipeline, including sequence alignment, allele calling, lineage group assignment, character matrix formation, UMI collapsing, and configuration parsing.
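The alignment tests vary scoring parameters and methods. For readers new to the topic, here is a generic illustration (not Cassiopeia's implementation): a minimal Smith-Waterman local-alignment scorer with a linear gap penalty. The default match, mismatch, and gap values are arbitrary assumptions.

```python
def smith_waterman_score(query: str, ref: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score of query against ref (linear gap penalty)."""
    rows, cols = len(query) + 1, len(ref) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1] and ref[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # query base aligned to a gap in ref
            left = H[i][j - 1] + gap  # ref base aligned to a gap in query
            # The 0 floor lets a new local alignment start at any cell
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman_score("ACGTACGT", "ACGTTACGT")` aligns all eight query bases across one gap.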

What does it do?

The tests in this directory verify the correctness of different preprocessing steps in the Cassiopeia pipeline. These steps include:

Aligning sequences to a reference
Calling alleles from aligned sequences
Assigning lineage groups to cells
Converting allele tables to character matrices
Collapsing UMIs (Unique Molecular Identifiers)
Parsing configuration files for the pipeline
Converting FASTQ files to unmapped BAM files
Error-correcting cell barcodes and integration barcodes
Filtering BAM files and molecule tables
Resolving UMI sequences
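UMI collapsing and UMI resolution rest on one idea: reads that share a cell barcode and a UMI derive from the same molecule. The sketch below reduces that to a plain majority vote per (cell barcode, UMI) group. The tuple input format and the majority rule are simplifying assumptions. Cassiopeia's own collapsing step may apply additional rules.

```python
from collections import Counter, defaultdict


def collapse_umis(reads):
    """Collapse reads sharing (cell barcode, UMI) into one consensus molecule.

    reads: iterable of (cell_bc, umi, sequence) tuples (hypothetical input format).
    Returns {(cell_bc, umi): (consensus_sequence, read_count)}.
    """
    groups = defaultdict(list)
    for cell_bc, umi, seq in reads:
        groups[(cell_bc, umi)].append(seq)
    collapsed = {}
    for key, seqs in groups.items():
        # Majority vote: the most frequent sequence represents the molecule
        consensus, _ = Counter(seqs).most_common(1)[0]
        collapsed[key] = (consensus, len(seqs))
    return collapsed
```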

Each test file targets one part of the preprocessing pipeline and checks that its functions behave as expected across a range of inputs and edge cases.
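Barcode correction against a whitelist shows the kind of edge case these tests probe, such as a barcode that is equally close to two whitelist entries. The following is a minimal sketch that assumes Hamming distance and a tolerance of one mismatch. Cassiopeia's actual correction rules may differ.

```python
def correct_to_whitelist(barcode, whitelist, max_distance=1):
    """Return the unique whitelist barcode within max_distance (Hamming), else None."""
    if barcode in whitelist:
        return barcode
    hits = [
        w for w in whitelist
        if len(w) == len(barcode)
        and sum(a != b for a, b in zip(w, barcode)) <= max_distance
    ]
    # Ambiguous (several candidates) or unmatched barcodes are left uncorrected
    return hits[0] if len(hits) == 1 else None
```

Returning `None` for ambiguous matches, rather than picking one arbitrarily, avoids silently merging reads from two different cells.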

Entry points

The main entry points for developers working on the preprocessing module tests are:

align_sequence_test.py: Tests for sequence alignment functionality
call_alleles_test.py: Tests for allele calling from aligned sequences
call_lineage_groups_test.py: Tests for lineage group assignment
character_matrix_test.py: Tests for converting allele tables to character matrices
collapse_umi_test.py: Tests for UMI collapsing functionality
config_parser_test.py: Tests for parsing configuration files

These files contain the core tests for the main preprocessing steps. The remaining test files cover narrower steps and edge cases, such as FASTQ-to-BAM conversion, barcode and UMI error correction, and BAM and molecule-table filtering.
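character_matrix_test.py checks the conversion of allele tables into character matrices. The idea can be sketched as a pivot in which each (intBC, cut site) pair becomes a character. Several details here are assumptions for illustration and do not reflect Cassiopeia's exact API: the tuple input format, the `_r1`-style character names, the "None" token for an uncut site, and the state convention (0 for unedited, -1 for missing).

```python
def to_character_matrix(allele_table, uncut="None"):
    """Pivot (cell, intBC, [site alleles]) rows into integer character states.

    allele_table: list of (cell_bc, int_bc, alleles), one allele string per cut site.
    Returns (cells, characters, matrix) where 0 = unedited, -1 = missing,
    and each distinct indel at a character gets its own positive integer.
    """
    observed = {}  # (cell, character) -> allele string
    for cell, int_bc, alleles in allele_table:
        for site, allele in enumerate(alleles, start=1):
            observed[(cell, f"{int_bc}_r{site}")] = allele
    cells = sorted({c for c, _ in observed})
    characters = sorted({ch for _, ch in observed})
    state_ids = {ch: {} for ch in characters}  # per-character: indel -> state
    matrix = []
    for cell in cells:
        row = []
        for ch in characters:
            allele = observed.get((cell, ch))
            if allele is None:
                row.append(-1)  # intBC not observed in this cell
            elif allele == uncut:
                row.append(0)
            else:
                states = state_ids[ch]
                row.append(states.setdefault(allele, len(states) + 1))
        matrix.append(row)
    return cells, characters, matrix
```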

Key Files

align_sequence_test.py: Tests sequence alignment with different parameters and methods.
call_alleles_test.py: Verifies correct allele calling from CIGAR strings and aligned sequences.
call_lineage_groups_test.py: Checks lineage group assignment, including handling of doublets and reassignment.
character_matrix_test.py: Tests conversion of allele tables to character matrices and lineage profiles.
collapse_umi_test.py: Verifies UMI collapsing for different sequencing chemistries.
config_parser_test.py: Ensures correct parsing of configuration files and pipeline setup.
convert_fastqs_to_unmapped_bam_test.py: Tests conversion of FASTQ files to unmapped BAM files.
error_correct_cellbcs_to_whitelist_test.py: Checks error correction of cell barcodes against a whitelist.
error_correct_intbcs_to_whitelist_test.py: Verifies error correction of integration barcodes.
error_correct_umi_test.py: Tests UMI error correction functionality.
filter_bam_test.py: Checks filtering of BAM files based on read quality.
filter_molecule_table_test.py: Tests filtering of molecule tables based on various criteria.
resolve_umi_sequence_test.py: Verifies UMI sequence resolution and cell filtering.
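call_alleles_test.py exercises allele calling from CIGAR strings. As background, the sketch below walks a CIGAR string using the standard SAM operations and records where insertions and deletions fall along the reference. Reporting events as (position, length, type) tuples is an assumption. The sketch also stops short of mapping events to individual target sites.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_indels(cigar, ref_start=0):
    """List indel events as (ref_position, length, 'I' or 'D') from a CIGAR string."""
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I":
            events.append((ref_pos, length, "I"))  # insertion: ref position unchanged
        elif op == "D":
            events.append((ref_pos, length, "D"))
            ref_pos += length  # deletion consumes the reference
        elif op in "MN=X":
            ref_pos += length  # these operations also consume the reference
        # S, H and P do not consume the reference
    return events
```

For instance, `cigar_to_indels("10M2D5M3I4M")` reports a 2 bp deletion at reference position 10 and a 3 bp insertion at position 17.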

Dependencies

The test files rely on the following main dependencies:

unittest: Python’s built-in unit testing framework
numpy: For numerical operations
pandas: For data manipulation and analysis
pysam: For reading and manipulating SAM/BAM files
cassiopeia: The main package being tested

Additional dependencies include:

os, shutil, tempfile: For file and directory operations
pathlib: For handling file paths
ngs_tools: For FASTQ file handling
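Tests that touch the filesystem, such as FASTQ-to-BAM conversion, need scratch space. A common pattern with os, shutil, and tempfile is a per-test temporary directory that setUp creates and tearDown removes. The class and file names below are hypothetical and are not taken from the Cassiopeia test suite.

```python
import os
import shutil
import tempfile
import unittest


class TempDirFixtureTest(unittest.TestCase):
    """Hypothetical test showing a per-test temporary directory fixture."""

    def setUp(self):
        # Fresh scratch directory for each test method
        self.tmp_dir = tempfile.mkdtemp()
        self.out_path = os.path.join(self.tmp_dir, "output.txt")

    def tearDown(self):
        # Runs after every test method whose setUp succeeded, pass or fail
        shutil.rmtree(self.tmp_dir)

    def test_writes_file(self):
        with open(self.out_path, "w") as handle:
            handle.write("ACGT\n")
        self.assertTrue(os.path.exists(self.out_path))
```

An equivalent design uses `tempfile.TemporaryDirectory()` together with `self.addCleanup(...)`, which also runs cleanups registered before a failure in setUp.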

Configuration

Many test files use configuration parameters to set up test scenarios. These configurations are typically defined within the test methods or in the setUp methods of test classes. Some tests also read configuration files to verify the correct parsing of pipeline settings. The tests cover various configuration scenarios, including:

Different sequencing chemistries (e.g., 10x Genomics v2/v3, Drop-seq, inDrops v3, Slide-seq v2)
Various alignment parameters and methods
Different filtering thresholds for UMIs, cell barcodes, and read counts
Error correction settings for cell barcodes, integration barcodes, and UMIs

By exercising these configurations, the test suite checks that the Cassiopeia preprocessing pipeline handles a wide range of input data and user-defined settings.
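config_parser_test.py checks that pipeline settings are read correctly. Below is a minimal sketch of an INI-style parser. It assumes values are written as Python literals (strings in quotes) and converted with `ast.literal_eval`. The section and key names in the usage example are hypothetical, and Cassiopeia's parser may work differently.

```python
import ast
import configparser


def parse_config(text):
    """Parse an INI-style config, converting each value with ast.literal_eval."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    parsed = {}
    for section in parser.sections():
        # literal_eval turns "10", "True", "'x'" or "[1, 2]" into Python objects;
        # it raises ValueError on unquoted words such as: foo
        parsed[section] = {
            key: ast.literal_eval(value) for key, value in parser[section].items()
        }
    return parsed
```

Usage with a hypothetical config:

```python
cfg = parse_config("""
[general]
name = "demo"

[filter]
min_umi_per_cell = 10
doublet_threshold = 0.35
""")
```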
