
High-level description

This file contains unit tests for the error_correct_cellbcs_to_whitelist function in the cassiopeia.preprocess.pipeline module. The tests check that raw cell barcodes are corrected against a whitelist for two sequencing chemistries (10x Genomics v3 and Slide-seq v2), with the whitelist supplied either as a file or as a Python list.
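For readers new to whitelist correction, the general idea can be illustrated with a minimal sketch. It assumes that "correction" means replacing a raw barcode with the unique whitelist entry within a small Hamming distance. The names `hamming` and `correct_to_whitelist` are hypothetical, and the actual cassiopeia function may use a different strategy (for example, a quality-aware one).

```python
from typing import Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_to_whitelist(
    barcode: str, whitelist: Iterable[str], max_dist: int = 1
) -> Optional[str]:
    """Return the unique closest whitelist entry within max_dist, else None.

    Exact matches (distance 0) are returned unchanged. If two or more
    entries tie at the smallest distance, the barcode is ambiguous and
    is left uncorrected (None).
    """
    best_dist: Optional[int] = None
    best: List[str] = []
    for candidate in set(whitelist):
        if len(candidate) != len(barcode):
            continue  # Hamming distance is only defined for equal lengths
        d = hamming(barcode, candidate)
        if d > max_dist:
            continue
        if best_dist is None or d < best_dist:
            best_dist, best = d, [candidate]
        elif d == best_dist:
            best.append(candidate)
    return best[0] if len(best) == 1 else None


whitelist = ["AAAA", "CCCC"]  # made-up barcodes
print(correct_to_whitelist("AAAT", whitelist))  # one mismatch from AAAA, prints AAAA
print(correct_to_whitelist("GGGG", whitelist))  # too far from both, prints None
```

Leaving ambiguous barcodes uncorrected avoids assigning a read to the wrong cell when two whitelist entries are equally close.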

Code Structure

The main class, TestErrorCorrectCellBCsToWhitelist, contains a setUp method and four test cases. It uses the unittest framework and relies on the pysam library to read the BAM files.

Symbols

TestErrorCorrectCellBCsToWhitelist

Description

A test class that inherits from unittest.TestCase. It contains methods to set up test data and run tests for the error_correct_cellbcs_to_whitelist function.

Internal Logic

  1. Sets up test data in the setUp method.
  2. Defines test methods for different scenarios:
    • 10x v3 chemistry with file-based whitelist
    • 10x v3 chemistry with list-based whitelist
    • Slide-seq v2 chemistry with file-based whitelist
    • Slide-seq v2 chemistry with list-based whitelist

setUp

Description

Initializes test data and file paths for the test cases.

Internal Logic

  1. Determines the directory path of the test file.
  2. Sets up file paths for test BAM files and whitelists.
  3. Defines whitelist sequences for 10x v3 and Slide-seq v2 chemistries.
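A setUp of this shape can be sketched with unittest and tempfile. This is a minimal sketch under stated assumptions: the real setUp points at fixture BAM and whitelist files stored alongside the test module (presumably located from the test file's own directory), whereas here hypothetical whitelists are written to a temporary directory so the example runs on its own. The class name, barcodes, and file names are all made up.

```python
import os
import tempfile
import unittest


class WhitelistSetUpSketch(unittest.TestCase):
    def setUp(self):
        # Hypothetical whitelist sequences, one list per chemistry
        # (not the real test data).
        self.whitelists = {
            "10xv3": ["AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"],
            "slideseq2": ["CTTTGACAATGTTC", "CTTTGACAATGTTA"],
        }
        # A temporary directory stands in for the test data directory;
        # addCleanup removes it even if a test fails.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        # Write each list to a file so both whitelist forms are available.
        self.whitelist_paths = {}
        for chemistry, barcodes in self.whitelists.items():
            path = os.path.join(tmp.name, f"{chemistry}_whitelist.txt")
            with open(path, "w") as f:
                f.write("\n".join(barcodes) + "\n")
            self.whitelist_paths[chemistry] = path

    def test_files_match_lists(self):
        # The file-based and list-based whitelists should hold the same barcodes.
        for chemistry, path in self.whitelist_paths.items():
            with open(path) as f:
                from_file = [line.strip() for line in f if line.strip()]
            self.assertEqual(from_file, self.whitelists[chemistry])
```

Saved to a file, the class can be run with `python -m unittest`.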

test_10xv3

Description

Tests the error_correct_cellbcs_to_whitelist function for 10x v3 chemistry using a file-based whitelist.

Internal Logic

  1. Calls error_correct_cellbcs_to_whitelist with 10x v3 BAM file and whitelist file.
  2. Verifies the number of alignments and their corrected cell barcodes.
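The verification step amounts to counting the output alignments and reading each one's CB (corrected cell barcode) tag. The real test reads a BAM file with pysam; so that this sketch runs with the standard library alone, it instead parses SAM-format text, where optional tags follow the 11 mandatory columns as TAG:TYPE:VALUE. The helper names and the two records are hypothetical, and the CR/CB pairing follows the common 10x convention (raw and corrected barcode).

```python
from typing import Dict, List, Optional


def parse_tags(fields: List[str]) -> Dict[str, str]:
    """Parse SAM optional fields (TAG:TYPE:VALUE) into a {TAG: VALUE} dict."""
    tags = {}
    for field in fields:
        tag, _type, value = field.split(":", 2)  # value may itself contain ':'
        tags[tag] = value
    return tags


def cell_barcodes(sam_text: str) -> List[Optional[str]]:
    """Return the CB tag (or None if absent) for each alignment, skipping headers."""
    barcodes = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # blank line or header line
        columns = line.split("\t")
        # Columns 0-10 are the mandatory SAM fields; optional tags follow.
        barcodes.append(parse_tags(columns[11:]).get("CB"))
    return barcodes


# Two made-up unmapped reads: the first was corrected, the second was not.
sam = "\n".join([
    "@HD\tVN:1.6",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tCR:Z:AAAT\tCB:Z:AAAA",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tCR:Z:GGGG",
])
barcodes = cell_barcodes(sam)
print(len(barcodes), barcodes)
```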

test_10xv3_whitelist_list

Description

Tests the error_correct_cellbcs_to_whitelist function for 10x v3 chemistry using a list-based whitelist.

Internal Logic

  1. Calls error_correct_cellbcs_to_whitelist with 10x v3 BAM file and whitelist list.
  2. Verifies the number of alignments and their corrected cell barcodes.
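The file-based and list-based tests exercise the same function with two forms of whitelist. One plausible way a function could accept both is shown here. `load_whitelist` is a hypothetical helper, and whether cassiopeia dispatches on the argument type in this way is an assumption.

```python
import os
from typing import Iterable, List, Union


def load_whitelist(whitelist: Union[str, os.PathLike, Iterable[str]]) -> List[str]:
    """Return whitelist barcodes from a file path or an in-memory iterable.

    A str or os.PathLike is treated as a path to a text file with one
    barcode per line; anything else is treated as an iterable of barcodes.
    Surrounding whitespace and blank entries are dropped in both cases.
    """
    if isinstance(whitelist, (str, os.PathLike)):
        with open(whitelist) as f:
            return [line.strip() for line in f if line.strip()]
    return [bc.strip() for bc in whitelist if bc.strip()]


print(load_whitelist(["AAAA", "CCCC", ""]))  # prints ['AAAA', 'CCCC']
```

Treating any `str` as a path keeps the dispatch simple, but it means a single barcode must be passed inside a list (`["AAAA"]`), not as a bare string.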

test_slideseq2

Description

Tests the error_correct_cellbcs_to_whitelist function for Slide-seq v2 chemistry using a file-based whitelist.

Internal Logic

  1. Calls error_correct_cellbcs_to_whitelist with Slide-seq v2 BAM file and whitelist file.
  2. Verifies the number of alignments, presence of CB tags, and the corrected cell barcode.
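The Slide-seq v2 checks (alignment count, presence of a CB tag, and its value) map naturally onto the `has_tag` and `get_tag` methods of pysam's `AlignedSegment`. So that the sketch runs without pysam or the fixture BAM, it uses a hypothetical `FakeSegment` class that mimics only those two methods; the barcode values and the expected count of one alignment are made up.

```python
import unittest
from typing import Dict


class FakeSegment:
    """Hypothetical stand-in for pysam.AlignedSegment (tag access only)."""

    def __init__(self, tags: Dict[str, str]):
        self._tags = dict(tags)

    def has_tag(self, tag: str) -> bool:
        return tag in self._tags

    def get_tag(self, tag: str) -> str:
        # Like pysam, raise KeyError when the tag is absent.
        return self._tags[tag]


class SlideSeqChecksSketch(unittest.TestCase):
    def test_cb_present_and_corrected(self):
        # Made-up output record; in the real test the records come from
        # pysam reading the corrected BAM file.
        alignments = [FakeSegment({"CB": "CTTTGACAATGTTC"})]
        self.assertEqual(len(alignments), 1)
        for aln in alignments:
            self.assertTrue(aln.has_tag("CB"))
        self.assertEqual(alignments[0].get_tag("CB"), "CTTTGACAATGTTC")
```

Checking `has_tag("CB")` before `get_tag("CB")` makes a missing tag show up as a readable assertion failure rather than an uncaught KeyError.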

test_slideseq2_whitelist_list

Description

Tests the error_correct_cellbcs_to_whitelist function for Slide-seq v2 chemistry using a list-based whitelist.

Internal Logic

  1. Calls error_correct_cellbcs_to_whitelist with Slide-seq v2 BAM file and whitelist list.
  2. Verifies the number of alignments, presence of CB tags, and the corrected cell barcode.

Dependencies

  • unittest: Python’s built-in unit testing framework
  • os: For file and directory operations
  • tempfile: For creating temporary directories
  • pysam: For BAM file operations
  • ngs_tools: An external library of utilities for NGS data processing
  • cassiopeia.preprocess.pipeline: The module containing the function being tested

Error Handling

The test cases use assertions to verify the expected outcomes. If any assertion fails, an AssertionError will be raised, indicating a test failure.