filter_molecule_table_test.py
Here’s a detailed explanation of the test/preprocess_tests/filter_molecule_table_test.py file:
High-level description
This file contains unit tests for the filter_molecule_table function in the Cassiopeia preprocessing pipeline. It tests several aspects of the function: filtering based on UMI and cell barcode counts, handling of doublets, error correction of integration barcodes (intBCs), and allowing allele conflicts.
Code Structure
The main class, TestFilterMolculeTable, inherits from unittest.TestCase and contains several test methods. Each test method focuses on a specific aspect of the filter_molecule_table function’s behavior.
Symbols
TestFilterMolculeTable
Description
This is the main test class that contains all the unit tests for the filter_molecule_table function.
Internal Logic
- Sets up test data in the setUp method.
- Defines test methods that check different aspects of the filter_molecule_table function.
- Cleans up temporary directories in the tearDown method.
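The setUp/test/tearDown lifecycle above is the standard unittest pattern. The sketch below shows that pattern with a temporary output directory; the class name LifecycleSketch, the toy records, and the test body are hypothetical placeholders, not the contents of the real file (which builds pandas DataFrames).

```python
import os
import shutil
import tempfile
import unittest


class LifecycleSketch(unittest.TestCase):
    """Hypothetical skeleton of the setUp / test / tearDown structure."""

    def setUp(self):
        # Toy stand-in for a test table; the real tests build pandas DataFrames.
        self.base_case = [
            {"cellBC": "A", "UMI": "u1", "readCount": 20, "intBC": "XX"},
            {"cellBC": "A", "UMI": "u2", "readCount": 5, "intBC": "XX"},
        ]
        # A fresh temporary directory for any files a test writes.
        self.tmp_dir = tempfile.mkdtemp()

    def test_writes_into_tmp_dir(self):
        # Placeholder test: write an output file and check that it exists.
        path = os.path.join(self.tmp_dir, "out.txt")
        with open(path, "w") as handle:
            handle.write(str(len(self.base_case)))
        self.assertTrue(os.path.exists(path))

    def tearDown(self):
        # Remove the temporary directory and everything in it.
        shutil.rmtree(self.tmp_dir)
```

Because unittest calls tearDown after each test whether it passes or fails (as long as setUp succeeded), temporary directories do not pile up across runs.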
setUp
Description
Initializes test data for use in the test methods.
Internal Logic
- Creates a base case DataFrame (self.base_filter_case) with sample data.
- Creates a DataFrame for testing doublet handling (self.doublets_case).
- Creates a DataFrame for testing intBC error correction (self.intBC_case).
- Sets up a temporary directory for output files.
test_format
Description
Tests whether the output DataFrame from filter_molecule_table has the expected columns.
Internal Logic
- Calls filter_molecule_table with the base case data.
- Checks that the resulting DataFrame contains all the expected columns.
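A column check like this usually reduces to a set comparison. In the sketch below, the helper missing_columns, the test class FormatSketch, and the column names are hypothetical, and plain lists stand in for the DataFrame's columns (with pandas, df.columns could be passed instead, since it is iterable):

```python
import unittest


def missing_columns(columns, expected):
    """Return the expected column names that are absent, in sorted order."""
    return sorted(set(expected) - set(columns))


class FormatSketch(unittest.TestCase):
    def test_format(self):
        # Hypothetical output columns; with pandas this would be df.columns.
        output_columns = ["cellBC", "UMI", "readCount", "intBC", "allele"]
        expected = ["cellBC", "UMI", "intBC", "readCount"]
        # An empty list means every expected column is present.
        self.assertEqual(missing_columns(output_columns, expected), [])
```

Returning the missing names, rather than a bare boolean, makes a failing assertion report exactly which columns were dropped.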
test_umi_and_cellbc_filter
Description
Tests the UMI and cell barcode filtering functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with specific filtering parameters.
- Checks that the resulting DataFrame contains only the expected alignments after filtering.
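UMI and cell barcode filtering can be sketched on plain Python records. This is a simplified model, not Cassiopeia's code: rows are dicts with hypothetical cellBC, UMI, and readCount keys, each row stands for one UMI, and the parameter names min_reads_per_umi and min_umi_per_cell are chosen for illustration, so check the real function's signature before relying on them.

```python
from collections import Counter


def filter_umis_and_cells(rows, min_reads_per_umi, min_umi_per_cell):
    """Toy two-step filter over molecule rows (dicts with cellBC, UMI, readCount).

    1. Drop UMIs supported by fewer than `min_reads_per_umi` reads.
    2. Drop cells left with fewer than `min_umi_per_cell` distinct UMIs.
    """
    kept = [r for r in rows if r["readCount"] >= min_reads_per_umi]
    # Count distinct (cell, UMI) pairs per cell among the surviving rows.
    umis_per_cell = Counter(cell for cell, _ in {(r["cellBC"], r["UMI"]) for r in kept})
    return [r for r in kept if umis_per_cell[r["cellBC"]] >= min_umi_per_cell]


rows = [
    {"cellBC": "A", "UMI": "u1", "readCount": 10},
    {"cellBC": "A", "UMI": "u2", "readCount": 8},
    {"cellBC": "A", "UMI": "u3", "readCount": 1},  # too few reads
    {"cellBC": "B", "UMI": "u4", "readCount": 12},  # cell B keeps only one UMI
]
filtered = filter_umis_and_cells(rows, min_reads_per_umi=2, min_umi_per_cell=2)
print(sorted(r["UMI"] for r in filtered))  # ['u1', 'u2']
```

Filtering UMIs before counting UMIs per cell matters: a cell can fall below the per-cell threshold only after its low-read UMIs are removed, as cell A nearly does here.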
test_doublet_and_map
Description
Tests the doublet handling and mapping functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with doublet-specific test data and parameters.
- Verifies that the resulting DataFrame contains the expected alleles after doublet handling.
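The doublet and mapping steps can be illustrated with a simplified model. The rules below are assumptions made for illustration, not Cassiopeia's implementation: a cell is treated as a doublet when the fraction of its intBCs carrying more than one allele exceeds a threshold, and mapping keeps, for each (cellBC, intBC), the allele observed on the most UMIs, counting each row as one UMI. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def conflict_fraction(rows, cell):
    """Fraction of a cell's intBCs that carry more than one distinct allele."""
    alleles = defaultdict(set)
    for r in rows:
        if r["cellBC"] == cell:
            alleles[r["intBC"]].add(r["allele"])
    if not alleles:
        return 0.0
    return sum(len(a) > 1 for a in alleles.values()) / len(alleles)


def remove_doublets(rows, threshold):
    """Drop every cell whose conflict fraction exceeds `threshold` (assumed rule)."""
    cells = {r["cellBC"] for r in rows}
    doublets = {c for c in cells if conflict_fraction(rows, c) > threshold}
    return [r for r in rows if r["cellBC"] not in doublets]


def keep_majority_allele(rows):
    """Per (cellBC, intBC), keep only rows whose allele has the most UMIs.

    Each row counts as one UMI; ties go to the allele encountered first.
    """
    counts = Counter((r["cellBC"], r["intBC"], r["allele"]) for r in rows)
    best = {}
    for (cell, intbc, allele), n in counts.items():
        if (cell, intbc) not in best or n > best[(cell, intbc)][1]:
            best[(cell, intbc)] = (allele, n)
    return [r for r in rows if best[(r["cellBC"], r["intBC"])][0] == r["allele"]]


def row(cell, umi, intbc, allele):
    return {"cellBC": cell, "UMI": umi, "intBC": intbc, "allele": allele}


rows = [
    row("A", "u1", "i1", "x"), row("A", "u2", "i1", "x"),
    row("A", "u3", "i1", "y"), row("A", "u4", "i2", "z"),
    row("B", "u5", "i1", "x"), row("B", "u6", "i1", "y"),
    row("B", "u7", "i2", "p"), row("B", "u8", "i2", "q"),
]
# Cell A: 1 of 2 intBCs conflicted (0.5); cell B: 2 of 2 (1.0).
mapped = keep_majority_allele(remove_doublets(rows, threshold=0.6))
print(sorted({(r["cellBC"], r["intBC"], r["allele"]) for r in mapped}))
# [('A', 'i1', 'x'), ('A', 'i2', 'z')]
```

Running doublet removal before mapping keeps the order sensible in this toy: once mapping has collapsed each intBC to one allele, no conflicts remain to measure.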
test_error_correct_intBC
Description
Tests the integration barcode (intBC) error correction functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with intBC-specific test data.
- Checks that the resulting DataFrame contains the expected corrected intBCs.
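intBC error correction can be pictured as merging rare barcodes into nearby abundant ones within the same cell. The rule below is an assumed, simplified version for illustration only (Hamming distance, a maximum UMI count, and a proportion cutoff), and the function and parameter names (correct_intbcs, max_dist, prop, max_umis) are hypothetical rather than Cassiopeia's API.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance between equal-length strings (inf if lengths differ)."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def correct_intbcs(rows, max_dist=1, prop=0.5, max_umis=10):
    """Relabel rare intBCs that look like sequencing errors of abundant ones.

    Per cell, intBC `small` is merged into `large` when they differ by at most
    `max_dist` bases, `small` has at most `max_umis` UMIs, and its UMI count is
    at most `prop` times that of `large`. Each row counts as one UMI, and the
    relabeling is a single pass (chains of corrections are not followed).
    """
    out = [dict(r) for r in rows]  # copy so the input rows are not mutated
    for cell in {r["cellBC"] for r in rows}:
        counts = Counter(r["intBC"] for r in rows if r["cellBC"] == cell)
        # Most abundant first, so a rare intBC is matched to the largest candidate.
        ranked = [bc for bc, _ in counts.most_common()]
        mapping = {}
        for small in ranked:
            for large in ranked:
                if (
                    counts[large] > counts[small]
                    and hamming(small, large) <= max_dist
                    and counts[small] <= max_umis
                    and counts[small] <= prop * counts[large]
                ):
                    mapping[small] = large
                    break
        for r in out:
            if r["cellBC"] == cell and r["intBC"] in mapping:
                r["intBC"] = mapping[r["intBC"]]
    return out


rows = [{"cellBC": "A", "UMI": f"u{i}", "intBC": "ACGT"} for i in range(4)]
rows += [
    {"cellBC": "A", "UMI": "u4", "intBC": "ACGA"},  # one mismatch from ACGT
    {"cellBC": "A", "UMI": "u5", "intBC": "TTTT"},  # far from every other intBC
]
corrected = correct_intbcs(rows)
print(sorted(Counter(r["intBC"] for r in corrected).items()))  # [('ACGT', 5), ('TTTT', 1)]
```

The proportion and UMI-count conditions keep two genuinely distinct, similarly abundant intBCs from being merged just because they happen to differ by one base.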
test_filter_allow_conflicts
Description
Tests the allow_allele_conflicts parameter of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with allow_allele_conflicts=True.
- Verifies that the resulting DataFrame retains the expected allele conflicts.
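When conflicts are allowed, a test can confirm that intBCs with more than one allele survive filtering. One way to express that check, assuming each row is a dict with cellBC, intBC, and allele keys (the real tests operate on pandas DataFrames, and the helper name allele_conflicts is hypothetical), is to collect the conflicting (cellBC, intBC) pairs and compare them with an expected mapping:

```python
from collections import defaultdict


def allele_conflicts(rows):
    """Map each (cellBC, intBC) with more than one distinct allele to its sorted alleles."""
    alleles = defaultdict(set)
    for r in rows:
        alleles[(r["cellBC"], r["intBC"])].add(r["allele"])
    return {key: sorted(vals) for key, vals in alleles.items() if len(vals) > 1}


# Hypothetical output of a filter run with conflicts allowed.
filtered = [
    {"cellBC": "A", "intBC": "i1", "allele": "x"},
    {"cellBC": "A", "intBC": "i1", "allele": "y"},
    {"cellBC": "A", "intBC": "i2", "allele": "z"},
]
print(allele_conflicts(filtered))  # {('A', 'i1'): ['x', 'y']}
```

The same helper also covers the opposite case: with conflicts disallowed, a test would expect it to return an empty dict.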
tearDown
Description
Cleans up the temporary directory created during the tests.
Internal Logic
Removes the temporary directory and its contents.
Dependencies
- unittest: Python’s built-in unit testing framework
- shutil: For file and directory operations
- tempfile: For creating temporary directories
- numpy: For numerical operations
- pandas: For data manipulation and analysis
- cassiopeia: The main package being tested
This test file helps ensure that filter_molecule_table, an important part of the Cassiopeia preprocessing pipeline, behaves correctly. It covers the scenarios the function is likely to encounter when processing real data: UMI and cell barcode filtering, doublets, intBC sequencing errors, and allele conflicts.