This document describes the test/preprocess_tests/filter_molecule_table_test.py file.

High-level description

This file contains unit tests for the filter_molecule_table function in the Cassiopeia preprocessing pipeline. It tests various aspects of the function, including filtering based on UMI and cell barcode counts, handling of doublets, error correction of integration barcodes (intBCs), and allowing for allele conflicts.

Code Structure

The main class TestFilterMolculeTable inherits from unittest.TestCase and contains several test methods. Each test method focuses on a specific aspect of the filter_molecule_table function’s behavior.

Symbols

TestFilterMolculeTable

Description

This is the main test class that contains all the unit tests for the filter_molecule_table function.

Internal Logic

  1. Sets up test data in the setUp method.
  2. Defines various test methods to check different aspects of the filter_molecule_table function.
  3. Cleans up temporary directories in the tearDown method.
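The setUp/tearDown lifecycle described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of the test file: the class name TempDirLifecycleExample, the attribute temp_dir, and the helper run_example are hypothetical, and the real setUp also builds the test DataFrames.

```python
import os
import shutil
import tempfile
import unittest


class TempDirLifecycleExample(unittest.TestCase):
    """Hypothetical test case showing the temporary-directory lifecycle."""

    def setUp(self):
        # Runs before every test method: create a fresh scratch directory.
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        # Runs after every test method (provided setUp succeeded),
        # whether the test passed or failed.
        shutil.rmtree(self.temp_dir)

    def test_directory_is_available(self):
        # The directory created in setUp is visible inside the test.
        self.assertTrue(os.path.isdir(self.temp_dir))


def run_example():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        TempDirLifecycleExample
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp and tearDown wrap each test method individually, every test starts with an empty output directory and leaves nothing behind.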

setUp

Description

Initializes test data for use in the test methods.

Internal Logic

  1. Creates a base case DataFrame (self.base_filter_case) with sample data.
  2. Creates a DataFrame for testing doublet handling (self.doublets_case).
  3. Creates a DataFrame for testing intBC error correction (self.intBC_case).
  4. Sets up a temporary directory for output files.

test_format

Description

Tests if the output DataFrame from filter_molecule_table has the expected columns.

Internal Logic

  1. Calls filter_molecule_table with the base case data.
  2. Checks if the resulting DataFrame contains all the expected columns.
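A column check of this kind might look like the sketch below. The column names in EXPECTED_COLUMNS and the helper missing_columns are assumptions chosen for illustration; the actual list of expected columns is defined in the test file and is not reproduced here.

```python
import pandas as pd

# Hypothetical column names, used only for this illustration.
EXPECTED_COLUMNS = ["cellBC", "UMI", "intBC", "readCount", "allele"]


def missing_columns(df, expected):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# A small stand-in for the DataFrame returned by the function under test.
example_df = pd.DataFrame(
    {
        "cellBC": ["A", "A", "B"],
        "UMI": ["u1", "u2", "u3"],
        "intBC": ["X", "X", "Y"],
        "readCount": [10, 20, 30],
        "allele": ["a1", "a1", "a2"],
    }
)
```

Inside a unittest.TestCase, a test could then assert that missing_columns(result, EXPECTED_COLUMNS) is an empty list, which reports exactly which columns are absent when the check fails.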

test_umi_and_cellbc_filter

Description

Tests the UMI and cell barcode filtering functionality of filter_molecule_table.

Internal Logic

  1. Calls filter_molecule_table with specific filtering parameters.
  2. Checks if the resulting DataFrame contains only the expected alignments after filtering.
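To make the idea of a UMI-per-cell filter concrete, here is a small pandas sketch. It is a stand-in written for this explanation, not Cassiopeia's implementation: the function keep_cells_with_min_umis, its min_umi_per_cell argument, and the column names are assumptions.

```python
import pandas as pd


def keep_cells_with_min_umis(df, min_umi_per_cell):
    """Keep rows whose cell barcode has at least min_umi_per_cell distinct UMIs."""
    # For each row, count the distinct UMIs observed in that row's cell.
    umis_per_cell = df.groupby("cellBC")["UMI"].transform("nunique")
    return df[umis_per_cell >= min_umi_per_cell]


# Cell "A" has three UMIs; cell "B" has only one.
molecules = pd.DataFrame(
    {
        "cellBC": ["A", "A", "A", "B"],
        "UMI": ["u1", "u2", "u3", "u4"],
        "readCount": [10, 12, 9, 30],
    }
)

# With a threshold of 2, only the rows from cell "A" survive.
filtered = keep_cells_with_min_umis(molecules, min_umi_per_cell=2)
```

A test in this style builds a tiny table where the surviving rows are known in advance, then compares the filtered output against that expectation.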

test_doublet_and_map

Description

Tests the doublet handling and mapping functionality of filter_molecule_table.

Internal Logic

  1. Calls filter_molecule_table with doublet-specific test data and parameters.
  2. Verifies if the resulting DataFrame contains the expected alleles after doublet handling.

test_error_correct_intBC

Description

Tests the integration barcode (intBC) error correction functionality of filter_molecule_table.

Internal Logic

  1. Calls filter_molecule_table with intBC-specific test data.
  2. Checks if the resulting DataFrame contains the expected corrected intBCs.
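As a rough intuition for barcode error correction, the sketch below merges a rare barcode into a more frequent one that differs by at most one character. This is a generic abundance-based scheme assumed for illustration; the rules and thresholds that filter_molecule_table actually applies are not described in this document, and the names hamming_distance and correct_barcodes are hypothetical.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcodes(barcodes, max_distance=1):
    """Map each barcode to the most frequent barcode within max_distance of it.

    A barcode is left unchanged when no strictly more frequent neighbor exists.
    """
    counts = Counter(barcodes)
    corrected = []
    for bc in barcodes:
        best = bc
        for candidate, count in counts.items():
            if (
                len(candidate) == len(bc)
                and hamming_distance(bc, candidate) <= max_distance
                and count > counts[best]
            ):
                best = candidate
        corrected.append(best)
    return corrected


# "ACGA" is one mismatch away from the more abundant "ACGT";
# "TTTT" has no close neighbor and is left alone.
observed = ["ACGT", "ACGT", "ACGT", "ACGA", "TTTT"]
corrected = correct_barcodes(observed)
```

A test for error correction can follow the same pattern: construct input with one deliberately mutated barcode and assert that the output contains only the corrected form.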

test_filter_allow_conflicts

Description

Tests the allow_allele_conflicts parameter of filter_molecule_table.

Internal Logic

  1. Calls filter_molecule_table with allow_allele_conflicts=True.
  2. Verifies if the resulting DataFrame retains the expected allele conflicts.

tearDown

Description

Cleans up the temporary directory created during the tests.

Internal Logic

Removes the temporary directory and its contents.

Dependencies

  • unittest: Python’s built-in unit testing framework
  • shutil: For removing the temporary output directory
  • tempfile: For creating the temporary output directory
  • numpy: For numerical operations
  • pandas: For data manipulation and analysis
  • cassiopeia: The main package being tested

These tests guard the filter_molecule_table step of the Cassiopeia preprocessing pipeline against regressions. Each scenario runs against a small DataFrame built in setUp rather than real data, so the expected output of every filter is known in advance.