Detailed documentation of the target file test/preprocess_tests/

High-level description

This file contains unit tests for the UMI Resolution module in the Cassiopeia preprocessing pipeline. It tests the functionality of resolving UMI sequences and filtering cells based on UMI and read count thresholds.

Code Structure

The main class TestResolveUMISequence inherits from unittest.TestCase and contains several test methods. It uses a sample collapsed UMI table to test the resolve_umi_sequence function from the pipeline module.




TestResolveUMISequence

A unittest.TestCase subclass that groups the test methods for the UMI resolution functionality.

Internal Logic

  1. Sets up a sample collapsed UMI table in the setUp method.
  2. Contains test methods to verify different aspects of UMI resolution.



setUp

Initializes the test environment by creating a sample collapsed UMI table and setting up a temporary directory.
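As a rough illustration of this kind of fixture, the sketch below builds a small collapsed UMI table with pandas and creates a temporary directory. The column names (cellBC, UMI, readCount, seq) and every value in the table are assumptions chosen for illustration; the table in the actual test file may be laid out differently.

```python
import tempfile

import pandas as pd

# Hypothetical collapsed UMI table: one row per (cell, UMI, sequence)
# combination, with the number of reads supporting that sequence.
collapsed_umi_table = pd.DataFrame(
    {
        "cellBC": ["cell1", "cell1", "cell1", "cell2", "cell3", "cell3"],
        "UMI": ["UMIA", "UMIA", "UMIB", "UMIA", "UMIA", "UMIB"],
        "readCount": [20, 5, 10, 1, 30, 40],
        "seq": ["AACC", "AAGG", "TTGG", "ACGT", "CCAA", "GGTT"],
    }
)

# Scratch directory for any files written during a test.
# mkdtemp does not clean up after itself, so the caller must remove it.
temp_dir = tempfile.mkdtemp()
```

In this made-up table, cell1-UMIA has two competing sequences, cell2 has a single low-coverage UMI, and cell3 has two well-supported UMIs, which mirrors the three situations the tests are described as covering.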



test_resolve_umi

Tests the basic functionality of the resolve_umi_sequence function.

Internal Logic

  1. Calls resolve_umi_sequence with the sample data.
  2. Checks if the correct sequence was selected for cell1-UMIA.
  3. Verifies that cell2 was filtered out.
  4. Ensures cell3 retained both UMIs.
  5. Checks the expected read counts for each cell.
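The behaviour checked in the steps above can be sketched with a simplified stand-in. The function below is not Cassiopeia's resolve_umi_sequence; resolve_umi_sketch, its min_umi_per_cell parameter, the column names, and the data are all assumptions made for illustration. It keeps the most-read sequence for each (cell, UMI) pair and then drops cells with too few UMIs.

```python
import pandas as pd


def resolve_umi_sketch(table: pd.DataFrame, min_umi_per_cell: int = 2) -> pd.DataFrame:
    """Hypothetical stand-in: pick one sequence per (cell, UMI), then filter cells."""
    # Put the highest read count first so drop_duplicates keeps the best sequence.
    ranked = table.sort_values("readCount", ascending=False)
    resolved = ranked.drop_duplicates(subset=["cellBC", "UMI"], keep="first")
    # Count distinct UMIs per cell and keep only cells that meet the threshold.
    umis_per_cell = resolved.groupby("cellBC")["UMI"].transform("nunique")
    kept = resolved[umis_per_cell >= min_umi_per_cell]
    return kept.sort_values(["cellBC", "UMI"]).reset_index(drop=True)


# Made-up input: cell1-UMIA has two candidate sequences, cell2 has one UMI.
table = pd.DataFrame(
    {
        "cellBC": ["cell1", "cell1", "cell1", "cell2", "cell3", "cell3"],
        "UMI": ["UMIA", "UMIA", "UMIB", "UMIA", "UMIA", "UMIB"],
        "readCount": [20, 5, 10, 1, 30, 40],
        "seq": ["AACC", "AAGG", "TTGG", "ACGT", "CCAA", "GGTT"],
    }
)
resolved = resolve_umi_sketch(table, min_umi_per_cell=2)
```

With this input, the sequence with 20 reads wins for cell1-UMIA, cell2 is dropped for having a single UMI, and cell3 keeps both of its UMIs.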



test_filter_by_reads

Tests the filtering of cells based on the average number of reads per UMI.

Internal Logic

  1. Calls resolve_umi_sequence with a higher min_avg_reads_per_umi threshold.
  2. Verifies that only the expected cell (cell3) remains after filtering.
  3. Checks that the cells expected to be removed (cell1, cell2) are not in the result.
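To make the read-depth filter concrete, here is a minimal sketch of filtering cells by average reads per UMI. It is an illustrative stand-in rather than the library's implementation: filter_by_avg_reads, the column names, the threshold of 30, and the data are assumptions.

```python
import pandas as pd


def filter_by_avg_reads(table: pd.DataFrame, min_avg_reads_per_umi: float) -> pd.DataFrame:
    """Hypothetical stand-in: keep cells whose mean reads per UMI meets the threshold."""
    # Total reads for each (cell, UMI) pair, then the per-cell mean of those totals.
    reads_per_umi = table.groupby(["cellBC", "UMI"])["readCount"].sum()
    avg_reads = reads_per_umi.groupby(level="cellBC").mean()
    passing_cells = avg_reads[avg_reads >= min_avg_reads_per_umi].index
    return table[table["cellBC"].isin(passing_cells)]


# Made-up input: average reads per UMI are 15 (cell1), 1 (cell2), and 35 (cell3).
table = pd.DataFrame(
    {
        "cellBC": ["cell1", "cell1", "cell2", "cell3", "cell3"],
        "UMI": ["UMIA", "UMIB", "UMIA", "UMIA", "UMIB"],
        "readCount": [20, 10, 1, 30, 40],
    }
)
filtered = filter_by_avg_reads(table, min_avg_reads_per_umi=30)
```

Raising the threshold above the averages of cell1 and cell2 leaves only cell3, which is the shape of outcome the test is described as asserting.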



tearDown

Cleans up the temporary directory after each test.
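The fixture lifecycle can be sketched as follows, assuming the standard unittest setUp/tearDown hooks together with tempfile.mkdtemp and shutil.rmtree. The class name TempDirLifecycleSketch and its created_dirs bookkeeping are hypothetical and exist only so the cleanup can be observed after the run.

```python
import os
import shutil
import tempfile
import unittest


class TempDirLifecycleSketch(unittest.TestCase):
    # Records every directory created so it can be inspected after the run.
    created_dirs = []

    def setUp(self):
        # Runs before each test method: create a fresh scratch directory.
        self.temp_dir = tempfile.mkdtemp()
        TempDirLifecycleSketch.created_dirs.append(self.temp_dir)

    def test_directory_exists_during_test(self):
        self.assertTrue(os.path.isdir(self.temp_dir))

    def tearDown(self):
        # Runs after each test method, whether it passed or failed.
        shutil.rmtree(self.temp_dir)


# Run the single test programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempDirLifecycleSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because tearDown runs after every test method, each test starts from an empty directory and no output files are left behind between tests.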


Dependencies

  • unittest: Python’s built-in unit testing framework
  • os: For file path operations
  • shutil: For directory removal
  • tempfile: For creating temporary directories
  • pandas: For data manipulation
  • cassiopeia.preprocess.pipeline: The module being tested


The tests use a predefined collapsed UMI table and various parameters for the resolve_umi_sequence function.

Error Handling

The tests use assertions to verify the expected outcomes. If an assertion fails, it raises an AssertionError, which unittest reports as a test failure.


  • The tests use a small, predefined dataset to verify the functionality of the UMI resolution process.
  • The plot parameter is set to False in test_resolve_umi and True in test_filter_by_reads, demonstrating testing with and without plot generation.
  • The temporary directory is used to store any output files generated during the tests and is cleaned up after each test.

This test file ensures that the UMI resolution functionality in the Cassiopeia preprocessing pipeline works as expected, particularly in handling different cell and UMI filtering scenarios.