constants.py
High-level description
This file defines constants and default parameters used in the Cassiopeia preprocessing pipeline. These constants include BAM tag names, quality scores, a DNA substitution matrix, and default parameters for various pipeline stages.
Code Structure
The code defines several dictionaries: BAM_CONSTANTS, SINGLE_CELL_BAM_TAGS, SPATIAL_BAM_TAGS, CHEMISTRY_BAM_TAGS, DNA_SUBSTITUTION_MATRIX, and DEFAULT_PIPELINE_PARAMETERS. The first four concern BAM tags: BAM_CONSTANTS holds general tag names used across the pipeline, SINGLE_CELL_BAM_TAGS and SPATIAL_BAM_TAGS define tag pairs for single-cell and spatial chemistries respectively, and CHEMISTRY_BAM_TAGS maps each chemistry name to one of those two dictionaries. DNA_SUBSTITUTION_MATRIX defines a substitution matrix for DNA alignment. DEFAULT_PIPELINE_PARAMETERS defines default parameters for each stage of the preprocessing pipeline.
Symbols
Symbol Name: BAM_CONSTANTS
Description:
This dictionary stores constants related to BAM file tags used in the preprocessing pipeline.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
The dictionary maps descriptive names to their corresponding BAM tag strings. For example, RAW_CELL_BC_TAG maps to "CR", the tag for the raw cell barcode sequence.
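A minimal sketch of how such a name-to-tag mapping might be consulted when writing a SAM/BAM optional field. Only the RAW_CELL_BC_TAG entry ("CR") comes from the description above; the helper format_string_tag is hypothetical and relies only on the SAM text format for optional fields (TAG:TYPE:VALUE, where type Z denotes a printable string).

```python
from typing import Dict

# Sketch of BAM_CONSTANTS: only the RAW_CELL_BC_TAG entry is taken from the
# documentation; the real dictionary contains further entries not shown here.
BAM_CONSTANTS: Dict[str, str] = {
    "RAW_CELL_BC_TAG": "CR",
}


def format_string_tag(tag_name: str, value: str) -> str:
    """Hypothetical helper: render a SAM optional field of type Z (string)."""
    tag = BAM_CONSTANTS[tag_name]  # look up the two-letter BAM tag by its descriptive name
    return f"{tag}:Z:{value}"


print(format_string_tag("RAW_CELL_BC_TAG", "ACGTACGT"))  # prints CR:Z:ACGTACGT
```

Keeping the two-letter codes behind descriptive keys means a tag can be changed in one place without editing every call site.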
Symbol Name: SINGLE_CELL_BAM_TAGS
Description:
This dictionary defines BAM tag names for single-cell sequencing chemistries.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
The dictionary maps data types (umi, cell_barcode) to tuples of BAM tag names. Each tuple contains two tags: one for the sequence and one for its quality scores.
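The (sequence tag, quality tag) pairing can be unpacked directly. In this sketch the tag codes are placeholders assumed for illustration rather than values confirmed from constants.py, a plain dict stands in for the tags of one BAM record, and read_field is a hypothetical helper.

```python
from typing import Dict, Tuple

# Illustrative stand-in for SINGLE_CELL_BAM_TAGS; the tag codes are placeholders.
SINGLE_CELL_BAM_TAGS: Dict[str, Tuple[str, str]] = {
    "umi": ("UR", "UY"),
    "cell_barcode": ("CR", "CY"),
}


def read_field(record_tags: Dict[str, str], data_type: str) -> Tuple[str, str]:
    """Return (sequence, quality string) for a data type such as "umi"."""
    seq_tag, qual_tag = SINGLE_CELL_BAM_TAGS[data_type]  # unpack the tag pair
    return record_tags[seq_tag], record_tags[qual_tag]


# Tags of one hypothetical read, keyed by two-letter BAM tag.
tags = {"CR": "ACGTACGT", "CY": "FFFFFFFF", "UR": "TTGCA", "UY": "FFF:F"}
print(read_field(tags, "cell_barcode"))  # ('ACGTACGT', 'FFFFFFFF')
```

With pysam, a call such as AlignedSegment.get_tag(seq_tag) would take the place of the dictionary lookup on record_tags.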
Symbol Name: SPATIAL_BAM_TAGS
Description:
This dictionary defines BAM tag names for spatial sequencing chemistries.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
Like SINGLE_CELL_BAM_TAGS, this dictionary maps data types (umi, spot_barcode) to tuples of BAM tag names, each pairing a sequence tag with a quality-score tag.
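Because the single-cell and spatial dictionaries share the same shape, code that depends only on that shape can handle either chemistry family. A sketch with placeholder tag codes; the helper barcode_tags is hypothetical and assumes each dictionary has exactly one non-UMI entry, as the two descriptions suggest.

```python
from typing import Dict, Tuple

TagMap = Dict[str, Tuple[str, str]]

# Illustrative stand-ins; the tag codes are placeholders, not confirmed values.
SINGLE_CELL_BAM_TAGS: TagMap = {"umi": ("UR", "UY"), "cell_barcode": ("CR", "CY")}
SPATIAL_BAM_TAGS: TagMap = {"umi": ("UR", "UY"), "spot_barcode": ("CR", "CY")}


def barcode_tags(tag_map: TagMap) -> Tuple[str, Tuple[str, str]]:
    """Return (data type, tag pair) for the barcode entry, i.e. the non-UMI key."""
    barcode_keys = [key for key in tag_map if key != "umi"]
    if len(barcode_keys) != 1:
        raise ValueError(f"expected one barcode entry, found {barcode_keys}")
    key = barcode_keys[0]
    return key, tag_map[key]


print(barcode_tags(SINGLE_CELL_BAM_TAGS))  # ('cell_barcode', ('CR', 'CY'))
print(barcode_tags(SPATIAL_BAM_TAGS))      # ('spot_barcode', ('CR', 'CY'))
```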
Symbol Name: CHEMISTRY_BAM_TAGS
Description:
This dictionary maps specific sequencing chemistries to their corresponding BAM tag dictionaries.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
The dictionary maps chemistry names (e.g., ‘dropseq’, ‘10xv2’) to either SINGLE_CELL_BAM_TAGS or SPATIAL_BAM_TAGS, depending on whether the chemistry produces single-cell or spatial data.
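A sketch of the chemistry lookup with a clear error for unsupported names. Only "dropseq" and "10xv2" are named in the documentation; "spatial_example" is a hypothetical entry, the tag codes are placeholders, and tags_for_chemistry is an illustrative helper.

```python
from typing import Dict, Tuple

TagMap = Dict[str, Tuple[str, str]]

# Illustrative stand-ins; tag codes are placeholders.
SINGLE_CELL_BAM_TAGS: TagMap = {"umi": ("UR", "UY"), "cell_barcode": ("CR", "CY")}
SPATIAL_BAM_TAGS: TagMap = {"umi": ("UR", "UY"), "spot_barcode": ("CR", "CY")}

# "dropseq" and "10xv2" come from the documentation; "spatial_example" is hypothetical.
CHEMISTRY_BAM_TAGS: Dict[str, TagMap] = {
    "dropseq": SINGLE_CELL_BAM_TAGS,
    "10xv2": SINGLE_CELL_BAM_TAGS,
    "spatial_example": SPATIAL_BAM_TAGS,
}


def tags_for_chemistry(chemistry: str) -> TagMap:
    """Look up the BAM tag dictionary for a chemistry, failing clearly if unknown."""
    try:
        return CHEMISTRY_BAM_TAGS[chemistry]
    except KeyError:
        supported = ", ".join(sorted(CHEMISTRY_BAM_TAGS))
        raise ValueError(
            f"unsupported chemistry {chemistry!r}; expected one of: {supported}"
        ) from None


print(sorted(tags_for_chemistry("10xv2")))  # ['cell_barcode', 'umi']
```

Several chemistries point at the same dictionary object, so the tag definitions live in one place and each chemistry simply selects a family.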
Symbol Name: DNA_SUBSTITUTION_MATRIX
Description:
This dictionary defines a substitution matrix for DNA sequence alignment.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
The dictionary is a nested mapping: each outer key is a symbol (A, T, C, G, Z, N), and its value is an inner dictionary mapping every symbol to a score, so DNA_SUBSTITUTION_MATRIX[x][y] gives the score for aligning x against y. This matrix is used to score alignments between DNA sequences.
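A nested dictionary of this kind is indexed as matrix[a][b]. The sketch below scores a gap-free alignment of two equal-length sequences by summing per-position entries. The score values (match +5, mismatch -4, 0 for any pair involving Z or N) are assumptions for illustration and may differ from those in constants.py; build_matrix and ungapped_score are hypothetical helpers, not the pipeline's aligner.

```python
from typing import Dict

SYMBOLS = "ATCGZN"


def build_matrix(match: int = 5, mismatch: int = -4) -> Dict[str, Dict[str, int]]:
    """Build an illustrative nested-dict substitution matrix over A, T, C, G, Z, N."""
    matrix: Dict[str, Dict[str, int]] = {}
    for a in SYMBOLS:
        matrix[a] = {}
        for b in SYMBOLS:
            if a in "ZN" or b in "ZN":
                matrix[a][b] = 0  # assumed neutral score for Z/N
            else:
                matrix[a][b] = match if a == b else mismatch
    return matrix


DNA_SUBSTITUTION_MATRIX = build_matrix()


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum substitution scores position by position (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(DNA_SUBSTITUTION_MATRIX[a][b] for a, b in zip(seq1, seq2))


print(ungapped_score("ACGT", "ACGA"))  # 5 + 5 + 5 - 4 = 11
print(ungapped_score("ACGT", "ACNT"))  # 5 + 5 + 0 + 5 = 15
```

A full aligner would use the same lookup inside a dynamic-programming recurrence with gap penalties; this sketch only shows how the matrix is indexed.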
Symbol Name: DEFAULT_PIPELINE_PARAMETERS
Description:
This dictionary stores default parameters for each stage of the Cassiopeia preprocessing pipeline.
Inputs:
N/A - This is a constant dictionary.
Outputs:
N/A - This is a constant dictionary.
Internal Logic:
The dictionary maps stage names (e.g., ‘general’, ‘convert’, ‘filter_bam’) to dictionaries containing parameter names and their default values. These parameters control the behavior of each stage in the pipeline.
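One common way to use per-stage defaults is to layer user-supplied values on top of them. In this sketch the stage names come from the description above, but every parameter name and value is a hypothetical placeholder, and resolve_parameters is an illustrative helper rather than a pipeline function.

```python
import copy
from typing import Any, Dict, Optional

# Stage names are from the documentation; parameter names/values are hypothetical.
DEFAULT_PIPELINE_PARAMETERS: Dict[str, Dict[str, Any]] = {
    "general": {"output_directory": "./output", "n_threads": 1},
    "convert": {"chemistry": "10xv2"},
    "filter_bam": {"quality_threshold": 10},
}


def resolve_parameters(
    stage: str, overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return the stage's defaults updated with overrides, without mutating the defaults."""
    if stage not in DEFAULT_PIPELINE_PARAMETERS:
        raise KeyError(f"unknown stage {stage!r}")
    params = copy.deepcopy(DEFAULT_PIPELINE_PARAMETERS[stage])
    for name, value in (overrides or {}).items():
        if name not in params:
            raise KeyError(f"unknown parameter {name!r} for stage {stage!r}")
        params[name] = value
    return params


print(resolve_parameters("filter_bam", {"quality_threshold": 20}))  # {'quality_threshold': 20}
```

Copying before updating keeps the module-level defaults unchanged across runs, and rejecting unknown names catches configuration typos early.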