High-level description

This file defines constants and default parameters used in the Cassiopeia preprocessing pipeline. These constants include BAM tag names, quality scores, a DNA substitution matrix, and default parameters for various pipeline stages.

Code Structure

The code defines six dictionaries: BAM_CONSTANTS, SINGLE_CELL_BAM_TAGS, SPATIAL_BAM_TAGS, CHEMISTRY_BAM_TAGS, DNA_SUBSTITUTION_MATRIX, and DEFAULT_PIPELINE_PARAMETERS. BAM_CONSTANTS holds the BAM tag names used throughout the pipeline. SINGLE_CELL_BAM_TAGS and SPATIAL_BAM_TAGS give the tag pairs for single-cell and spatial data, and CHEMISTRY_BAM_TAGS maps each sequencing chemistry to one of those two dictionaries. DNA_SUBSTITUTION_MATRIX defines a substitution matrix for DNA alignment. DEFAULT_PIPELINE_PARAMETERS defines default parameters for each stage of the preprocessing pipeline.

Symbols

Symbol Name: BAM_CONSTANTS

Description:

This dictionary stores constants related to BAM file tags used in the preprocessing pipeline.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

The dictionary maps descriptive names to their corresponding BAM tag strings. For example, RAW_CELL_BC_TAG maps to "CR", which represents the tag for the raw cell barcode sequence.
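
A minimal sketch of how such a mapping might be used. Only the RAW_CELL_BC_TAG entry comes from the description above; the second entry, the name BAM_CONSTANTS_SKETCH, and the helper tag_for are hypothetical and serve only to illustrate the lookup.

```python
# Illustrative stand-in for BAM_CONSTANTS; not the actual dictionary.
BAM_CONSTANTS_SKETCH = {
    "RAW_CELL_BC_TAG": "CR",          # taken from the description above
    "RAW_CELL_BC_QUALITY_TAG": "CY",  # assumed entry, for illustration only
}


def tag_for(name: str) -> str:
    """Return the BAM tag string stored under a descriptive constant name."""
    try:
        return BAM_CONSTANTS_SKETCH[name]
    except KeyError:
        # Fail with a clearer message than a bare dictionary KeyError.
        raise KeyError(f"Unknown BAM constant: {name!r}") from None


print(tag_for("RAW_CELL_BC_TAG"))  # prints CR
```

Referring to tags through descriptive names keeps the two-character strings in one place, so a tag change only has to be made in the dictionary.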

Symbol Name: SINGLE_CELL_BAM_TAGS

Description:

This dictionary defines BAM tag names for single-cell sequencing chemistries.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

The dictionary maps data types (umi, cell_barcode) to tuples of BAM tag names. Each tuple contains two tags: one for the sequence and one for the quality scores.
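
The tuple layout can be sketched as follows. The tag strings below are assumptions chosen for illustration (only "CR" for the raw cell barcode sequence appears elsewhere in this document), and split_tags is a hypothetical helper.

```python
from typing import Dict, Tuple

# Hypothetical values: each data type maps to (sequence tag, quality tag).
SINGLE_CELL_BAM_TAGS_SKETCH: Dict[str, Tuple[str, str]] = {
    "umi": ("UR", "UY"),           # assumed tag pair
    "cell_barcode": ("CR", "CY"),  # "CR" from this document, "CY" assumed
}


def split_tags(data_type: str) -> Tuple[str, str]:
    """Unpack the tag pair for a data type into sequence and quality tags."""
    sequence_tag, quality_tag = SINGLE_CELL_BAM_TAGS_SKETCH[data_type]
    return sequence_tag, quality_tag


seq_tag, qual_tag = split_tags("cell_barcode")
print(seq_tag, qual_tag)  # prints CR CY
```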

Symbol Name: SPATIAL_BAM_TAGS

Description:

This dictionary defines BAM tag names for spatial sequencing chemistries.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

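Like SINGLE_CELL_BAM_TAGS, this dictionary maps data types (umi, spot_barcode) to tuples of BAM tag names, one tag for the sequence and one for the quality scores. The spot barcode takes the place of the cell barcode.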

Symbol Name: CHEMISTRY_BAM_TAGS

Description:

This dictionary maps specific sequencing chemistries to their corresponding BAM tag dictionaries.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

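The dictionary maps chemistry names (e.g., 'dropseq', '10xv2') to either SINGLE_CELL_BAM_TAGS or SPATIAL_BAM_TAGS, depending on whether the chemistry produces single-cell or spatial data. Looking up a chemistry therefore yields the tag pairs to use for that chemistry.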
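
A sketch of this two-level lookup, under stated assumptions: the tag strings are illustrative, 'example_spatial' is a hypothetical chemistry name, and tags_for_chemistry is a hypothetical helper. Only 'dropseq' and '10xv2' are chemistry names mentioned in this document.

```python
# Illustrative stand-ins for the two tag dictionaries; tag values are assumed.
SINGLE_CELL_SKETCH = {"umi": ("UR", "UY"), "cell_barcode": ("CR", "CY")}
SPATIAL_SKETCH = {"umi": ("UR", "UY"), "spot_barcode": ("CR", "CY")}

# Several chemistries can share one tag dictionary object.
CHEMISTRY_SKETCH = {
    "dropseq": SINGLE_CELL_SKETCH,
    "10xv2": SINGLE_CELL_SKETCH,
    "example_spatial": SPATIAL_SKETCH,  # hypothetical chemistry name
}


def tags_for_chemistry(chemistry: str) -> dict:
    """Return the tag dictionary registered for a chemistry name."""
    if chemistry not in CHEMISTRY_SKETCH:
        supported = ", ".join(sorted(CHEMISTRY_SKETCH))
        raise ValueError(f"Unknown chemistry {chemistry!r}; expected one of: {supported}")
    return CHEMISTRY_SKETCH[chemistry]


print(sorted(tags_for_chemistry("dropseq")))          # prints ['cell_barcode', 'umi']
print(sorted(tags_for_chemistry("example_spatial")))  # prints ['spot_barcode', 'umi']
```

Because the chemistry entries point at shared dictionaries rather than copies, a caller that mutates the returned dictionary would affect every chemistry that shares it; treating the result as read-only avoids that.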

Symbol Name: DNA_SUBSTITUTION_MATRIX

Description:

This dictionary defines a substitution matrix for DNA sequence alignment.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

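The dictionary represents a matrix as a nested dictionary. The outer keys are sequence symbols: the four nucleotides A, T, C, and G, plus the additional symbols Z and N. Each outer key maps to an inner dictionary that assigns a score to every symbol, so the score for aligning symbol x against symbol y is found at matrix[x][y]. This matrix is used to score alignments between DNA sequences.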
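
To make the nested layout concrete, here is a minimal sketch that builds a matrix of the same shape and uses it to score two equal-length sequences position by position. The scores (5 for a match, -4 for a mismatch, 0 for any pair involving Z or N) are assumptions for illustration, not values confirmed by this document, and build_matrix and score_ungapped are hypothetical helpers. Real alignment also handles gaps, which this sketch omits.

```python
ALPHABET = "ATCGZN"
SPECIAL = "ZN"  # symbols assumed here to score neutrally against everything


def build_matrix(match: int = 5, mismatch: int = -4, neutral: int = 0) -> dict:
    """Build a nested dict so that matrix[x][y] is the score for x against y."""
    matrix = {}
    for x in ALPHABET:
        matrix[x] = {}
        for y in ALPHABET:
            if x in SPECIAL or y in SPECIAL:
                matrix[x][y] = neutral
            elif x == y:
                matrix[x][y] = match
            else:
                matrix[x][y] = mismatch
    return matrix


def score_ungapped(seq1: str, seq2: str, matrix: dict) -> int:
    """Sum the per-position scores of two equal-length sequences (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must have the same length")
    return sum(matrix[x][y] for x, y in zip(seq1, seq2))


matrix = build_matrix()
print(score_ungapped("ACGT", "ACGA", matrix))  # prints 11 (three matches, one mismatch)
print(score_ungapped("ACNT", "ACGT", matrix))  # prints 15 (three matches, one neutral)
```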

Symbol Name: DEFAULT_PIPELINE_PARAMETERS

Description:

This dictionary stores default parameters for each stage of the Cassiopeia preprocessing pipeline.

Inputs:

N/A - This is a constant dictionary.

Outputs:

N/A - This is a constant dictionary.

Internal Logic:

The dictionary maps stage names (e.g., 'general', 'convert', 'filter_bam') to dictionaries containing parameter names and their default values. These parameters control the behavior of each stage in the pipeline.
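
One common way to use a table of per-stage defaults is to layer user-supplied overrides on top of it. The sketch below assumes that pattern; it is not taken from the pipeline's own code. The stage names come from this document, while the parameter names, their values, and the helper resolve_parameters are hypothetical.

```python
import copy

# Illustrative stand-in for DEFAULT_PIPELINE_PARAMETERS; parameters are assumed.
DEFAULTS_SKETCH = {
    "general": {"n_threads": 1},
    "convert": {"chemistry": "10xv2"},
    "filter_bam": {"quality_threshold": 10},
}


def resolve_parameters(overrides: dict) -> dict:
    """Return per-stage parameters: the defaults, updated with any overrides."""
    # Deep copy so that the shared defaults are never mutated.
    resolved = copy.deepcopy(DEFAULTS_SKETCH)
    for stage, params in overrides.items():
        if stage not in resolved:
            raise KeyError(f"Unknown pipeline stage: {stage!r}")
        resolved[stage].update(params)
    return resolved


params = resolve_parameters({"filter_bam": {"quality_threshold": 20}})
print(params["filter_bam"])  # prints {'quality_threshold': 20}
print(params["general"])     # prints {'n_threads': 1}
```

Copying before updating means each run starts from the same defaults, regardless of what an earlier caller overrode.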