This file defines constants and default parameters used in the Cassiopeia preprocessing pipeline. These constants include BAM tag names, quality scores, a DNA substitution matrix, and default parameters for various pipeline stages.
The code defines several dictionaries: BAM_CONSTANTS, SINGLE_CELL_BAM_TAGS, SPATIAL_BAM_TAGS, CHEMISTRY_BAM_TAGS, DNA_SUBSTITUTION_MATRIX, and DEFAULT_PIPELINE_PARAMETERS. The first four dictionaries define BAM tag names for different sequencing chemistries. DNA_SUBSTITUTION_MATRIX defines a substitution matrix for DNA alignment. DEFAULT_PIPELINE_PARAMETERS defines default parameters for each stage of the preprocessing pipeline.
BAM_CONSTANTS
This constant dictionary stores the BAM file tag names used in the preprocessing pipeline.
It maps descriptive names to their corresponding BAM tag strings. For example, RAW_CELL_BC_TAG maps to "CR", the tag for the raw cell barcode sequence.
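The name-to-tag lookup described above can be sketched as follows. This is a minimal illustration, not the module's actual contents: only the RAW_CELL_BC_TAG to "CR" entry comes from the description, while the RAW_CELL_BC_QUALITY_TAG entry, the get_tag helper, and the example read are assumptions made for the sketch.

```python
# Hypothetical excerpt: only RAW_CELL_BC_TAG -> "CR" is stated in the text.
BAM_CONSTANTS = {
    "RAW_CELL_BC_TAG": "CR",          # raw cell barcode sequence (from the text)
    "RAW_CELL_BC_QUALITY_TAG": "CY",  # assumed entry, for illustration only
}


def get_tag(read_tags, name):
    """Return the value stored under the BAM tag that `name` refers to, or None."""
    return read_tags.get(BAM_CONSTANTS[name])


# The optional fields of one read, modelled as a plain dict of tag -> value.
read_tags = {"CR": "ACGTACGTACGTACGT", "CY": "FFFFFFFFFFFFFFFF"}
raw_barcode = get_tag(read_tags, "RAW_CELL_BC_TAG")
print(raw_barcode)
```

Referring to tags by descriptive name keeps the two-letter codes in one place, so a change of tag convention would only touch the dictionary.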
SINGLE_CELL_BAM_TAGS
This constant dictionary defines BAM tag names for single-cell sequencing chemistries.
It maps data types (umi, cell_barcode) to tuples of BAM tag names. Each tuple contains two tags: one for the sequence and one for the quality scores.
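A sketch of how such a mapping might be consumed is shown below. The keys umi and cell_barcode come from the description; the concrete tag strings, the extract_fields helper, and the example read are assumptions chosen for illustration and may differ from the real module.

```python
# Keys are from the text; the tag strings are assumed for illustration.
SINGLE_CELL_BAM_TAGS = {
    "umi": ("UR", "UY"),           # (sequence tag, quality tag)
    "cell_barcode": ("CR", "CY"),  # (sequence tag, quality tag)
}


def extract_fields(read_tags, tag_map):
    """Collect (sequence, quality) for every data type listed in `tag_map`."""
    fields = {}
    for data_type, (seq_tag, qual_tag) in tag_map.items():
        fields[data_type] = (read_tags.get(seq_tag), read_tags.get(qual_tag))
    return fields


# The optional fields of one read, modelled as a plain dict of tag -> value.
read_tags = {
    "UR": "AACCGGTTAA",
    "UY": "FFFFFFFFFF",
    "CR": "ACGTACGTACGTACGT",
    "CY": "FFFFFFFFFFFFFFFF",
}
fields = extract_fields(read_tags, SINGLE_CELL_BAM_TAGS)
print(fields["umi"])
```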
SPATIAL_BAM_TAGS
This constant dictionary defines BAM tag names for spatial sequencing chemistries.
Like SINGLE_CELL_BAM_TAGS, it maps data types (umi, spot_barcode) to tuples of BAM tag names.
CHEMISTRY_BAM_TAGS
This constant dictionary maps specific sequencing chemistries to their corresponding BAM tag dictionaries.
It maps chemistry names (e.g., "dropseq", "10xv2") to either SINGLE_CELL_BAM_TAGS or SPATIAL_BAM_TAGS, depending on the chemistry type.
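The chemistry lookup can be sketched as below. The chemistry names dropseq and 10xv2 come from the description; their assignment to the single-cell dictionary, the placeholder name example_spatial, the tag strings, and the tags_for_chemistry helper are all assumptions made for this sketch.

```python
# Tag strings are assumed for illustration; only the key names are from the text.
SINGLE_CELL_BAM_TAGS = {"umi": ("UR", "UY"), "cell_barcode": ("CR", "CY")}
SPATIAL_BAM_TAGS = {"umi": ("UR", "UY"), "spot_barcode": ("CR", "CY")}

CHEMISTRY_BAM_TAGS = {
    "dropseq": SINGLE_CELL_BAM_TAGS,      # assumed to be a single-cell chemistry
    "10xv2": SINGLE_CELL_BAM_TAGS,        # assumed to be a single-cell chemistry
    "example_spatial": SPATIAL_BAM_TAGS,  # hypothetical chemistry name
}


def tags_for_chemistry(chemistry):
    """Return the BAM tag dictionary for `chemistry`, or raise ValueError."""
    try:
        return CHEMISTRY_BAM_TAGS[chemistry]
    except KeyError:
        supported = ", ".join(sorted(CHEMISTRY_BAM_TAGS))
        raise ValueError(
            f"Unknown chemistry {chemistry!r}; expected one of: {supported}"
        ) from None


tags = tags_for_chemistry("10xv2")
print(sorted(tags))
```

Because several chemistries share one tag dictionary object, a correction to a tag pair is picked up by every chemistry that points at it.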
DNA_SUBSTITUTION_MATRIX
This constant dictionary defines a substitution matrix for DNA sequence alignment.
It represents the matrix as a nested dictionary: the keys are nucleotides (A, T, C, G, Z, N), and each value is a dictionary mapping every nucleotide to a score. The matrix is used to score alignments between DNA sequences.
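The nested-dictionary layout can be illustrated with a small sketch. The alphabet (A, T, C, G, Z, N) is from the description; the score values, the choice to treat Z and N as neutral, and the build_matrix and ungapped_score helpers are assumptions, and the real matrix may use different numbers.

```python
NUCLEOTIDES = "ATCGZN"
# Assumed scores for illustration; the actual values are not given in the text.
MATCH, MISMATCH, NEUTRAL = 5, -4, 0


def build_matrix():
    """Build a nested dict so that matrix[a][b] is the score of aligning a with b."""
    matrix = {}
    for a in NUCLEOTIDES:
        matrix[a] = {}
        for b in NUCLEOTIDES:
            if a in "ZN" or b in "ZN":
                matrix[a][b] = NEUTRAL  # assumption: Z and N never add or subtract
            elif a == b:
                matrix[a][b] = MATCH
            else:
                matrix[a][b] = MISMATCH
    return matrix


DNA_SUBSTITUTION_MATRIX = build_matrix()


def ungapped_score(seq1, seq2, matrix=DNA_SUBSTITUTION_MATRIX):
    """Sum per-position scores for two equal-length sequences (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))


score = ungapped_score("ACGT", "ACGA")  # three matches and one mismatch
print(score)
```

A real aligner would also apply gap penalties; this sketch only shows how a pair of bases is looked up in the matrix.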
DEFAULT_PIPELINE_PARAMETERS
This constant dictionary stores default parameters for each stage of the Cassiopeia preprocessing pipeline.
It maps stage names (e.g., "general", "convert", "filter_bam") to dictionaries of parameter names and their default values. These parameters control the behavior of each stage in the pipeline.
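One common way to use a per-stage defaults dictionary is to overlay user-supplied values on top of it. The sketch below assumes that pattern; the stage names are from the description, but the parameter names, their values, and the resolve_parameters helper are hypothetical and are not taken from the pipeline.

```python
import copy

# Stage names are from the text; parameter names and values are hypothetical.
DEFAULT_PIPELINE_PARAMETERS = {
    "general": {"n_threads": 1},
    "convert": {},
    "filter_bam": {"quality_threshold": 10},
}


def resolve_parameters(stage, overrides=None):
    """Return the defaults for `stage` with `overrides` applied on top."""
    if stage not in DEFAULT_PIPELINE_PARAMETERS:
        raise KeyError(f"Unknown pipeline stage: {stage!r}")
    # Copy first so the shared defaults are never mutated by a caller.
    params = copy.deepcopy(DEFAULT_PIPELINE_PARAMETERS[stage])
    params.update(overrides or {})
    return params


params = resolve_parameters("filter_bam", {"quality_threshold": 20})
print(params)
```

Copying the defaults before updating them keeps one run's overrides from leaking into the next.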