High-level description

The setup_utilities.py file provides utility functions for setting up and configuring the Cassiopeia preprocessing pipeline. It handles tasks such as parsing configuration files, creating output directories, and defining the sequence of preprocessing stages.

Code Structure

The setup_utilities.py file contains three main functions: setup, parse_config, and create_pipeline. The setup function initializes the output directory and logging. The parse_config function reads and parses the configuration file, setting default values and validating required parameters. The create_pipeline function determines the order of preprocessing stages based on the specified entry and exit points.

References

This code references the following symbols:

  • cassiopeia.preprocess.constants.DEFAULT_PIPELINE_PARAMETERS: A dictionary containing default parameters for each stage of the preprocessing pipeline.
  • cassiopeia.mixins.UnspecifiedConfigParameterError: An exception raised when a required configuration parameter is not specified.
  • cassiopeia.preprocess.cassiopeia_preprocess.STAGES: A dictionary mapping stage names to their corresponding functions in the preprocessing pipeline.

Symbols

setup

Description

Sets up the environment for the preprocessing pipeline by creating the output directory and configuring logging.

Inputs

NameTypeDescription
output_directory_locationstrThe path to the output directory.
verboseboolWhether to enable verbose logging.

Side Effects

  • Creates the output directory if it doesn’t exist.
  • Adds a file handler to the logger, writing logs to preprocess.log and preprocess.err files in the output directory.

parse_config

Description

Parses the configuration file for the preprocessing pipeline, setting default values and validating required parameters.

Inputs

NameTypeDescription
config_stringstrThe configuration file content as a string.

Outputs

NameTypeDescription
parametersDict[str, Dict[str, Any]]A dictionary containing parameters for each preprocessing stage.

Internal Logic

  1. Reads the default pipeline parameters from constants.DEFAULT_PIPELINE_PARAMETERS.
  2. Parses the provided config_string using the configparser module.
  3. Updates the default parameters with values from the parsed configuration.
  4. Validates that required parameters are present in the general section of the configuration.
  5. Adds necessary parameters from the general section to other stages.
  6. Returns the parsed and validated parameters.

Error Handling

Raises an UnspecifiedConfigParameterError if any required parameter is missing from the configuration.

create_pipeline

Description

Creates a list of preprocessing stages to be executed based on the specified entry and exit points.

Inputs

NameTypeDescription
entrystrThe name of the stage to start the pipeline from.
_exitstrThe name of the stage to stop the pipeline at.
stagesDict[str, Any]A dictionary mapping stage names to their corresponding functions.

Outputs

NameTypeDescription
pipeline_stagesList[str]A list of stage names to be executed in order.

Internal Logic

  1. Gets a list of stage names from the stages dictionary.
  2. Finds the indices of the entry and _exit stages in the list.
  3. Returns a slice of the list from the entry index to the _exit index (inclusive).