setup_utilities.py
High-level description
The setup_utilities.py
file provides utility functions for setting up and configuring the Cassiopeia preprocessing pipeline. It handles tasks such as parsing configuration files, creating output directories, and defining the sequence of preprocessing stages.
Code Structure
The setup_utilities.py
file contains three main functions: setup
, parse_config
, and create_pipeline
. The setup
function initializes the output directory and logging. The parse_config
function reads and parses the configuration file, setting default values and validating required parameters. The create_pipeline
function determines the order of preprocessing stages based on the specified entry and exit points.
References
This code references the following symbols:
cassiopeia.preprocess.constants.DEFAULT_PIPELINE_PARAMETERS
: A dictionary containing default parameters for each stage of the preprocessing pipeline.cassiopeia.mixins.UnspecifiedConfigParameterError
: An exception raised when a required configuration parameter is not specified.cassiopeia.preprocess.cassiopeia_preprocess.STAGES
: A dictionary mapping stage names to their corresponding functions in the preprocessing pipeline.
Symbols
setup
Description
Sets up the environment for the preprocessing pipeline by creating the output directory and configuring logging.
Inputs
Name | Type | Description |
---|---|---|
output_directory_location | str | The path to the output directory. |
verbose | bool | Whether to enable verbose logging. |
Side Effects
- Creates the output directory if it doesn’t exist.
- Adds a file handler to the logger, writing logs to
preprocess.log
andpreprocess.err
files in the output directory.
parse_config
Description
Parses the configuration file for the preprocessing pipeline, setting default values and validating required parameters.
Inputs
Name | Type | Description |
---|---|---|
config_string | str | The configuration file content as a string. |
Outputs
Name | Type | Description |
---|---|---|
parameters | Dict[str, Dict[str, Any]] | A dictionary containing parameters for each preprocessing stage. |
Internal Logic
- Reads the default pipeline parameters from
constants.DEFAULT_PIPELINE_PARAMETERS
. - Parses the provided
config_string
using theconfigparser
module. - Updates the default parameters with values from the parsed configuration.
- Validates that required parameters are present in the
general
section of the configuration. - Adds necessary parameters from the
general
section to other stages. - Returns the parsed and validated parameters.
Error Handling
Raises an UnspecifiedConfigParameterError
if any required parameter is missing from the configuration.
create_pipeline
Description
Creates a list of preprocessing stages to be executed based on the specified entry and exit points.
Inputs
Name | Type | Description |
---|---|---|
entry | str | The name of the stage to start the pipeline from. |
_exit | str | The name of the stage to stop the pipeline at. |
stages | Dict[str, Any] | A dictionary mapping stage names to their corresponding functions. |
Outputs
Name | Type | Description |
---|---|---|
pipeline_stages | List[str] | A list of stage names to be executed in order. |
Internal Logic
- Gets a list of stage names from the
stages
dictionary. - Finds the indices of the
entry
and_exit
stages in the list. - Returns a slice of the list from the
entry
index to the_exit
index (inclusive).