generate_annotate_forest.py
High-level description
This script generates and annotates a collection of phylogenetic trees (a “forest”) using the jungle library. It accepts the number of trees, the number of leaves per tree, and a shape parameter that controls tree structure. The annotated forest is then saved to a gzip-compressed pickle file.
References
This script references the jungle library (jg) for phylogenetic tree generation and manipulation.
Symbols
generate_annotate_forest.py
Description
This script generates a collection of phylogenetic trees, annotates them with various features, and saves the resulting forest data structure to a file.
Inputs
This script takes its input from command line arguments:
Name | Type | Description |
---|---|---|
n_leaves | int | Number of leaves (tips) in each generated tree. |
n_trees | int | Number of trees to generate for the forest. |
alpha | float | Shape parameter influencing the tree structure (2.0 for neutral, 1.0 for positive selection). |
output_dir | str | Path to the directory where the output file will be saved. |
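A minimal sketch of how these four arguments could be read from the command line. The positional order (n_leaves, n_trees, alpha, output_dir) simply follows the table above and is an assumption, as are the names Params and parse_args; the script's actual parsing code is not shown in this document.

```python
from typing import NamedTuple, Sequence


class Params(NamedTuple):
    """Hypothetical container for the four inputs listed in the table."""
    n_leaves: int
    n_trees: int
    alpha: float
    output_dir: str


def parse_args(argv: Sequence[str]) -> Params:
    """Parse positional arguments; argv[0] is the script name."""
    if len(argv) != 5:
        raise SystemExit(
            "usage: generate_annotate_forest.py n_leaves n_trees alpha output_dir"
        )
    # Assumed order: same as the Inputs table, not confirmed by the source.
    return Params(int(argv[1]), int(argv[2]), float(argv[3]), argv[4])


# Example with an explicit argument list (in the script this would be sys.argv).
params = parse_args(["generate_annotate_forest.py", "100", "10", "2.0", "out"])
print(params)
```

Converting to int and float at the boundary means a malformed value fails immediately with a ValueError instead of surfacing later during tree generation.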
Outputs
The script generates a gzipped pickle file containing the annotated forest data structure. The file name is based on the input parameters and a unique identifier.
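The following sketch shows one way to build such a file name and to write and read a gzipped pickle using only the standard library. The naming pattern and the helper names (build_output_path, save_forest, load_forest) are hypothetical; the document states only that the name combines the input parameters with a unique identifier.

```python
import gzip
import os
import pickle
import tempfile
import uuid


def build_output_path(output_dir, n_leaves, n_trees, alpha):
    """Combine the parameters with a random UUID (hypothetical pattern)."""
    unique_id = uuid.uuid4().hex
    name = f"forest_n_leaves{n_leaves}_n_trees{n_trees}_alpha{alpha}_{unique_id}.pkl.gz"
    return os.path.join(output_dir, name)


def save_forest(forest, path):
    """Serialize any picklable object to a gzip-compressed file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(forest, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_forest(path):
    """Read the object back; only unpickle files from a trusted source."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object instead of a real forest.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = build_output_path(tmp_dir, n_leaves=100, n_trees=10, alpha=2.0)
    save_forest({"n_trees": 10}, path)
    restored = load_forest(path)
```

Loading a real forest this way requires the jungle library to be importable, because pickle restores objects by looking up their classes by module and name.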
Internal Logic
- Parameter Parsing: Reads input parameters from command line arguments.
- Output File Naming: Constructs the output file name based on input parameters and a UUID.
- Forest Generation: Uses the jungle library to generate a forest of n_trees trees, each with n_leaves leaves, using the specified alpha shape parameter.
- Forest Annotation:
  - Resolves potential multifurcations (polytomies) in the generated trees.
  - Annotates the trees with standard node features (e.g., distance to root).
  - Calculates and annotates the Colless index for each tree, a measure of tree balance.
- Forest Serialization: Saves the annotated forest object to a gzipped pickle file.
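To make the Colless index in the annotation step concrete, here is a small self-contained sketch. It is not jungle's implementation: it assumes a tree represented as nested 2-tuples (a hypothetical representation chosen for illustration) and uses the standard definition, the sum over internal nodes of the absolute difference between the leaf counts of the two child subtrees.

```python
def colless_index(tree):
    """Colless index of a strictly binary tree given as nested 2-tuples.

    Any non-tuple value is treated as a leaf.
    """
    def visit(node):
        # Returns (number of leaves below node, Colless sum below node).
        if not isinstance(node, tuple):
            return 1, 0
        left, right = node  # fails on polytomies, which must be resolved first
        n_left, c_left = visit(left)
        n_right, c_right = visit(right)
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)

    return visit(tree)[1]


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(colless_index(balanced))     # 0: every split is even
print(colless_index(caterpillar))  # 3: the maximum for four leaves
```

The definition applies to bifurcating trees only, which is consistent with polytomies being resolved before the index is computed.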
Side Effects
- Creates a file in the specified output directory.
- Prints status messages and timing information to the console if verbose is True.
Dependencies
Dependency | Purpose |
---|---|
jungle | Provides functionality for phylogenetic tree generation, manipulation, and analysis. |
sys | Used for accessing command line arguments. |
time | Used for tracking the script’s execution time. |
uuid | Used for generating a unique identifier for the output file. |
pickle | Used for serializing and saving the forest object. |
gzip | Used for compressing the output pickle file. |
Configuration
The script’s behavior is controlled by command line arguments, as described in the “Inputs” section.
Logging
The script prints informative messages to the console if the verbose variable is set to True. These include parameter values, progress updates, and timing information.
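A minimal sketch of verbose-gated status output with elapsed time. The helper names format_status and log, and the message format, are hypothetical; the document confirms only that a verbose variable controls console output and that the time module is used for timing.

```python
import time

verbose = True  # mirrors the flag described above


def format_status(message, elapsed_seconds):
    """Prefix a message with the elapsed time in seconds."""
    return f"[{elapsed_seconds:.2f} s] {message}"


def log(message, start_time):
    """Print a status line only when verbose output is enabled."""
    if verbose:
        print(format_status(message, time.time() - start_time))


start_time = time.time()
log("Generating forest", start_time)
log("Done", start_time)
```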