High-level description

This script generates and annotates a collection of phylogenetic trees (a “forest”) using the jungle library. The user specifies the number of trees, the number of leaves per tree, and a shape parameter that controls tree structure. The annotated forest is then saved as a gzip-compressed pickle file.

References

This script depends on the jungle library (jg) for phylogenetic tree generation and manipulation.

Symbols

generate_annotate_forest.py

Description

This script generates a collection of phylogenetic trees, annotates them with node features and a tree-balance statistic (the Colless index), and saves the resulting forest data structure to a file.

Inputs

This script takes its input from command line arguments:

Name         Type    Description
n_leaves     int     Number of leaves (tips) in each generated tree.
n_trees      int     Number of trees to generate for the forest.
alpha        float   Shape parameter influencing the tree structure (2.0 for neutral evolution, 1.0 for positive selection).
output_dir   str     Path to the directory where the output file will be saved.
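A minimal sketch of how these four arguments could be read from sys.argv, assuming they are passed positionally in the order listed above. The function name parse_args and the usage message are hypothetical; the actual script may index sys.argv differently.

```python
def parse_args(argv):
    """Parse the four positional arguments from an argv-style list.

    Hypothetical sketch: assumes the order n_leaves, n_trees, alpha, output_dir.
    """
    if len(argv) != 5:
        # Exit with a usage message when the argument count is wrong.
        raise SystemExit(f"usage: {argv[0]} N_LEAVES N_TREES ALPHA OUTPUT_DIR")
    n_leaves = int(argv[1])    # leaves (tips) per tree
    n_trees = int(argv[2])     # number of trees in the forest
    alpha = float(argv[3])     # tree-shape parameter
    output_dir = argv[4]       # destination directory for the output file
    return n_leaves, n_trees, alpha, output_dir
```

In the script this would be called as parse_args(sys.argv). For example, parse_args(["generate_annotate_forest.py", "100", "50", "1.5", "out"]) returns (100, 50, 1.5, "out").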

Outputs

The script writes a gzipped pickle file containing the annotated forest data structure to output_dir. The file name encodes the input parameters and a unique identifier (a UUID).
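The following sketch shows one way to build a unique output path and to write and read a gzipped pickle using only the standard library. The file-name pattern and the helper names make_output_path, save_forest, and load_forest are hypothetical; the script's exact naming scheme is not documented here. Any picklable object can stand in for the jungle forest.

```python
import gzip
import os
import pickle
import uuid


def make_output_path(output_dir, n_leaves, n_trees, alpha):
    # Hypothetical naming scheme: the input parameters plus a random UUID4,
    # so repeated runs with identical parameters never overwrite each other.
    name = (f"forest_leaves{n_leaves}_trees{n_trees}_alpha{alpha}_"
            f"{uuid.uuid4().hex}.pkl.gz")
    return os.path.join(output_dir, name)


def save_forest(forest, path):
    # gzip.open in binary write mode compresses whatever pickle writes to it.
    with gzip.open(path, "wb") as f:
        pickle.dump(forest, f)


def load_forest(path):
    # Reverse of save_forest: decompress and unpickle.
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Note that unpickling requires the classes of the stored object to be importable, so loading a saved jungle forest needs jungle installed.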

Internal Logic

  1. Parameter Parsing: Reads input parameters from command line arguments.
  2. Output File Naming: Constructs the output file name based on input parameters and a UUID.
  3. Forest Generation: Uses the jungle library to generate a forest of n_trees trees, each with n_leaves leaves, whose shape is governed by the alpha parameter.
  4. Forest Annotation:
    • Resolves potential multifurcations (polytomies) in the generated trees.
    • Annotates the trees with standard node features (e.g., distance to root).
    • Calculates the Colless index (a measure of tree balance) for each tree and stores it as an annotation.
  5. Forest Serialization: Saves the annotated forest object to a gzipped pickle file.
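The Colless index sums, over every internal node of a bifurcating tree, the absolute difference between the numbers of leaves in its two child subtrees. It is 0 for a perfectly balanced tree and largest for a caterpillar (ladder) tree. Because it is defined for binary trees, polytomies have to be resolved first, which is presumably why step 4 resolves them before computing the index. The sketch below is self-contained and does not use the jungle API: trees are nested tuples, and the hypothetical helper resolve_polytomies folds multifurcations deterministically into a left-leaning comb, whereas jungle may resolve them differently (for example randomly) and may report a normalized variant of the index.

```python
def resolve_polytomies(node):
    """Return a bifurcating copy of a nested-tuple tree.

    Nodes with more than two children are folded into a left-leaning comb,
    e.g. (a, b, c) -> ((a, b), c); single-child nodes are collapsed.
    """
    if not isinstance(node, tuple):
        return node  # leaf
    if not node:
        raise ValueError("internal node has no children")
    children = [resolve_polytomies(child) for child in node]
    resolved = children[0]
    for child in children[1:]:
        resolved = (resolved, child)
    return resolved


def colless_index(tree):
    """Sum of |n_left - n_right| over all internal nodes of a binary tree."""
    def walk(node):
        # Returns (number of leaves below node, partial Colless sum).
        if not isinstance(node, tuple):
            return 1, 0
        if len(node) != 2:
            raise ValueError("Colless index requires a strictly bifurcating tree")
        n_left, c_left = walk(node[0])
        n_right, c_right = walk(node[1])
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)

    return walk(tree)[1]
```

For example, colless_index((("a", "b"), ("c", "d"))) returns 0, while colless_index(resolve_polytomies(("a", "b", "c", "d"))) evaluates the caterpillar (((a, b), c), d) and returns 0 + 1 + 2 = 3.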

Side Effects

  • Creates a file in the specified output directory.
  • Prints status messages and timing information to the console if the script's verbose variable is set to True.

Dependencies

Dependency   Purpose
jungle       Provides functionality for phylogenetic tree generation, manipulation, and analysis.
sys          Used for accessing command line arguments.
time         Used for tracking the script’s execution time.
uuid         Used for generating a unique identifier for the output file.
pickle       Used for serializing and saving the forest object.
gzip         Used for compressing the output pickle file.

Configuration

The script’s behavior is controlled by command line arguments, as described in the “Inputs” section, apart from the verbose variable described under “Logging”.

Logging

The script prints informative messages to the console if the verbose variable is set to True. This includes parameter values, progress updates, and timing information.
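A minimal sketch of verbose-gated logging with timing, using time.perf_counter to measure elapsed wall-clock time. The helpers log and timed are hypothetical, and the module-level verbose flag only mirrors the script's variable; the script itself may call time.time() and print directly.

```python
import time

verbose = True  # hypothetical stand-in for the script's verbose variable


def log(message):
    # Print only when verbose output is enabled.
    if verbose:
        print(message)


def timed(label, func, *args, **kwargs):
    # Run func, report how long it took, and pass its result through.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    log(f"{label} took {elapsed:.2f} s")
    return result
```

Wrapping each stage (generation, annotation, serialization) in a call such as timed("Annotation", annotate, forest), where annotate is a placeholder for the real step, keeps the timing code in one place.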