High-level description

This directory contains Python modules for fitness estimation and phylogenetic tree analysis. The main components are:

  1. Ancestral sequence reconstruction
  2. Fitness inference on phylogenetic trees
  3. Node ranking and coloring in trees
  4. Sequence alignment processing and tree building
  5. Survival analysis and generating function solutions
  6. Tree manipulation and visualization utilities

What does it do?

The code in this directory is a toolkit for analyzing evolutionary fitness and phylogenetic relationships in biological sequences. It allows researchers to:

  1. Reconstruct ancestral sequences from a phylogenetic tree and multiple sequence alignment.
  2. Infer fitness distributions for nodes in a phylogenetic tree using a message-passing algorithm.
  3. Rank and color nodes in a tree based on various metrics, including inferred fitness.
  4. Process sequence alignments, build phylogenetic trees, and annotate them with mutations.
  5. Solve generating functions for branching processes with diffusive fitness changes.
  6. Manipulate, visualize, and analyze phylogenetic trees with various utility functions.

These tools can be used to study the evolutionary dynamics of populations, predict the fitness of sequences, and visualize evolutionary relationships annotated with fitness estimates and other metrics.
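
To make the "generating function" item concrete, here is a minimal sketch of the simplest special case: a branching process with constant birth and death rates and no fitness diffusion. It is not the solver in solve_survival.py; the function names, the rate parameters, and the choice of a hand-written RK4 integrator are all assumptions made for illustration. The survival probability p(t) of a lineage started from one individual obeys dp/dt = (birth - death) * p - birth * p^2 with p(0) = 1, which has a closed form to check against.

```python
import math


def survival_probability(birth, death, t_end, steps=10_000):
    """Integrate dp/dt = (birth - death) * p - birth * p**2, p(0) = 1, with RK4.

    Illustrative only: constant rates, no diffusion in fitness.
    """
    def rhs(p):
        return (birth - death) * p - birth * p * p

    p = 1.0
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def survival_exact(birth, death, t):
    """Closed-form solution of the same equation, valid for birth != death."""
    r = birth - death
    return r / (birth - death * math.exp(-r * t))


if __name__ == "__main__":
    # For birth > death the survival probability tends to 1 - death / birth.
    print(survival_probability(2.0, 1.0, 5.0), survival_exact(2.0, 1.0, 5.0))
```

Adding diffusive fitness changes turns this ordinary differential equation into a partial differential equation in time and fitness, which is why a dedicated numerical solver is needed in the general case.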

Entry points

The main entry points for using this toolkit are:

  1. sequence_ranking.py: This module combines sequence alignment processing with node ranking and fitness inference. It provides a high-level interface for analyzing sequence data and predicting fitness.

  2. fitness_inference.py: This module implements the core algorithm for inferring fitness distributions on a phylogenetic tree. It can be used directly for fitness analysis on pre-built trees.

  3. node_ranking.py: This module extends the fitness inference functionality to provide various methods for ranking and coloring nodes in a tree. It’s useful for visualizing and interpreting the results of fitness inference.
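
The message-passing idea behind fitness_inference.py can be illustrated with a generic upward pass on a toy tree. This is a minimal sketch, not the module's algorithm: the two-state grid standing in for a fitness grid, the transition matrix, the dictionary-based tree, and every function name are assumptions made for illustration.

```python
def propagate(message, transition):
    """Push a child's message through the transition matrix to the parent's states."""
    n = len(message)
    return [sum(transition[i][j] * message[j] for j in range(n)) for i in range(n)]


def upward_pass(tree, node, leaf_messages, transition):
    """Message at `node`: elementwise product of the propagated child messages."""
    children = tree.get(node, [])
    if not children:
        return leaf_messages[node]
    message = [1.0] * len(transition)
    for child in children:
        child_message = propagate(
            upward_pass(tree, child, leaf_messages, transition), transition
        )
        message = [m * c for m, c in zip(message, child_message)]
    return message


def normalize(vector):
    total = sum(vector)
    return [x / total for x in vector]


# Toy tree (hypothetical): root -> (A, B), A -> (a1, a2).
tree = {"root": ["A", "B"], "A": ["a1", "a2"]}
# Probability of staying in or switching state along one branch (placeholder values).
transition = [[0.9, 0.1], [0.1, 0.9]]
# Observations at the leaves, one-hot over the two states.
leaf_messages = {"a1": [1.0, 0.0], "a2": [1.0, 0.0], "B": [0.0, 1.0]}

# Belief at the root under a uniform prior over states.
root_belief = normalize(upward_pass(tree, "root", leaf_messages, transition))
```

A full inference scheme would typically add a downward pass so that every internal node, not only the root, receives information from the whole tree.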

Key files

  1. ancestral.py: Implements maximum likelihood estimation for ancestral sequence reconstruction.

  2. fitness_inference.py: Contains the main algorithm for inferring fitness distributions on phylogenetic trees.

  3. node_ranking.py: Extends fitness inference with methods for ranking and coloring tree nodes.

  4. sequence_ranking.py: Combines sequence alignment processing with node ranking and fitness inference.

  5. solve_survival.py: Implements numerical solvers for generating functions in branching processes.

  6. tree_utils.py: Provides utility functions for building, manipulating, and visualizing phylogenetic trees.

Dependencies

The code relies on several external libraries:

  1. Biopython: Used extensively for handling biological sequences, alignments, and phylogenetic trees.
  2. NumPy: Used for numerical computations and array manipulations.
  3. SciPy: Used for numerical integration and statistical functions.
  4. Matplotlib: Used for visualization of phylogenetic trees and results.

The code also expects external command-line tools such as fasttree to be installed and available for phylogenetic tree construction.
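
A quick way to confirm that these dependencies are present before running an analysis is sketched below. It uses only the standard library; the helper name missing_dependencies is hypothetical, and the executable name fasttree is taken from the text above, so your installation may expose the tool under a different name.

```python
import importlib.util
import shutil

# Display name -> import name (Biopython is imported as "Bio").
REQUIRED_MODULES = {
    "Biopython": "Bio",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
}
# Assumed executable name; adjust if your installation uses another one.
REQUIRED_TOOLS = ["fasttree"]


def missing_dependencies(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return the names of libraries and command-line tools that cannot be found."""
    missing = [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]
    missing += [tool for tool in tools if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {problems}")
```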

Configuration

The code does not use explicit configuration files. However, many of the classes and functions accept parameters that can be used to configure their behavior. For example:

  • In fitness_inference.py, the fitness_inference class accepts parameters like eps_branch_length, D, fit_grid, samp_frac, and mem to configure the fitness inference process.
  • In sequence_ranking.py, the alignment class accepts parameters like outgroup, cds, and collapse to configure the alignment processing and tree building.

Users set these parameters when initializing the respective classes to tailor the analysis to their data.
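
One way to keep such settings in one place is to store them in dictionaries and pass them as keyword arguments. The sketch below is an assumption-laden illustration: the parameter names come from the list above, but every value is a placeholder rather than a documented default, the helper with_overrides is hypothetical, and the constructor call shown in the final comment is an assumed call shape, not a confirmed signature.

```python
# Parameter names from the text; all values are placeholders, not real defaults.
fitness_params = {
    "eps_branch_length": 1e-5,
    "D": 0.5,
    "fit_grid": None,
    "samp_frac": 0.01,
    "mem": 1.0,
}
alignment_params = {
    "outgroup": None,
    "cds": None,
    "collapse": False,
}


def with_overrides(base, **overrides):
    """Return a copy of `base` with overrides applied, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        raise KeyError(f"unknown parameter(s): {', '.join(unknown)}")
    return {**base, **overrides}


# Example: change one setting for a single run without touching the shared dict.
run_params = with_overrides(fitness_params, D=0.2)
# Assumed call shape, shown as a comment only:
# fitness_inference(tree, **run_params)
```

Rejecting unknown names catches typos such as samp_fraction early, before a long-running analysis starts.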