Overview
High-level description
This directory contains a collection of Python modules that implement various algorithms and utilities for fitness estimation and phylogenetic tree analysis. The main components include:
- Ancestral sequence reconstruction
- Fitness inference on phylogenetic trees
- Node ranking and coloring in trees
- Sequence alignment processing and tree building
- Survival analysis and generating function solutions
- Tree manipulation and visualization utilities
What does it do?
The code in this directory provides a comprehensive toolkit for analyzing evolutionary fitness and phylogenetic relationships in biological sequences. It allows researchers to:
- Reconstruct ancestral sequences from a phylogenetic tree and multiple sequence alignment.
- Infer fitness distributions for nodes in a phylogenetic tree using a message-passing algorithm.
- Rank and color nodes in a tree based on various metrics, including inferred fitness.
- Process sequence alignments, build phylogenetic trees, and annotate them with mutations.
- Solve generating functions for branching processes with diffusive fitness changes.
- Manipulate, visualize, and analyze phylogenetic trees with various utility functions.
These tools can be used to study the evolutionary dynamics of populations, predict fitness of sequences, and visualize evolutionary relationships with additional context provided by fitness estimates and other metrics.
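As a rough illustration of what message passing on a tree means, the sketch below runs a generic leaves-to-root pass over a tiny hand-built tree with two discrete states. It is not the algorithm implemented in fitness_inference.py. The tree layout, the two-state model, the transition matrix, and all function names are assumptions made for this example only.

```python
# Toy upward (leaves-to-root) message passing on a rooted tree.
# Generic illustration only; NOT the algorithm used in fitness_inference.py.
# The nested-dict tree format and every name below are hypothetical.

def leaf(likelihood):
    """Create a leaf carrying a likelihood vector over the discrete states."""
    return {"children": [], "likelihood": likelihood}


def upward_message(node, transition):
    """Return, for each state of `node`, the likelihood of all data below it."""
    if not node["children"]:
        return list(node["likelihood"])
    n_states = len(transition)
    message = [1.0] * n_states
    for child in node["children"]:
        child_msg = upward_message(child, transition)
        for parent_state in range(n_states):
            # Sum over the child's states, weighted by the transition probability.
            total = sum(
                transition[parent_state][child_state] * child_msg[child_state]
                for child_state in range(n_states)
            )
            # Children are conditionally independent given the parent state.
            message[parent_state] *= total
    return message


def normalize(values):
    """Scale a non-negative vector so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]


# Two states (think "low" and "high"); the same matrix is used on every branch.
TRANSITION = [[0.9, 0.1],
              [0.2, 0.8]]

# One leaf observed in state 0, plus a clade of two leaves observed in state 1.
TREE = {"children": [leaf([1.0, 0.0]),
                     {"children": [leaf([0.0, 1.0]), leaf([0.0, 1.0])]}]}

root_message = upward_message(TREE, TRANSITION)
root_posterior = normalize(root_message)  # assumes a uniform prior at the root
print(root_posterior)
```

A full inference scheme would typically add a root-to-leaves pass so that every node receives information from the whole tree; that part is omitted here.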
Entry points
The main entry points for using this toolkit are:
- sequence_ranking.py: This module combines sequence alignment processing with node ranking and fitness inference. It provides a high-level interface for analyzing sequence data and predicting fitness.
- fitness_inference.py: This module implements the core algorithm for inferring fitness distributions on a phylogenetic tree. It can be used directly for fitness analysis on pre-built trees.
- node_ranking.py: This module extends the fitness inference functionality to provide various methods for ranking and coloring nodes in a tree. It's useful for visualizing and interpreting the results of fitness inference.
Key Files
- ancestral.py: Implements maximum likelihood estimation for ancestral sequence reconstruction.
- fitness_inference.py: Contains the main algorithm for inferring fitness distributions on phylogenetic trees.
- node_ranking.py: Extends fitness inference with methods for ranking and coloring tree nodes.
- sequence_ranking.py: Combines sequence alignment processing with node ranking and fitness inference.
- solve_survival.py: Implements numerical solvers for generating functions in branching processes.
- tree_utils.py: Provides utility functions for building, manipulating, and visualizing phylogenetic trees.
Dependencies
The code relies on several external libraries:
- Biopython: Used extensively for handling biological sequences, alignments, and phylogenetic trees.
- NumPy: Used for numerical computations and array manipulations.
- SciPy: Used for various scientific computing tasks, including integration and statistical functions.
- Matplotlib: Used for visualization of phylogenetic trees and results.
Additionally, the code assumes the availability of external tools like fasttree for phylogenetic tree construction.
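Before running the modules, it can help to confirm that these dependencies are present. The sketch below uses only the Python standard library. The function name check_environment is hypothetical, and the executable name fasttree is taken from the text above; on some installations the binary may be named differently.

```python
import importlib.util
import shutil

# Import names of the libraries listed above (Biopython is imported as "Bio").
REQUIRED_MODULES = {
    "Biopython": "Bio",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
}

# External command-line tools; the name is an assumption based on the text above.
EXTERNAL_TOOLS = ["fasttree"]


def check_environment(modules=REQUIRED_MODULES, tools=EXTERNAL_TOOLS):
    """Return a dict mapping each dependency name to True if it was found."""
    report = {}
    for label, import_name in modules.items():
        # find_spec returns None when a top-level module cannot be located.
        report[label] = importlib.util.find_spec(import_name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```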
Configuration
The code does not use explicit configuration files. However, many of the classes and functions accept parameters that can be used to configure their behavior. For example:
- In fitness_inference.py, the fitness_inference class accepts parameters like eps_branch_length, D, fit_grid, samp_frac, and mem to configure the fitness inference process.
- In sequence_ranking.py, the alignment class accepts parameters like outgroup, cds, and collapse to configure the alignment processing and tree building.
Users can adjust these parameters when initializing the respective classes to customize the analysis for their specific needs.
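One way to keep such settings in one place is to collect them as keyword arguments and check them against the parameter names listed above before passing them on. This is a minimal sketch: the helper select_params is hypothetical, the numeric values are placeholders rather than recommended settings, and the constructor call in the final comment is an assumption, since the exact signatures are not shown in this overview.

```python
# Parameter names as listed in the Configuration section above.
FITNESS_INFERENCE_PARAMS = {"eps_branch_length", "D", "fit_grid", "samp_frac", "mem"}
ALIGNMENT_PARAMS = {"outgroup", "cds", "collapse"}


def select_params(allowed, **overrides):
    """Return the overrides as a dict, rejecting names not in `allowed`."""
    unknown = set(overrides) - set(allowed)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return dict(overrides)


# Placeholder values for illustration only; not recommended settings.
fit_kwargs = select_params(FITNESS_INFERENCE_PARAMS, D=0.5, samp_frac=0.01)
print(fit_kwargs)

# Hypothetical use, assuming the class takes a tree plus keyword arguments:
# model = fitness_inference(tree, **fit_kwargs)
```

Catching a misspelled parameter name at this point is cheaper than discovering it after a long inference run.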