Overview
High-level description
The cassiopeia/tools/fitness_estimator directory contains implementations of fitness estimation algorithms for phylogenetic trees, designed for use with the Cassiopeia library. The main components are an abstract base class, FitnessEstimator, and a concrete implementation, LBIJungle, which uses the Local Branching Index (LBI) method.
What does it do?
This module provides tools for estimating the fitness of nodes in phylogenetic trees. The main functionality includes:
- Defining an interface for fitness estimation algorithms through the abstract FitnessEstimator class.
- Implementing the LBI fitness estimator using the jungle package, which calculates fitness from the branching pattern of the tree.
- Converting Cassiopeia trees to Newick format for compatibility with external libraries.
- Annotating each node in the tree with its fitness value.
Fitness estimates help characterize the evolutionary dynamics of the lineages represented in the phylogenetic tree: a higher fitness value suggests a lineage that is expanding more rapidly than its neighbors.
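To make "branching pattern" concrete: Neher et al. define the LBI of a node as the total tree length surrounding it, with every point x of the tree discounted exponentially by its distance from the node, LBI(i) = ∫ exp(-d(i, x) / tau) dx, where tau is a timescale parameter. The sketch below computes that definition for every node with one upward and one downward traversal. It illustrates the definition only and is not the code inside jungle; the function name local_branching_index, the dict-based tree and the default tau = 1.0 are assumptions.

```python
import math
from typing import Dict, List


def local_branching_index(
    children: Dict[str, List[str]],
    branch_length: Dict[str, float],
    root: str,
    tau: float = 1.0,
) -> Dict[str, float]:
    """Sketch of the LBI: exponentially discounted tree length around each node.

    children maps each internal node to its children (hypothetical format);
    branch_length maps each non-root node to the length of its parent branch.
    """
    # Preorder traversal: every parent is listed before its children.
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    def branch_integral(b: float) -> float:
        # Integral of exp(-s / tau) for s in [0, b]: one branch seen from an end.
        return tau * (1.0 - math.exp(-b / tau))

    # Upward pass: up[n] is the discounted length of n's branch plus the
    # subtree below n, as seen from n's parent.
    up: Dict[str, float] = {}
    for node in reversed(order):  # children before parents
        if node == root:
            continue
        b = branch_length[node]
        below = sum(up[c] for c in children.get(node, []))
        up[node] = branch_integral(b) + math.exp(-b / tau) * below

    # Downward pass: down[n] is the discounted length of everything outside
    # n's subtree (including n's own branch), as seen from n.
    down: Dict[str, float] = {root: 0.0}
    for node in order:  # parents before children
        kids = children.get(node, [])
        total_up = sum(up[c] for c in kids)
        for c in kids:
            b = branch_length[c]
            rest = down[node] + total_up - up[c]  # parent side minus c's subtree
            down[c] = branch_integral(b) + math.exp(-b / tau) * rest

    # Each point of the tree is counted once: above the node plus its subtrees.
    return {
        node: down[node] + sum(up[c] for c in children.get(node, []))
        for node in order
    }
```

For a root with two leaf children on branches of length 1 and tau = 1, each leaf gets 1 - e^(-2) ≈ 0.865 and the root gets 2(1 - e^(-1)) ≈ 1.264. The two passes keep the cost linear in the number of nodes instead of integrating over the whole tree separately for every node.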
Entry points
The main entry points for using this module are:
- FitnessEstimator: An abstract base class that defines the interface for all fitness estimation algorithms in Cassiopeia.
- LBIJungle: A concrete implementation of the FitnessEstimator class that uses the Local Branching Index method for fitness estimation.
Developers can use these classes to estimate fitness for CassiopeiaTree objects, which represent phylogenetic trees in the Cassiopeia library.
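The sketch below illustrates the contract described above: an abstract estimate_fitness method that annotates every node of a tree in place with a 'fitness' value. The names FitnessEstimator, FitnessEstimatorError and estimate_fitness come from this module's description, but the class bodies are assumptions, not Cassiopeia's code. ToyTree is a hypothetical stand-in for CassiopeiaTree, and LeafCountEstimator is a hypothetical toy estimator (fitness = number of descendant leaves), not LBI.

```python
import abc
from typing import Dict, List, Set


class FitnessEstimatorError(Exception):
    """Raised when fitness cannot be estimated for the given tree."""


class ToyTree:
    """Hypothetical stand-in for CassiopeiaTree: child lists plus node attributes."""

    def __init__(self, children: Dict[str, List[str]], root: str):
        self.children = children
        self.root = root
        self.attributes: Dict[str, Dict[str, float]] = {}

    def nodes(self) -> List[str]:
        # Preorder listing; a node reached twice means the input is not a tree.
        order: List[str] = []
        seen: Set[str] = set()
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node in seen:
                raise ValueError(f"node {node!r} is reachable more than once")
            seen.add(node)
            order.append(node)
            stack.extend(self.children.get(node, []))
        return order


class FitnessEstimator(abc.ABC):
    """Interface: subclasses add a 'fitness' attribute to every node, in place."""

    @abc.abstractmethod
    def estimate_fitness(self, tree: ToyTree) -> None:
        ...


class LeafCountEstimator(FitnessEstimator):
    """Hypothetical toy estimator: fitness = number of leaves below a node."""

    def estimate_fitness(self, tree: ToyTree) -> None:
        try:
            nodes = tree.nodes()
        except ValueError as err:
            raise FitnessEstimatorError(str(err)) from err
        counts: Dict[str, int] = {}
        for node in reversed(nodes):  # children before parents
            kids = tree.children.get(node, [])
            counts[node] = sum(counts[c] for c in kids) if kids else 1
            tree.attributes.setdefault(node, {})["fitness"] = float(counts[node])
```

Because the abstract method has no default body, FitnessEstimator itself cannot be instantiated; each algorithm supplies its own estimate_fitness, and callers read the results back from the tree they passed in.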
Key Files
- _FitnessEstimator.py: Defines the abstract FitnessEstimator class and the FitnessEstimatorError exception.
- _lbi_jungle.py: Implements the LBIJungle class, which uses the jungle package to estimate fitness using the LBI method.
- __init__.py: Serves as the top-level entry point for the module, exposing the main components.
Dependencies
The module relies on several external libraries:
- jungle: A wrapper around Neher et al.’s original code for LBI calculations.
- networkx: Used for representing and manipulating tree topologies.
- numpy: Used for random number generation and array manipulation.
- ete3: For phylogenetic tree manipulation and visualization (used in the _jungle subdirectory).
- Bio.Phylo: For interfacing with Biopython’s phylogenetic tree representation (used in the _jungle subdirectory).
- scipy: For scientific computing tasks and statistical functions (used in the _jungle subdirectory).
- pandas: For data manipulation and analysis (used in the _jungle subdirectory).
- matplotlib: For visualization of results and trees (used in the _jungle subdirectory).
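Tree topologies are held in networkx, while the external phylogenetics libraries listed above (ete3, Bio.Phylo) read trees in Newick format, which is why the module converts trees to Newick. A minimal recursive serializer is sketched below. The function name to_newick and the child-list dict are assumptions; with a networkx DiGraph, a node's children could come from list(graph.successors(node)). The sketch does not quote or escape labels that contain Newick special characters such as parentheses, commas, colons or spaces.

```python
from typing import Dict, List, Optional


def to_newick(
    children: Dict[str, List[str]],
    root: str,
    branch_length: Optional[Dict[str, float]] = None,
) -> str:
    """Serialize a rooted tree (child-list dict) into a Newick string."""
    lengths = branch_length or {}

    def render(node: str) -> str:
        kids = children.get(node, [])
        label = node
        if kids:
            # Internal node: "(child1,child2,...)name"
            label = "(" + ",".join(render(k) for k in kids) + ")" + node
        if node in lengths:
            # Length of the branch leading into this node.
            label += f":{lengths[node]}"
        return label

    return render(root) + ";"
```

For example, to_newick({"r": ["a", "n1"], "n1": ["b", "c"]}, "r", {"a": 1.0, "n1": 0.5, "b": 2.0, "c": 2.0}) produces "(a:1.0,(b:2.0,c:2.0)n1:0.5)r;". Recursion keeps the code short; very deep trees could exceed Python's default recursion limit and would need an iterative version.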
Configuration
The main classes use constructor parameters and method arguments for configuration:
- LBIJungle: the random_seed constructor parameter is an optional integer that sets the random seed for reproducibility.
- estimate_fitness method: takes a CassiopeiaTree object as input and modifies it in place by adding a ‘fitness’ attribute to each node.
Setting random_seed makes repeated runs reproducible, which matters when fitness estimates are compared across trees in evolutionary studies and population genetics research.
The _jungle subdirectory contains additional classes and functions for more advanced phylogenetic analysis, including:
- Forest: For managing collections of phylogenetic trees.
- Tree: For analyzing individual phylogenetic trees.
- SFS: For calculating and analyzing Site Frequency Spectra.
- SizeMatchedModel: For statistical modeling based on data size.
These components extend the module beyond LBI estimation to broader analyses of evolutionary fitness and phylogenetic relationships.
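As an illustration of what a site frequency spectrum summarizes (the interface of the SFS class is not described here, so this is not its code), the sketch below computes the branch-length SFS of a tree under an infinite-sites assumption with a constant mutation rate: a mutation on a branch that subtends k leaves is carried by exactly k samples, so entry k accumulates the total length of all branches with k descendant leaves and is proportional to the expected number of such sites. The function name branch_length_sfs and the dict-based tree are assumptions.

```python
from typing import Dict, List


def branch_length_sfs(
    children: Dict[str, List[str]],
    branch_length: Dict[str, float],
    root: str,
) -> Dict[int, float]:
    """Total branch length grouped by how many leaves each branch subtends."""
    # Preorder traversal: parents before children.
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    leaves_below: Dict[str, int] = {}
    sfs: Dict[int, float] = {}
    for node in reversed(order):  # children before parents
        kids = children.get(node, [])
        leaves_below[node] = sum(leaves_below[c] for c in kids) if kids else 1
        if node != root:  # the root has no branch above it
            k = leaves_below[node]
            sfs[k] = sfs.get(k, 0.0) + branch_length[node]
    return dict(sorted(sfs.items()))
```

For the three-leaf tree {"r": ["a", "n1"], "n1": ["b", "c"]} with branch lengths a = 1.0, n1 = 0.5, b = 2.0 and c = 2.0, the result is {1: 5.0, 2: 0.5}: the leaf branches dominate the singleton class, and only the internal branch above n1 contributes to the doubleton class.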