# Overview

## High-level description

The `cassiopeia/tools/fitness_estimator` directory contains fitness estimation algorithms for phylogenetic trees in the Cassiopeia library. Its main components are an abstract base class, `FitnessEstimator`, and a concrete implementation, `LBIJungle`, which uses the Local Branching Index (LBI) method.

## What does it do?

This module provides tools for estimating the fitness of nodes in phylogenetic trees. The main functionality includes:

1. Defining an interface for fitness estimation algorithms through the abstract `FitnessEstimator` class.
2. Implementing the LBI fitness estimator using the `jungle` package, which calculates fitness based on the branching patterns of the tree.
3. Converting Cassiopeia trees to Newick format for compatibility with external libraries.
4. Annotating nodes in the tree with fitness values.

Fitness estimates help interpret the evolutionary dynamics of the lineages represented in the tree. Under LBI, a node scores higher when the tree branches densely in its neighborhood, which suggests a lineage that is expanding faster than its relatives.
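Step 3 in the list above (conversion to Newick) can be illustrated with a small, library-independent sketch. It assumes a tree stored as plain dictionaries, which is a hypothetical layout and not Cassiopeia's internal representation; the function name `to_newick` is also illustrative only.

```python
from typing import Dict, List


def to_newick(
    children: Dict[str, List[str]],
    branch_length: Dict[str, float],
    root: str,
) -> str:
    """Serialize a rooted tree to a Newick string with branch lengths.

    `children` maps each internal node to its child names, and
    `branch_length` maps each non-root node to the length of the
    branch above it. Both layouts are assumptions of this sketch.
    """

    def render(node: str) -> str:
        kids = children.get(node, [])
        # The root has no incoming branch, so it carries no length.
        label = node if node == root else f"{node}:{branch_length[node]:g}"
        if not kids:
            return label
        return "(" + ",".join(render(k) for k in kids) + ")" + label

    return render(root) + ";"


children = {"root": ["A", "internal"], "internal": ["B", "C"]}
branch_length = {"A": 1.0, "internal": 0.5, "B": 0.5, "C": 0.5}
newick = to_newick(children, branch_length, "root")
print(newick)  # (A:1,(B:0.5,C:0.5)internal:0.5)root;
```

The recursion keeps the sketch short; for very deep trees an iterative traversal would avoid Python's recursion limit.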

## Entry points

The main entry points for using this module are:

1. `FitnessEstimator`: An abstract base class that defines the interface for all fitness estimation algorithms in Cassiopeia.
2. `LBIJungle`: A concrete implementation of the `FitnessEstimator` class that uses the Local Branching Index (LBI) method for fitness estimation.

Developers can use these classes to estimate fitness for `CassiopeiaTree` objects, which represent phylogenetic trees in the Cassiopeia library.
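To give a sense of what an LBI score measures, the sketch below computes it with the two-pass message-passing scheme described by Neher et al. It is a simplified illustration, not the code that `LBIJungle` or the `jungle` package runs. The dictionary-based tree layout, the function name `lbi_scores`, and the parameter name `tau` (the distance scale over which nearby tree length is counted) are assumptions of this example.

```python
import math
from typing import Dict, List


def lbi_scores(
    children: Dict[str, List[str]],
    branch_length: Dict[str, float],
    root: str,
    tau: float,
) -> Dict[str, float]:
    """Score each node by the tree length around it, discounted by
    exp(-distance / tau). Missing branch lengths (the root) count as 0."""
    # Preorder lists parents before children; its reverse is a valid
    # order for visiting children before parents.
    preorder, stack = [], [root]
    while stack:
        node = stack.pop()
        preorder.append(node)
        stack.extend(children.get(node, []))

    def along_branch(message: float, node: str) -> float:
        # Attenuate a message across the branch above `node`, then add
        # that branch's own discounted length.
        x = branch_length.get(node, 0.0) / tau
        return message * math.exp(-x) + tau * (1.0 - math.exp(-x))

    # Upward pass: discounted tree length in each node's subtree.
    up: Dict[str, float] = {}
    for node in reversed(preorder):
        up[node] = along_branch(sum(up[c] for c in children.get(node, [])), node)

    # Downward pass: discounted tree length outside each node's subtree.
    down: Dict[str, float] = {root: 0.0}
    for node in preorder:
        kids = children.get(node, [])
        total_up = sum(up[c] for c in kids)
        for c in kids:
            down[c] = along_branch(down[node] + total_up - up[c], c)

    return {n: down[n] + sum(up[c] for c in children.get(n, [])) for n in preorder}


# A root with two leaves, each on a branch of length 1, with tau = 1.
lbi = lbi_scores({"root": ["A", "B"]}, {"A": 1.0, "B": 1.0}, "root", tau=1.0)
print(round(lbi["root"], 4), round(lbi["A"], 4))  # 1.2642 0.8647
```

In this toy tree the root scores higher than either leaf because both branches lie close to it, whereas each leaf sees the other branch only through a discount of `exp(-1)`.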

## Key files

1. `_FitnessEstimator.py`: Defines the abstract `FitnessEstimator` class and the `FitnessEstimatorError` exception.
2. `_lbi_jungle.py`: Implements the `LBIJungle` class, which uses the `jungle` package to estimate fitness using the LBI method.
3. `__init__.py`: Serves as the top-level entry point for the module, exposing the main components.

## Dependencies

The module relies on several external libraries:

1. `jungle`: A wrapper around Neher et al.'s original code for LBI calculations.
2. `networkx`: Used for representing and manipulating tree topologies.
3. `numpy`: Used for random number generation and array manipulation.
4. `ete3`: For phylogenetic tree manipulation and visualization (used in the `_jungle` subdirectory).
5. `Bio.Phylo`: For interfacing with Biopython's phylogenetic tree representation (used in the `_jungle` subdirectory).
6. `scipy`: For various scientific computing tasks and statistical functions (used in the `_jungle` subdirectory).
7. `pandas`: For data manipulation and analysis (used in the `_jungle` subdirectory).
8. `matplotlib`: For visualization of results and trees (used in the `_jungle` subdirectory).

## Configuration

The main classes use constructor parameters and method arguments for configuration:

* `LBIJungle`:
  * `random_seed`: Optional integer to set the random seed for reproducibility.

* `estimate_fitness` method:
  * Takes a `CassiopeiaTree` object as input and modifies it in place by adding a `fitness` attribute to each node.

Set `random_seed` when results must be reproducible across runs. Because `estimate_fitness` annotates the tree in place, pass a copy if the original tree must remain unannotated.
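The configuration pattern described above (an optional seed in the constructor, and an `estimate_fitness` method that annotates the tree in place) can be sketched with stand-in classes. Everything here is hypothetical: `ToyTree` only mimics a tree that stores per-node attributes, `ToyFitnessEstimator` and `RandomFitness` are not Cassiopeia classes, and the random scores are placeholders for a real estimate such as LBI.

```python
import abc
import random
from typing import Any, Dict, Iterable, Optional


class ToyTree:
    """Hypothetical stand-in for a tree with per-node attributes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = list(nodes)
        self._attributes: Dict[str, Dict[str, Any]] = {n: {} for n in self.nodes}

    def set_attribute(self, node: str, name: str, value: Any) -> None:
        self._attributes[node][name] = value

    def get_attribute(self, node: str, name: str) -> Any:
        return self._attributes[node][name]


class ToyFitnessEstimator(abc.ABC):
    """Interface sketch: subclasses must implement `estimate_fitness`."""

    @abc.abstractmethod
    def estimate_fitness(self, tree: ToyTree) -> None:
        """Annotate every node of `tree` with a 'fitness' attribute, in place."""


class RandomFitness(ToyFitnessEstimator):
    """Placeholder estimator that assigns seeded random scores."""

    def __init__(self, random_seed: Optional[int] = None) -> None:
        self._random_seed = random_seed

    def estimate_fitness(self, tree: ToyTree) -> None:
        # A private generator keeps the seed from leaking into global state.
        rng = random.Random(self._random_seed)
        for node in tree.nodes:
            tree.set_attribute(node, "fitness", rng.random())


tree_a = ToyTree(["root", "A", "B"])
tree_b = ToyTree(["root", "A", "B"])
RandomFitness(random_seed=0).estimate_fitness(tree_a)
RandomFitness(random_seed=0).estimate_fitness(tree_b)

# The same seed yields the same annotations on both trees.
print(tree_a.get_attribute("A", "fitness") == tree_b.get_attribute("A", "fitness"))  # True
```

Returning `None` and writing to the tree mirrors the in-place behavior described for `estimate_fitness`, which is why callers who need the original tree untouched should work on a copy.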

The `_jungle` subdirectory contains additional classes and functions for more advanced phylogenetic analysis, including:

* `Forest`: For managing collections of phylogenetic trees.
* `Tree`: For analyzing individual phylogenetic trees.
* `SFS`: For calculating and analyzing Site Frequency Spectra.
* `SizeMatchedModel`: For statistical modeling based on data size.

Together, these components support further analysis of evolutionary fitness and phylogenetic relationships beyond the per-node fitness annotation described above.
