High-level description

This directory contains a beta coalescent tree simulator and a Site Frequency Spectrum (SFS) calculator. Together they use coalescent theory, in particular beta coalescent processes, to model the ancestral relationships within a population sample and to summarize the genetic diversity those relationships produce.

What does it do?

The code simulates the genealogy of a sample of genetic sequences under the beta coalescent, a family of coalescent models in which several lineages can merge in a single event and which includes the Kingman coalescent as a special case. Each simulated tree represents the ancestral relationships among the sampled individuals. From these trees the code calculates the Site Frequency Spectrum (SFS), a summary statistic of genetic variation: for a sample of n sequences, the SFS records how many polymorphic sites carry the derived allele in exactly i of the n sequences, for i = 1, ..., n - 1. The shape of the SFS carries information about the population's genetic diversity and evolutionary history.
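To make the definition concrete, the sketch below computes an unfolded SFS directly from a small, made-up 0/1 haplotype matrix (rows are sampled sequences, columns are sites, 1 marks the derived allele). It is only an illustration of what the statistic counts: this codebase derives the SFS from simulated trees rather than from sequence data, and the function name and data are hypothetical.

```python
import numpy as np


def sfs_from_haplotypes(haplotypes):
    """Unfolded SFS: entry i-1 counts sites where exactly i of the n sequences carry allele 1."""
    haplotypes = np.asarray(haplotypes, dtype=int)
    n = haplotypes.shape[0]
    derived_counts = haplotypes.sum(axis=0)  # derived-allele count at each site
    counts = np.bincount(derived_counts, minlength=n + 1)
    return counts[1:n]  # drop monomorphic classes 0 and n


# Hypothetical sample: 4 sequences (rows) x 5 sites (columns), 1 = derived allele
hap = [
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
spectrum = sfs_from_haplotypes(hap)  # [3, 0, 2]: three singletons, two sites at count 3
```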

Key Files

  1. betatree.py: Defines the betatree class, which simulates beta coalescent trees. It implements methods for initializing a tree, performing coalescence events, and manipulating the tree structure.

  2. sfs.py and sfs_py3.py: These files define the SFS class, which extends the functionality of betatree to calculate the Site Frequency Spectrum. The SFS class generates multiple trees, accumulates allele frequency information, and computes the SFS. It also provides methods for binning the SFS and saving/loading SFS data.
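The core of both classes can be sketched in one self-contained block. This sketch assumes the standard Beta(2 - alpha, alpha) parameterization (consistent with alpha = 2 giving the Kingman coalescent) and assumes the SFS is estimated from branch lengths. With b active lineages, a specific group of k of them merges at rate lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha), where B is the beta function (computed here with scipy.special.betaln for numerical stability). Under the infinite-sites model the expected number of mutations on a branch is proportional to its length, so SFS entry i is proportional to the mean total length of branches that subtend exactly i sampled leaves. All function names are hypothetical and are not the API of the betatree or SFS classes.

```python
import math

import numpy as np
from scipy.special import betaln


def merger_rates(b, alpha):
    """Rates of k-mergers (k = 2..b) among b lineages in a Beta(2 - alpha, alpha) coalescent."""
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    ks = np.arange(2, b + 1)
    if alpha == 2:
        # Kingman limit: only pairwise mergers, each of the C(b, 2) pairs at rate 1
        rates = np.zeros(len(ks))
        rates[0] = math.comb(b, 2)
        return ks, rates
    # lambda_{b,k} for one specific set of k lineages, computed in log space
    log_lam = betaln(ks - alpha, b - ks + alpha) - betaln(2 - alpha, alpha)
    n_subsets = np.array([math.comb(b, k) for k in ks], dtype=float)
    return ks, n_subsets * np.exp(log_lam)


def sample_merger(b, alpha, rng):
    """Draw the waiting time to the next event and the number of lineages that merge."""
    ks, rates = merger_rates(b, alpha)
    total = rates.sum()
    wait = rng.exponential(1.0 / total)
    k = rng.choice(ks, p=rates / total)
    return wait, int(k)


def tree_branch_sfs(n, alpha, rng):
    """Simulate one tree; entry i-1 is the total length of branches subtending i leaves."""
    lineages = [1] * n  # leaf count below each active lineage
    lengths = np.zeros(n - 1)
    while len(lineages) > 1:
        wait, k = sample_merger(len(lineages), alpha, rng)
        for leaves in lineages:  # every active lineage extends by the waiting time
            lengths[leaves - 1] += wait
        picked = set(rng.choice(len(lineages), size=k, replace=False).tolist())
        merged = sum(lineages[j] for j in picked)
        lineages = [c for j, c in enumerate(lineages) if j not in picked] + [merged]
    return lengths


def average_sfs(n, alpha, ntrees, seed=0):
    """Average the branch-length SFS over ntrees independently simulated trees."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n - 1)
    for _ in range(ntrees):
        total += tree_branch_sfs(n, alpha, rng)
    return total / ntrees
```

Two quick sanity checks: for alpha = 1 the merger rates from b lineages sum to b - 1, the known total event rate of the Bolthausen-Sznitman coalescent; and for alpha = 2 the averaged branch lengths approach 2/i in these units (each pair of lineages coalesces at rate 1), so average_sfs(10, 2.0, 2000) should give a first entry near 2 and a second near 1.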

Dependencies

The code relies on several external libraries:

  • NumPy: Used for numerical computations and array operations.
  • SciPy: Specifically, scipy.special is used for special mathematical functions like the gamma function.
  • BioPython: The Bio.Phylo module is used for working with phylogenetic trees.
  • Matplotlib: Used for plotting the SFS (in example usage).

Configuration

The code does not use explicit configuration files. Instead, key parameters are passed as arguments to the class constructors:

  • sample_size: The number of individuals in the sample.
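  • alpha: The alpha parameter of the beta coalescent model, usually written Beta(2 - alpha, alpha). The default, alpha = 2, corresponds to the Kingman coalescent, in which only pairs of lineages merge; smaller values make multiple mergers more frequent, and alpha = 1 gives the Bolthausen-Sznitman coalescent.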

These parameters can be adjusted when initializing the betatree or SFS objects to simulate different evolutionary scenarios.
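Because alpha = 2 reduces to the Kingman coalescent, simulations at that setting can be checked against the classical neutral expectation E[xi_i] = theta / i for the number of sites with derived-allele count i. The helper below is a hypothetical convenience for such a comparison, not part of the codebase.

```python
import numpy as np


def kingman_expected_sfs(sample_size, theta=1.0):
    """Expected number of sites with derived-allele count i (i = 1..n-1): theta / i."""
    i = np.arange(1, sample_size)
    return theta / i


expected = kingman_expected_sfs(10)
shape = expected / expected.sum()  # normalized shape for comparison with a normalized simulated SFS
```

Comparing normalized shapes sidesteps the choice of theta, which only rescales the spectrum.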

The SFS class also allows for configuration of the SFS calculation and binning process through method parameters:

  • ntrees: The number of trees to generate for SFS calculation.
  • mode: The binning mode for the SFS (linear, log, or logit).
  • bins: The number of bins or custom bin edges for SFS binning.
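The three binning modes can be illustrated with a sketch that places bin edges uniformly in frequency, log-frequency, or logit-frequency space (logit(x) = log(x / (1 - x))) and sums the SFS entries whose sample frequency i/n falls in each bin. The function names, the edge range of 0.5/n to 1 - 0.5/n, and the division by bin width are assumptions for illustration; the SFS class's own binning method may define edges or normalization differently.

```python
import numpy as np


def make_bin_edges(n, mode="logit", bins=10):
    """Hypothetical helper: bin edges in frequency space for frequencies i/n, i = 1..n-1."""
    if not np.isscalar(bins):
        return np.asarray(bins, dtype=float)  # custom edges are used as given
    lo, hi = 0.5 / n, 1 - 0.5 / n  # half-step margins so every i/n lies inside
    if mode == "linear":
        return np.linspace(lo, hi, bins + 1)
    if mode == "log":
        return np.logspace(np.log10(lo), np.log10(hi), bins + 1)
    if mode == "logit":
        x = np.linspace(np.log(lo / (1 - lo)), np.log(hi / (1 - hi)), bins + 1)
        return 1.0 / (1.0 + np.exp(-x))  # map evenly spaced logits back to frequencies
    raise ValueError(f"unknown mode: {mode}")


def bin_sfs(sfs, edges):
    """Sum SFS entries per bin and divide by bin width, giving a density over frequency."""
    n = len(sfs) + 1
    freqs = np.arange(1, n) / n
    counts, _ = np.histogram(freqs, bins=edges, weights=sfs)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / np.diff(edges)
```

Logit spacing puts narrow bins near both 0 and 1, which keeps rare and nearly fixed alleles resolved; log spacing resolves only the low-frequency end.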

Increasing ntrees reduces the Monte Carlo noise in the estimated SFS at the cost of longer run times, while mode and bins control how the SFS is summarized for analysis or plotting.