High-level description

This code defines a class, SFS, that simulates genealogies under a beta coalescent model and calculates the Site Frequency Spectrum (SFS) from the generated trees. The SFS is the distribution of allele frequencies in a population sample: entry i describes variants carried by i of the sampled individuals.

Code Structure

The SFS class inherits from the betatree class, which is responsible for generating the beta coalescent trees. The SFS class extends this functionality by accumulating allele frequency information from multiple trees and calculating the SFS. It also provides methods for binning and saving the calculated SFS.

References

This code references the betatree class from the same package, specifically the coalesce method for generating the trees and accessing tree nodes for allele frequency calculations. It also uses the Bio.Phylo module for working with phylogenetic trees.

Symbols

Symbol Name: SFS

Description:

This class simulates genealogies under a beta coalescent model and calculates the Site Frequency Spectrum (SFS) from them.

Inputs:

| Name | Type | Description |
| --- | --- | --- |
| sample_size | int | The number of individuals in the sample. |
| alpha | float | The alpha parameter of the beta coalescent model (default: 2). |

Outputs:

Instantiating the class produces no output of its own. The calculated SFS and related data are stored in the object's internal attributes.

Internal Logic:

  1. Initialization: Initializes the SFS object by calling the constructor of the parent class betatree and initializing internal variables to store allele frequencies and the SFS.
  2. Tree Generation and SFS Calculation: The glob_trees method generates multiple beta coalescent trees using the inherited coalesce method. For each tree, it extracts the branch lengths and corresponding allele counts (weights) and stores them. The getSFS method then uses this accumulated data to calculate the average SFS over all generated trees.
  3. SFS Binning: The binSFS method allows for binning the calculated SFS using different modes (linear, log, logit) and binning schemes. It calculates the bin edges, bin widths, and the binned SFS values.
  4. SFS Input/Output: The saveSFS method saves a calculated SFS to a file, while the loadSFS method loads a previously saved SFS from a file.
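The binning step can be illustrated with a minimal sketch. It is not the class's own implementation: the function names bin_edges and bin_sfs, the choice to span the edges from 1/(2n) to 1 - 1/(2n), and the normalization by bin width are all assumptions made for illustration.

```python
import numpy as np

def bin_edges(sample_size, nbins, mode="logit"):
    """Return nbins + 1 bin edges on the frequency axis (hypothetical helper)."""
    # Assumption: edges span 1/(2n) .. 1 - 1/(2n), so every observable
    # frequency i/n (i = 1 .. n-1) falls strictly inside the binned range.
    lo = 0.5 / sample_size
    hi = 1.0 - lo
    if mode == "linear":
        return np.linspace(lo, hi, nbins + 1)
    if mode == "log":
        # Equal spacing in log(frequency).
        return np.exp(np.linspace(np.log(lo), np.log(hi), nbins + 1))
    if mode == "logit":
        # Equal spacing in log(x / (1 - x)), mapped back to frequencies.
        z = np.linspace(np.log(lo / (1 - lo)), np.log(hi / (1 - hi)), nbins + 1)
        return 1.0 / (1.0 + np.exp(-z))
    raise ValueError("mode must be 'linear', 'log' or 'logit'")

def bin_sfs(sfs, edges):
    """Return (bin widths, SFS per unit frequency in each bin)."""
    sfs = np.asarray(sfs, dtype=float)
    n = len(sfs) + 1                      # sfs[i-1] belongs to frequency i/n
    freqs = np.arange(1, n) / n
    widths = np.diff(edges)
    totals, _ = np.histogram(freqs, bins=edges, weights=sfs)
    return widths, totals / widths        # assumed normalization: divide by width

edges = bin_edges(sample_size=10, nbins=3, mode="logit")
widths, binned = bin_sfs(np.ones(9), edges)
```

Logit spacing gives narrow bins near frequencies 0 and 1 and wide bins in the middle, which resolves both ends of the spectrum at once.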

Side Effects:

The glob_trees and getSFS methods modify the internal state of the SFS object by generating trees, accumulating allele frequencies, and calculating the SFS.
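The accumulation of branch lengths by allele count can be sketched as follows. This is a simplified illustration, not the class's code: the Node class and both function names are hypothetical stand-ins for the Bio.Phylo trees the class works with, and the sketch assumes that a branch with k sampled descendants contributes its length to SFS entry k.

```python
class Node:
    """Hypothetical minimal tree node: a branch length plus child nodes."""
    def __init__(self, branch_length, children=()):
        self.branch_length = branch_length
        self.children = list(children)

def add_tree_to_sfs(root, sfs):
    """Add every non-root branch length to sfs[k - 1], k = leaves below the branch."""
    def visit(node, is_root):
        if node.children:
            n_leaves = sum(visit(child, False) for child in node.children)
        else:
            n_leaves = 1
        if not is_root:  # the root branch subtends the whole sample, so skip it
            sfs[n_leaves - 1] += node.branch_length
        return n_leaves
    return visit(root, True)

def average_sfs(trees, sample_size):
    """Average the accumulated branch lengths over all trees."""
    sfs = [0.0] * (sample_size - 1)
    for tree in trees:
        add_tree_to_sfs(tree, sfs)
    return [total / len(trees) for total in sfs]

# Three-leaf example, in Newick terms ((A:1,B:1):0.5,C:1.5);
tree = Node(0.0, [Node(0.5, [Node(1.0), Node(1.0)]), Node(1.5)])
print(average_sfs([tree], sample_size=3))  # [3.5, 0.5]
```

Nodes with more than two children need no special handling, which matters because beta coalescent trees can contain multiple mergers.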

Symbol Name: logit

Description:

This function calculates the logit of a given value.

Inputs:

| Name | Type | Description |
| --- | --- | --- |
| x | float | The input value. |

Outputs:

| Name | Type | Description |
| --- | --- | --- |
| logit(x) | float | The logit of the input value. |

Internal Logic:

The function calculates the logit using the formula: log(x / (1 - x)).
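A minimal sketch of this formula is shown here. The explicit domain check and the inverse_logit helper are additions for illustration and are not described in the documented code.

```python
import math

def logit(x):
    """Return log(x / (1 - x)); defined only for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        # Assumed guard: the documented function may not check its input.
        raise ValueError("logit is only defined for 0 < x < 1")
    return math.log(x / (1.0 - x))

def inverse_logit(y):
    """Hypothetical helper: map a logit value back to a frequency in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

print(logit(0.5))  # 0.0
```

The transform is antisymmetric around x = 0.5, so logit(x) = -logit(1 - x). This makes it a natural axis for a spectrum that has structure at both low and high frequencies.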

Dependencies

| Dependency | Purpose |
| --- | --- |
| numpy | Numerical operations, array handling. |
| scipy.special | Special functions, including the gamma function used in beta coalescent calculations. |
| Bio.Phylo | Working with phylogenetic trees. |
| matplotlib.pyplot | Plotting the SFS (only in the example usage). |

Error Handling

The code includes basic error handling when loading an SFS from a file. It checks if the file exists and if the loaded data has the expected one-dimensional shape. If an error occurs, it prints an error message but does not raise an exception.
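The check-and-print pattern described above might look like the following sketch. The function name load_sfs, the plain-text file format read with numpy.loadtxt, and the None return value are assumptions; the documented loadSFS method may differ in all three.

```python
import os
import numpy as np

def load_sfs(fname):
    """Return a 1-D array read from fname, or None after printing a message."""
    if not os.path.isfile(fname):
        print("load_sfs: file not found:", fname)
        return None
    data = np.loadtxt(fname)  # assumed storage format: whitespace-separated text
    if data.ndim != 1:
        print("load_sfs: expected a one-dimensional array, got shape", data.shape)
        return None
    return data
```

Because failures are reported only by printing, a caller has to test the result (here, for None) before using it. Raising an exception instead would make such failures harder to overlook.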

Logging

The code does not implement any specific logging mechanisms. It uses print statements for informational output.

API/Interface Reference

The code exposes no API beyond the SFS class itself. Its public methods (glob_trees, getSFS, binSFS, saveSFS, and loadSFS) cover tree simulation, SFS calculation, binning, and file input/output.

TODOs

There are no explicit TODOs or notes left in the code.