sfs.py
High-level description
This code defines a class SFS that simulates the evolution of genetic sequences under a beta coalescent model and calculates the Site Frequency Spectrum (SFS) from the generated genealogical trees. The SFS represents the distribution of allele frequencies in a population sample.
Code Structure
The SFS class inherits from the betatree class, which is responsible for generating the beta coalescent trees. The SFS class extends this functionality by accumulating allele frequency information from multiple trees and calculating the SFS. It also provides methods for binning and saving the calculated SFS.
References
This code references the betatree class from the same package, specifically the coalesce method for generating the trees and accessing tree nodes for allele frequency calculations. It also uses the Bio.Phylo module for working with phylogenetic trees.
Symbols
Symbol Name: SFS
Description:
This class simulates the evolution of genetic sequences under a beta coalescent model and calculates the Site Frequency Spectrum (SFS).
Inputs:
Name | Type | Description |
---|---|---|
sample_size | int | The number of individuals in the sample. |
alpha | float | The alpha parameter of the beta coalescent model (default: 2). |
Outputs:
The class does not have a direct return value. It stores the calculated SFS and related data in its internal attributes.
Internal Logic:
- Initialization: Initializes the SFS object by calling the constructor of the parent class betatree and initializing internal variables to store allele frequencies and the SFS.
- Tree Generation and SFS Calculation: The glob_trees method generates multiple beta coalescent trees using the inherited coalesce method. For each tree, it extracts the branch lengths and corresponding allele counts (weights) and stores them. The getSFS method then uses this accumulated data to calculate the average SFS over all generated trees.
- SFS Binning: The binSFS method allows for binning the calculated SFS using different modes (linear, log, logit) and binning schemes. It calculates the bin edges, bin widths, and the binned SFS values.
- SFS Input/Output: The saveSFS method saves a calculated SFS to a file, while the loadSFS method loads a previously saved SFS from a file.
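The accumulation and binning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function names average_sfs and bin_sfs_linear are hypothetical, each tree is represented by a plain (weights, lengths) pair instead of a Bio.Phylo tree, only the linear binning mode is shown, and dividing the summed values by the bin width is an assumed normalization.

```python
import numpy as np

def average_sfs(trees, sample_size):
    """Average branch length per allele count over several trees.

    `trees` is a hypothetical stand-in for the generated genealogies:
    a list of (weights, lengths) pairs, where weights[i] is the number
    of sampled leaves below branch i and lengths[i] is its length.
    """
    total = np.zeros(sample_size + 1)
    for weights, lengths in trees:
        # Sum the lengths of all branches that subtend the same leaf count.
        total += np.bincount(weights, weights=lengths,
                             minlength=sample_size + 1)
    return total / len(trees)

def bin_sfs_linear(sfs, n_bins):
    """Bin an SFS on a linear frequency axis (assumed normalization)."""
    n = len(sfs) - 1                      # sample size
    freqs = np.arange(1, n) / n           # allele frequencies k/n, 0 < k < n
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    widths = np.diff(edges)
    summed, _ = np.histogram(freqs, bins=edges, weights=sfs[1:n])
    return edges, widths, summed / widths

# Two toy genealogies for a sample of three individuals.
toy_trees = [
    (np.array([1, 1, 1, 2]), np.array([1.0, 1.0, 2.0, 1.0])),
    (np.array([1, 1, 1, 2]), np.array([0.5, 0.5, 1.5, 1.0])),
]
toy_sfs = average_sfs(toy_trees, sample_size=3)
toy_edges, toy_widths, toy_binned = bin_sfs_linear(toy_sfs, n_bins=2)
```

Index k of the averaged array holds the mean total length of branches carried by k sampled individuals, which is proportional to the expected number of mutations seen in k copies.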
Side Effects:
The glob_trees and getSFS methods modify the internal state of the SFS object by generating trees, accumulating allele frequencies, and calculating the SFS.
Symbol Name: logit
Description:
This function calculates the logit of a given value.
Inputs:
Name | Type | Description |
---|---|---|
x | float | The input value. |
Outputs:
Name | Type | Description |
---|---|---|
logit(x) | float | The logit of the input value. |
Internal Logic:
The function calculates the logit using the formula: log(x / (1 - x)).
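A minimal sketch of this formula is shown here. The explicit domain check is an assumption added for illustration; the documented function may simply evaluate the expression and may operate on NumPy arrays as well as scalars.

```python
import math

def logit(x: float) -> float:
    """Return log(x / (1 - x)) for 0 < x < 1."""
    # Assumed guard: the logit is undefined at and outside 0 and 1.
    if not 0.0 < x < 1.0:
        raise ValueError("logit is only defined for 0 < x < 1")
    return math.log(x / (1.0 - x))
```

The logit maps frequencies in (0, 1) onto the whole real line, which spreads out both the rare and the near-fixed ends of the spectrum; this is presumably why it is offered as a binning mode alongside linear and log.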
Dependencies
Dependency | Purpose |
---|---|
numpy | Numerical operations, array handling. |
scipy.special | Special functions, including the gamma function used in beta coalescent calculations. |
Bio.Phylo | Working with phylogenetic trees. |
matplotlib.pyplot | Plotting the SFS (only in the example usage). |
Error Handling
The code includes basic error handling when loading an SFS from a file. It checks if the file exists and if the loaded data has the expected one-dimensional shape. If an error occurs, it prints an error message but does not raise an exception.
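The checks described above might look like the sketch below. It is an illustration under assumptions, not the actual loadSFS method: the standalone function name load_sfs, the plain-text file format read with numpy.loadtxt, the message wording, and the None return value on failure are all hypothetical.

```python
import os
import numpy as np

def load_sfs(path):
    """Return a one-dimensional SFS array, or None if loading fails."""
    if not os.path.isfile(path):
        # Report the problem without raising, as the description states.
        print(f"SFS file not found: {path}")
        return None
    data = np.loadtxt(path)  # assumed format: whitespace-separated text
    if data.ndim != 1:
        print(f"Expected a one-dimensional SFS, got shape {data.shape}")
        return None
    return data
```

Because failures are only printed, callers of a loader written this way need to check the result before using it.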
Logging
The code does not implement any specific logging mechanisms. It uses print statements for informational output.
API/Interface Reference
This code does not define an explicit API. It provides a class SFS with methods for simulating the evolutionary process, calculating the SFS, and interacting with SFS data.
TODOs
There are no explicit TODOs or notes left in the code.