High-level description
This code defines a class SFS that simulates the evolution of genetic sequences under a beta coalescent model and calculates the Site Frequency Spectrum (SFS) from the generated genealogical trees. The SFS represents the distribution of allele frequencies in a population sample.
Code Structure
The SFS class inherits from the betatree class, which is responsible for generating the beta coalescent trees. The SFS class extends this functionality by accumulating allele frequency information from multiple trees and calculating the SFS. It also provides methods for binning and saving the calculated SFS.
References
This code references the betatree class from the same package, specifically the coalesce method for generating the trees and accessing tree nodes for allele frequency calculations. It also uses the Bio.Phylo module for working with phylogenetic trees.
Symbols
Symbol Name: SFS
Description:
This class simulates the evolution of genetic sequences under a beta coalescent model and calculates the Site Frequency Spectrum (SFS).
Inputs:
| Name | Type | Description | 
|---|---|---|
| sample_size | int | The number of individuals in the sample. | 
| alpha | float | The alpha parameter of the beta coalescent model (default: 2). | 
Outputs:
The class does not have a direct return value. It stores the calculated SFS and related data in its internal attributes.
Internal Logic:
- Initialization: Initializes the SFS object by calling the constructor of the parent class betatree and initializing internal variables to store allele frequencies and the SFS.
- Tree Generation and SFS Calculation: The glob_trees method generates multiple beta coalescent trees using the inherited coalesce method. For each tree, it extracts the branch lengths and the corresponding allele counts (weights) and stores them. The getSFS method then uses this accumulated data to calculate the average SFS over all generated trees.
- SFS Binning: The binSFS method bins the calculated SFS using one of several modes (linear, log, logit) and binning schemes. It calculates the bin edges, bin widths, and the binned SFS values.
- SFS Input/Output: The saveSFS method saves a calculated SFS to a file, while the loadSFS method loads a previously saved SFS from a file.
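The accumulation step described in the list above can be illustrated with a minimal sketch. It is not the class's actual implementation: the function name accumulate_sfs and the representation of a tree as a list of (branch length, number of descendant leaves) pairs are assumptions made for illustration. The idea it shows is that the expected number of mutations carried by k sampled individuals is proportional to the total length of branches with k descendant leaves, averaged over trees.

```python
import numpy as np

def accumulate_sfs(trees, sample_size):
    """Average unnormalised SFS over a list of trees.

    Each tree is given as a list of (branch_length, n_descendant_leaves)
    pairs. This input format is hypothetical; the real class reads these
    values from the nodes of its generated trees.
    """
    sfs = np.zeros(sample_size + 1)
    for branches in trees:
        lengths = np.array([length for length, _ in branches], dtype=float)
        weights = np.array([w for _, w in branches], dtype=int)
        # Sum the branch lengths separately for each descendant-leaf count.
        sfs += np.bincount(weights, weights=lengths, minlength=sample_size + 1)
    return sfs / len(trees)

# Two toy genealogies for a sample of three individuals.
toy_trees = [
    [(1.0, 1), (1.0, 1), (2.0, 1), (1.0, 2)],
    [(0.5, 1), (0.5, 1), (1.5, 1), (1.0, 2)],
]
sfs = accumulate_sfs(toy_trees, sample_size=3)
print(sfs)
```

Entry k of the result holds the average branch length above exactly k leaves. Entries 0 and sample_size stay at zero because no branch below the root has zero or all leaves as descendants.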
Side Effects:
The glob_trees and getSFS methods modify the internal state of the SFS object by generating trees, accumulating allele frequencies, and calculating the SFS.
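The binning modes listed under Internal Logic (linear, log, logit) differ only in how the bin edges are spaced along the frequency axis. The sketch below is a hedged illustration rather than the binSFS method itself: the function name bin_sfs, its parameters, and the choice to divide each bin total by its width are assumptions.

```python
import numpy as np

def bin_sfs(sfs, mode="logit", n_bins=5):
    """Bin an SFS of length n + 1 (hypothetical helper, not the class method)."""
    n = len(sfs) - 1                      # sample size
    freqs = np.arange(1, n) / n           # frequencies k/n for k = 1 .. n-1
    values = np.asarray(sfs, dtype=float)[1:n]
    if mode == "linear":
        edges = np.linspace(freqs[0], freqs[-1], n_bins + 1)
    elif mode == "log":
        edges = np.logspace(np.log10(freqs[0]), np.log10(freqs[-1]), n_bins + 1)
    elif mode == "logit":
        ends = freqs[[0, -1]]
        lo, hi = np.log(ends / (1 - ends))        # logit of the end points
        z = np.linspace(lo, hi, n_bins + 1)       # even spacing in logit space
        edges = 1.0 / (1.0 + np.exp(-z))          # map back to frequencies
    else:
        raise ValueError("mode must be 'linear', 'log' or 'logit'")
    # Pin the outer edges so round-off cannot exclude the extreme frequencies.
    edges[0], edges[-1] = freqs[0], freqs[-1]
    totals, _ = np.histogram(freqs, bins=edges, weights=values)
    widths = np.diff(edges)
    return edges, widths, totals / widths

# Example input: an SFS proportional to 1/k for a sample of 20.
n = 20
example_sfs = np.zeros(n + 1)
example_sfs[1:n] = 1.0 / np.arange(1, n)
edges, widths, binned = bin_sfs(example_sfs, mode="logit", n_bins=5)
```

Logit spacing places narrow bins near both frequency 0 and frequency 1, which helps when both ends of the spectrum are of interest.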
Symbol Name: logit
Description:
This function calculates the logit of a given value.
Inputs:
| Name | Type | Description | 
|---|---|---|
| x | float | The input value. | 
Outputs:
| Name | Type | Description | 
|---|---|---|
| logit(x) | float | The logit of the input value. | 
Internal Logic:
The function calculates the logit using the formula: log(x / (1 - x)).
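As a brief illustration of the formula, the sketch below implements the logit with NumPy. The inverse_logit helper is not part of the documented code; it is added here only to show that the transform is invertible.

```python
import numpy as np

def logit(x):
    """Return log(x / (1 - x)) for a scalar or array x with 0 < x < 1."""
    x = np.asarray(x, dtype=float)
    return np.log(x / (1 - x))

def inverse_logit(z):
    """Hypothetical helper: map a logit value back to the interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

print(logit(0.5))                    # the midpoint maps to zero
print(inverse_logit(logit(0.25)))    # round trip recovers the input
```

The logit diverges as x approaches 0 or 1, so inputs should lie strictly between those values.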
Dependencies
| Dependency | Purpose | 
|---|---|
| numpy | Numerical operations, array handling. | 
| scipy.special | Special functions, including the gamma function used in beta coalescent calculations. | 
| Bio.Phylo | Working with phylogenetic trees. | 
| matplotlib.pyplot | Plotting the SFS (only in the example usage). | 
Error Handling
The code includes basic error handling when loading an SFS from a file. It checks if the file exists and if the loaded data has the expected one-dimensional shape. If an error occurs, it prints an error message but does not raise an exception.
Logging
The code does not implement any specific logging mechanisms. It uses print statements for informational output.
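The print-based error reporting used when loading an SFS can be sketched as follows. This is an illustration under stated assumptions, not the loadSFS method itself: the function name load_sfs, the plain-text file format read with np.loadtxt, the wording of the messages, and the None return value are all hypothetical.

```python
import os
import tempfile
import numpy as np

def load_sfs(fname):
    """Return a one-dimensional SFS array, or None after printing an error."""
    if not os.path.isfile(fname):
        print("load_sfs: file not found:", fname)
        return None
    data = np.loadtxt(fname)   # assumed format: plain text written by np.savetxt
    if data.ndim != 1:
        print("load_sfs: expected a one-dimensional array, got shape", data.shape)
        return None
    return data

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "sfs.txt")
    np.savetxt(good_path, np.array([0.0, 3.25, 1.0, 0.0]))
    loaded = load_sfs(good_path)                               # valid file
    missing = load_sfs(os.path.join(tmp, "no_such_file.txt"))  # prints a message
    bad_path = os.path.join(tmp, "matrix.txt")
    np.savetxt(bad_path, np.ones((2, 3)))
    wrong_shape = load_sfs(bad_path)                           # prints a message
```

Printing instead of raising keeps an interactive session running, but the caller then has to check the result explicitly, because a failed load does not interrupt the program.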
API/Interface Reference
This code does not define an explicit API beyond the class SFS, whose methods simulate the evolutionary process, calculate the SFS, and load, save, and bin SFS data.
