sequence_ranking.py
High-level description
The code defines two classes, alignment
and sequence_ranking
, for analyzing and ranking sequences within a phylogenetic tree. The alignment
class processes sequence data, calculates allele frequencies, and builds a phylogenetic tree. The sequence_ranking
class extends the node_ranking
class and uses the processed alignment data to predict and rank nodes based on fitness.
Code Structure
The alignment
class is responsible for handling sequence data, while the sequence_ranking
class utilizes this data for ranking nodes in a phylogenetic tree. The sequence_ranking
class inherits from the node_ranking
class, which provides functionalities for ranking nodes based on different methods.
References
tree_utils
: This module is used for tree building and manipulation functions.node_ranking
: Thesequence_ranking
class inherits from this class, utilizing its ranking functionalities.
Symbols
alignment
Description
This class stores and processes sequence alignments, calculates allele frequencies, and builds phylogenetic trees.
Inputs
Name | Type | Description |
---|---|---|
aln | Bio.Align.MultipleSeqAlignment | A Biopython MultipleSeqAlignment object representing the sequence alignment. |
outgroup | str | The outgroup sequence used for rooting the phylogenetic tree. |
cds | dict | A dictionary specifying the coding region within the alignment. |
collapse | bool | Flag indicating whether to collapse zero-length branches in the tree. |
build_tree | bool | Flag indicating whether to build a phylogenetic tree from the alignment. |
Outputs
The class does not have explicit return values but stores processed data and the constructed tree internally.
Internal Logic
- Initialization: Stores input parameters, sets default values, and calls
process_alignment
. process_alignment
: Calculates alignment summary information, consensus sequence, allele frequencies, and optionally translates the alignment if it’s a protein sequence. It then builds the phylogenetic tree usingtree_utils.calculate_tree
.calculate_allele_frequencies
: Computes the frequency of each nucleotide/amino acid at each position in the alignment.calculate_aa_allele_frequencies
: Specifically calculates amino acid frequencies for protein alignments.translate_alignment
: Translates the nucleotide alignment to an amino acid alignment based on the provided coding region.mean_distance_to_sequence
: Calculates the average Hamming distance between a query sequence and the alignment based on allele frequencies.mean_distance_to_set
: Calculates the average Hamming distance between two alignments based on their allele frequencies.aa_distance_to_sequence
: Calculates the average Hamming distance between a query amino acid sequence and the alignment.aa_distance_to_set
: Calculates the average Hamming distance between two amino acid alignments.build_tree
: Constructs a phylogenetic tree using the alignment and outgroup sequence, infers ancestral sequences, and optionally collapses zero-length branches.
sequence_ranking
Description
This class extends the node_ranking
class to handle sequence data. It builds a phylogenetic tree from the provided alignment and uses the ranking functionalities inherited from node_ranking
to predict and rank nodes based on fitness.
Inputs
Name | Type | Description |
---|---|---|
sequence_data | alignment | An instance of the alignment class containing the processed sequence data. |
distance_scale | float | A scaling factor for the branch lengths in the phylogenetic tree. |
*args | tuple | Additional arguments passed to the node_ranking constructor. |
**kwargs | dict | Additional keyword arguments passed to the node_ranking constructor. |
Outputs
The class does not have explicit return values but stores the ranking results internally.
Internal Logic
- Initialization: Calls the
node_ranking
constructor, stores the input data and parameters, calculates the coalescence time scale, and sets the phylogenetic tree. predict
: Initializes the fitness inference process, computes node rankings, and returns the highest-ranked external node based on the specified ranking method.
Dependencies
- Biopython (Bio)
- NumPy (numpy)
- SciPy (scipy)
Error Handling
The code includes basic error handling using try-except
blocks, primarily for handling potential translation errors and missing attributes. However, it does not implement specific error types or custom exception handling.