High-level description
The code defines two classes,alignment and sequence_ranking, for analyzing and ranking sequences within a phylogenetic tree. The alignment class processes sequence data, calculates allele frequencies, and builds a phylogenetic tree. The sequence_ranking class extends the node_ranking class and uses the processed alignment data to predict and rank nodes based on fitness.
Code Structure
Thealignment class is responsible for handling sequence data, while the sequence_ranking class utilizes this data for ranking nodes in a phylogenetic tree. The sequence_ranking class inherits from the node_ranking class, which provides functionalities for ranking nodes based on different methods.
References
tree_utils: This module is used for tree building and manipulation functions.node_ranking: Thesequence_rankingclass inherits from this class, utilizing its ranking functionalities.
Symbols
alignment
Description
This class stores and processes sequence alignments, calculates allele frequencies, and builds phylogenetic trees.Inputs
| Name | Type | Description |
|---|---|---|
| aln | Bio.Align.MultipleSeqAlignment | A Biopython MultipleSeqAlignment object representing the sequence alignment. |
| outgroup | str | The outgroup sequence used for rooting the phylogenetic tree. |
| cds | dict | A dictionary specifying the coding region within the alignment. |
| collapse | bool | Flag indicating whether to collapse zero-length branches in the tree. |
| build_tree | bool | Flag indicating whether to build a phylogenetic tree from the alignment. |
Outputs
The class does not have explicit return values but stores processed data and the constructed tree internally.Internal Logic
- Initialization: Stores input parameters, sets default values, and calls
process_alignment. process_alignment: Calculates alignment summary information, consensus sequence, allele frequencies, and optionally translates the alignment if it’s a protein sequence. It then builds the phylogenetic tree usingtree_utils.calculate_tree.calculate_allele_frequencies: Computes the frequency of each nucleotide/amino acid at each position in the alignment.calculate_aa_allele_frequencies: Specifically calculates amino acid frequencies for protein alignments.translate_alignment: Translates the nucleotide alignment to an amino acid alignment based on the provided coding region.mean_distance_to_sequence: Calculates the average Hamming distance between a query sequence and the alignment based on allele frequencies.mean_distance_to_set: Calculates the average Hamming distance between two alignments based on their allele frequencies.aa_distance_to_sequence: Calculates the average Hamming distance between a query amino acid sequence and the alignment.aa_distance_to_set: Calculates the average Hamming distance between two amino acid alignments.build_tree: Constructs a phylogenetic tree using the alignment and outgroup sequence, infers ancestral sequences, and optionally collapses zero-length branches.
sequence_ranking
Description
This class extends thenode_ranking class to handle sequence data. It builds a phylogenetic tree from the provided alignment and uses the ranking functionalities inherited from node_ranking to predict and rank nodes based on fitness.
Inputs
| Name | Type | Description |
|---|---|---|
| sequence_data | alignment | An instance of the alignment class containing the processed sequence data. |
| distance_scale | float | A scaling factor for the branch lengths in the phylogenetic tree. |
| *args | tuple | Additional arguments passed to the node_ranking constructor. |
| **kwargs | dict | Additional keyword arguments passed to the node_ranking constructor. |
Outputs
The class does not have explicit return values but stores the ranking results internally.Internal Logic
- Initialization: Calls the
node_rankingconstructor, stores the input data and parameters, calculates the coalescence time scale, and sets the phylogenetic tree. predict: Initializes the fitness inference process, computes node rankings, and returns the highest-ranked external node based on the specified ranking method.
Dependencies
- Biopython (Bio)
- NumPy (numpy)
- SciPy (scipy)
Error Handling
The code includes basic error handling usingtry-except blocks, primarily for handling potential translation errors and missing attributes. However, it does not implement specific error types or custom exception handling.