sequence_ranking.py - Stanza Demo

High-level description

The code defines two classes, alignment and sequence_ranking, for analyzing and ranking sequences within a phylogenetic tree. The alignment class processes sequence data, calculates allele frequencies, and builds a phylogenetic tree. The sequence_ranking class extends the node_ranking class and uses the processed alignment data to predict the fitness of nodes and rank them accordingly.

Code Structure

The alignment class handles the sequence data, and the sequence_ranking class uses that data to rank nodes in a phylogenetic tree. The sequence_ranking class inherits from the node_ranking class, which provides several methods for ranking nodes.

References

tree_utils: This module is used for tree building and manipulation functions.
node_ranking: The sequence_ranking class inherits from this class, utilizing its ranking functionalities.

Symbols

`alignment`

Description

This class stores and processes sequence alignments, calculates allele frequencies, and builds phylogenetic trees.

Inputs

Name	Type	Description
aln	Bio.Align.MultipleSeqAlignment	A Biopython MultipleSeqAlignment object representing the sequence alignment.
outgroup	str	The outgroup sequence used for rooting the phylogenetic tree.
cds	dict	A dictionary specifying the coding region within the alignment.
collapse	bool	Flag indicating whether to collapse zero-length branches in the tree.
build_tree	bool	Flag indicating whether to build a phylogenetic tree from the alignment.

Outputs

The class does not have explicit return values but stores processed data and the constructed tree internally.

Internal Logic

Initialization: Stores input parameters, sets default values, and calls process_alignment.
process_alignment: Calculates alignment summary information, the consensus sequence, and allele frequencies, and optionally translates the alignment to amino acids when a coding region (cds) is provided. It then builds the phylogenetic tree using tree_utils.calculate_tree.
calculate_allele_frequencies: Computes the frequency of each nucleotide/amino acid at each position in the alignment.
calculate_aa_allele_frequencies: Computes the frequency of each amino acid at each position of the translated (protein) alignment.
translate_alignment: Translates the nucleotide alignment to an amino acid alignment based on the provided coding region.
mean_distance_to_sequence: Calculates the average Hamming distance between a query sequence and the alignment based on allele frequencies.
mean_distance_to_set: Calculates the average Hamming distance between two alignments based on their allele frequencies.
aa_distance_to_sequence: Calculates the average Hamming distance between a query amino acid sequence and the alignment.
aa_distance_to_set: Calculates the average Hamming distance between two amino acid alignments.
build_tree: Constructs a phylogenetic tree using the alignment and outgroup sequence, infers ancestral sequences, and optionally collapses zero-length branches.
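
Example (illustrative sketch)

The frequency and distance methods listed above can be illustrated with a small, dependency-free sketch. This is not the module's implementation. The function names mirror the method names above, but they are written here as standalone functions operating on plain strings rather than on a Biopython alignment. The alphabet and the expected-mismatch formula are assumptions about what "average Hamming distance based on allele frequencies" means.

```python
ALPHABET = "ACGT-"  # assumed nucleotide alphabet; the real module may use another


def calculate_allele_frequencies(seqs, alphabet=ALPHABET):
    """Return one dict per alignment column mapping each character to its frequency."""
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for pos in range(length):
        column = [s[pos] for s in seqs]
        freqs.append({c: column.count(c) / len(seqs) for c in alphabet})
    return freqs


def mean_distance_to_sequence(freqs, query):
    """Average Hamming distance from `query` to the sequences behind `freqs`.

    At each position, a fraction 1 - f(query allele) of the sequences differs
    from the query, so summing over positions gives the mean mismatch count.
    """
    return sum(1.0 - col.get(q, 0.0) for col, q in zip(freqs, query))


def mean_distance_to_set(freqs_a, freqs_b):
    """Average Hamming distance between one sequence drawn from each set."""
    return sum(
        # Probability that the two draws agree at this position is sum_c f_a(c) * f_b(c).
        1.0 - sum(col_a[c] * col_b.get(c, 0.0) for c in col_a)
        for col_a, col_b in zip(freqs_a, freqs_b)
    )


# Toy alignment of four sequences, used only for illustration.
aln = ["ACGT", "ACGA", "ATGA", "ACGA"]
freqs = calculate_allele_frequencies(aln)
print(freqs[1]["C"])                             # frequency of C in the second column
print(mean_distance_to_sequence(freqs, "ACGA"))  # mean mismatches against the query
print(mean_distance_to_set(freqs, freqs))        # mean pairwise distance within the set
```

Working from frequencies rather than comparing every pair of sequences keeps the cost linear in alignment length, which is presumably why the methods above are described as being "based on allele frequencies". Note that comparing a set with itself this way includes self-pairs, so the result is slightly lower than the mean over distinct pairs.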