High-level description

The code defines two classes, alignment and sequence_ranking, for analyzing and ranking sequences within a phylogenetic tree. The alignment class processes sequence data, calculates allele frequencies, and builds a phylogenetic tree. The sequence_ranking class extends the node_ranking class and uses the processed alignment data to predict and rank nodes based on fitness.

Code Structure

The alignment class is responsible for handling sequence data, while the sequence_ranking class utilizes this data for ranking nodes in a phylogenetic tree. The sequence_ranking class inherits from the node_ranking class, which provides functionalities for ranking nodes based on different methods.

References

  • tree_utils: This module is used for tree building and manipulation functions.
  • node_ranking: The sequence_ranking class inherits from this class, utilizing its ranking functionalities.

Symbols

alignment

Description

This class stores and processes sequence alignments, calculates allele frequencies, and builds phylogenetic trees.

Inputs

NameTypeDescription
alnBio.Align.MultipleSeqAlignmentA Biopython MultipleSeqAlignment object representing the sequence alignment.
outgroupstrThe outgroup sequence used for rooting the phylogenetic tree.
cdsdictA dictionary specifying the coding region within the alignment.
collapseboolFlag indicating whether to collapse zero-length branches in the tree.
build_treeboolFlag indicating whether to build a phylogenetic tree from the alignment.

Outputs

The class does not have explicit return values but stores processed data and the constructed tree internally.

Internal Logic

  1. Initialization: Stores input parameters, sets default values, and calls process_alignment.
  2. process_alignment: Calculates alignment summary information, consensus sequence, allele frequencies, and optionally translates the alignment if it’s a protein sequence. It then builds the phylogenetic tree using tree_utils.calculate_tree.
  3. calculate_allele_frequencies: Computes the frequency of each nucleotide/amino acid at each position in the alignment.
  4. calculate_aa_allele_frequencies: Specifically calculates amino acid frequencies for protein alignments.
  5. translate_alignment: Translates the nucleotide alignment to an amino acid alignment based on the provided coding region.
  6. mean_distance_to_sequence: Calculates the average Hamming distance between a query sequence and the alignment based on allele frequencies.
  7. mean_distance_to_set: Calculates the average Hamming distance between two alignments based on their allele frequencies.
  8. aa_distance_to_sequence: Calculates the average Hamming distance between a query amino acid sequence and the alignment.
  9. aa_distance_to_set: Calculates the average Hamming distance between two amino acid alignments.
  10. build_tree: Constructs a phylogenetic tree using the alignment and outgroup sequence, infers ancestral sequences, and optionally collapses zero-length branches.

sequence_ranking

Description

This class extends the node_ranking class to handle sequence data. It builds a phylogenetic tree from the provided alignment and uses the ranking functionalities inherited from node_ranking to predict and rank nodes based on fitness.

Inputs

NameTypeDescription
sequence_dataalignmentAn instance of the alignment class containing the processed sequence data.
distance_scalefloatA scaling factor for the branch lengths in the phylogenetic tree.
*argstupleAdditional arguments passed to the node_ranking constructor.
**kwargsdictAdditional keyword arguments passed to the node_ranking constructor.

Outputs

The class does not have explicit return values but stores the ranking results internally.

Internal Logic

  1. Initialization: Calls the node_ranking constructor, stores the input data and parameters, calculates the coalescence time scale, and sets the phylogenetic tree.
  2. predict: Initializes the fitness inference process, computes node rankings, and returns the highest-ranked external node based on the specified ranking method.

Dependencies

  • Biopython (Bio)
  • NumPy (numpy)
  • SciPy (scipy)

Error Handling

The code includes basic error handling using try-except blocks, primarily for handling potential translation errors and missing attributes. However, it does not implement specific error types or custom exception handling.