High-level description

The tree_utils.py file provides functions for building, manipulating, annotating, and visualizing phylogenetic trees. They are used to analyze evolutionary relationships between sequences, particularly in the context of fitness prediction.

Code Structure

This file defines a set of independent utility functions that operate on phylogenetic trees represented using the Biopython library’s Phylo objects. Some functions utilize external tools like fasttree for tree building.

References

This code references the Biopython library (Bio) extensively for handling sequences, alignments, and phylogenetic trees. It also uses the matplotlib library for visualization purposes.

Symbols

calculate_tree

Description

Builds a phylogenetic tree from a given sequence alignment and outgroup sequence using the fasttree program. It infers ancestral sequences and performs basic tree manipulation like rooting and ladderizing.

Inputs

Name | Type | Description
aln | Bio.Align.MultipleSeqAlignment | A Biopython alignment object containing the sequences.
outgroup | Bio.SeqRecord.SeqRecord | A Biopython SeqRecord object representing the outgroup sequence.
ancestral | bool | Flag indicating whether to infer ancestral sequences (default: True).

Outputs

Name | Type | Description
biopython_tree | Bio.Phylo.BaseTree.Tree | A rooted and ladderized Biopython tree object constructed from the input alignment.

Internal Logic

  1. Writes the alignment and outgroup sequence to a temporary FASTA file.
  2. Executes fasttree with the temporary file as input.
  3. Parses the fasttree output into a Biopython Phylo.Tree object.
  4. Roots the tree using the provided outgroup.
  5. Ladderizes the tree for visual clarity.
  6. If ancestral is True, infers ancestral sequences for internal nodes.
  7. Removes the temporary file.
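
Steps 3 to 5 can be sketched with Biopython alone. This is a minimal illustration under stated assumptions, not the file's implementation: the Newick string is a made-up stand-in for fasttree output, and the leaf name OUT is a hypothetical outgroup.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical Newick text standing in for what fasttree would emit.
newick = "((A:0.1,B:0.2):0.05,(C:0.3,OUT:0.4):0.1);"

# Step 3: parse the Newick text into a Bio.Phylo tree.
tree = Phylo.read(StringIO(newick), "newick")

# Step 4: reroot so that the outgroup leaf hangs directly off the root.
tree.root_with_outgroup({"name": "OUT"})

# Step 5: sort child clades by their number of descendant leaves.
tree.ladderize()

leaf_names = [leaf.name for leaf in tree.get_terminals()]
print(leaf_names)
```

Running fasttree and inferring ancestral sequences are left out because the source does not show how the file invokes them.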

branch_label

Description

Generates a label for a tree branch based on the mutations occurring along that branch.

Inputs

Name | Type | Description
node | Bio.Phylo.BaseTree.Clade | The tree node representing the end of the branch to be labeled.
aa | bool | If True, label the branch with amino acid mutations; if False, with nucleotide mutations (default: True).
display_positions | list | Optional list of positions to consider for labeling. If provided, only mutations at these positions will be included in the label.

Outputs

Name | Type | Description
label | str | A string representing the mutations along the branch, formatted as a series of underscore-separated mutation descriptions (e.g., “A123T_G456C”).

Internal Logic

  1. Determines whether to use amino acid (node.aa_mutations) or nucleotide (node.mutations) mutations based on the aa flag.
  2. If display_positions is provided, filters the mutations to include only those occurring at the specified positions.
  3. Formats the selected mutations into a string representation.
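
The filtering and formatting steps can be sketched as follows. The source does not say how mutations are stored, so this example assumes each one is an (ancestral, position, derived) tuple; the function name format_branch_label is hypothetical.

```python
def format_branch_label(mutations, display_positions=None):
    """Join mutations into an underscore-separated label such as 'A123T_G456C'.

    Assumption: each mutation is an (ancestral, position, derived) tuple.
    """
    if display_positions is not None:
        # Keep only mutations at the requested positions.
        wanted = set(display_positions)
        mutations = [m for m in mutations if m[1] in wanted]
    return "_".join(f"{anc}{pos}{der}" for anc, pos, der in mutations)


muts = [("A", 123, "T"), ("G", 456, "C")]
print(format_branch_label(muts))                           # A123T_G456C
print(format_branch_label(muts, display_positions=[456]))  # G456C
```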

collapse_zero_branches

Description

Collapses branches in the tree where the sequences at the parent and child nodes are identical. This simplifies the tree by merging nodes with no evolutionary changes.

Inputs

Name | Type | Description
clade | Bio.Phylo.BaseTree.Clade | The root node of the subtree to process.

Internal Logic

  1. Recursively traverses the tree.
  2. For each internal node, compares its sequence to its children’s sequences.
  3. If a child node has the same sequence as its parent, collapses the child into the parent.
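
A minimal sketch of this kind of collapse is shown below. It uses a hypothetical Node class in place of a Biopython clade, and it assumes that only internal children are merged so that no leaf is lost; the actual function may treat leaves differently.

```python
class Node:
    """Hypothetical stand-in for a tree clade that carries a sequence."""

    def __init__(self, seq, children=(), name=None):
        self.seq = seq
        self.children = list(children)
        self.name = name


def collapse_identical(node):
    """Splice out internal children whose sequence equals their parent's."""
    kept = []
    for child in node.children:
        collapse_identical(child)  # collapse inside the child first
        if child.children and child.seq == node.seq:
            # Zero-change branch: attach the grandchildren to this node.
            kept.extend(child.children)
        else:
            kept.append(child)
    node.children = kept


inner = Node("ACGT", [Node("ACGA", name="a"), Node("ACGT", name="b")])
root = Node("ACGT", [inner, Node("ACTT", name="c")])
collapse_identical(root)
print([child.name for child in root.children])  # ['a', 'b', 'c']
```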

annotate_leaf

Description

Adds annotations to a leaf node in the tree.

Inputs

Name | Type | Description
leaf | Bio.Phylo.BaseTree.Clade | The leaf node to annotate.
annotation | dict | A dictionary where keys represent annotation attributes and values are the corresponding values to be assigned to the leaf node.

Internal Logic

Iterates through the annotation dictionary and sets the attributes of the leaf node using the provided key-value pairs.
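
The pattern amounts to a setattr loop. In this sketch a SimpleNamespace stands in for the leaf clade, and the attribute names region and year are made up for illustration.

```python
from types import SimpleNamespace


def annotate(leaf, annotation):
    """Copy every key-value pair of the dictionary onto the leaf as an attribute."""
    for key, value in annotation.items():
        setattr(leaf, key, value)


leaf = SimpleNamespace(name="seq_1")  # stand-in for a leaf clade
annotate(leaf, {"region": "Asia", "year": 2009})
print(leaf.region, leaf.year)  # Asia 2009
```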

translate_sequences_on_tree

Description

Translates nucleotide sequences to amino acid sequences for all nodes in the tree.

Inputs

Name | Type | Description
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object.
cds | dict | A dictionary specifying the coding sequence region with keys: ‘begin’ (start position), ‘pad’ (length of padding with ‘X’ at the beginning), and ‘end’ (end position).

Internal Logic

  1. Iterates through all nodes (internal and terminal) in the tree.
  2. Extracts the nucleotide sequence corresponding to the coding region specified by cds.
  3. Translates the nucleotide sequence to an amino acid sequence, handling potential translation errors by inserting ‘X’ characters.
  4. Stores the translated amino acid sequence in the aa_seq attribute of each node.
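
The per-node translation might look like the sketch below. It assumes that begin and end are zero-based slice bounds and that untranslatable codons (for example, ones containing a gap character) are replaced one at a time; the helper name translate_cds is hypothetical, and the real function may handle errors at a different granularity.

```python
from Bio.Data.CodonTable import TranslationError
from Bio.Seq import Seq


def translate_cds(nuc_seq, cds):
    """Translate the coding region of nuc_seq, padding the front with 'X'."""
    coding = str(nuc_seq)[cds["begin"]:cds["end"]]
    residues = []
    # Walk over complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(coding) - 2, 3):
        try:
            residues.append(str(Seq(coding[i:i + 3]).translate()))
        except TranslationError:
            residues.append("X")  # codon Biopython cannot translate
    return "X" * cds["pad"] + "".join(residues)


aa_seq = translate_cds("GGATGGCCAAATAA", {"begin": 2, "end": 14, "pad": 1})
print(aa_seq)  # XMAK*
```

In the file itself, the result would be stored on each node as its aa_seq attribute.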

mutations_on_branches

Description

Identifies and annotates mutations occurring along each branch of the tree.

Inputs

Name | Type | Description
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object.
aa | bool | Flag indicating whether to analyze mutations at the amino acid level (default: False).

Internal Logic

  1. Initializes an empty list of mutations for the root node.
  2. Calls the recursive function mutations_on_branches_subtree to traverse the tree.
  3. For each branch, compares the sequences of the parent and child nodes to identify mutations.
  4. Stores the identified mutations in the mutations (and aa_mutations if aa is True) attribute of the child node.

mutations_on_branches_subtree

Description

Recursive helper function for mutations_on_branches that traverses the tree and identifies mutations along each branch.

Inputs

Name | Type | Description
subtree | Bio.Phylo.BaseTree.Clade | The current subtree being processed.
aa | bool | Flag indicating whether to analyze mutations at the amino acid level.

Internal Logic

  1. Iterates through the child nodes of the current subtree.
  2. Compares the nucleotide sequences of the parent and child nodes to identify mutations.
  3. If aa is True, also compares the amino acid sequences.
  4. Stores the identified mutations in the mutations and aa_mutations attributes of the child node.
  5. Recursively calls itself for each child node that is not a terminal node.
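
The recursion can be sketched as follows for the nucleotide case. The Node class is a hypothetical stand-in for a clade with a seq attribute, and the (ancestral, position, derived) tuple format with zero-based positions is an assumption, not something the source specifies.

```python
class Node:
    """Hypothetical stand-in for a tree clade that carries a sequence."""

    def __init__(self, seq, children=()):
        self.seq = seq
        self.children = list(children)
        self.mutations = []  # the root keeps this empty list


def sequence_differences(parent_seq, child_seq):
    """List (ancestral, position, derived) for every differing site."""
    return [
        (anc, pos, der)
        for pos, (anc, der) in enumerate(zip(parent_seq, child_seq))
        if anc != der
    ]


def annotate_subtree(node):
    """Store on each child the mutations that separate it from its parent."""
    for child in node.children:
        child.mutations = sequence_differences(node.seq, child.seq)
        annotate_subtree(child)


leaf_a, leaf_b = Node("ACGA"), Node("TCGT")
inner = Node("ACGT", [leaf_a, leaf_b])
root = Node("ACGG", [inner])
annotate_subtree(root)
print(inner.mutations, leaf_a.mutations, leaf_b.mutations)
```

The amino acid case would repeat the same comparison on the translated sequences and store the result in aa_mutations.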

find_internal_nodes

Description

Finds corresponding internal nodes between two trees (sourceT and destT) based on the most recent common ancestor (MRCA) of their leaf nodes.

Inputs

Name | Type | Description
sourceT | Bio.Phylo.BaseTree.Tree | The source tree.
destT | Bio.Phylo.BaseTree.Tree | The destination tree.

Internal Logic

  1. Creates a lookup table for leaf nodes in destT.
  2. For each leaf of sourceT, computes the path from the root of destT to the matching leaf in destT.
  3. Iterates through internal nodes in sourceT.
  4. For each internal node, finds the MRCA of its leaf nodes in destT using the calculated paths.
  5. Links the internal nodes in sourceT and destT by setting their mirror_node attributes.
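
The same mapping can be illustrated with Biopython's built-in common_ancestor method instead of precomputed paths. This is a sketch of the idea, not the file's algorithm; the two small trees are invented, and leaves are matched by name.

```python
from io import StringIO

from Bio import Phylo

# Two made-up trees over the same leaves with different topologies.
source = Phylo.read(StringIO("((A,B),(C,D));"), "newick")
dest = Phylo.read(StringIO("(((A,B),C),D);"), "newick")


def leaf_names(clade):
    """Return the sorted names of the leaves below a clade."""
    return sorted(leaf.name for leaf in clade.get_terminals())


# Lookup table from leaf name to the leaf object in the destination tree.
dest_leaves = {leaf.name: leaf for leaf in dest.get_terminals()}

for node in source.get_nonterminals():
    targets = [dest_leaves[leaf.name] for leaf in node.get_terminals()]
    mirror = dest.common_ancestor(targets)  # MRCA of those leaves in dest
    node.mirror_node = mirror
    # Several source nodes can share one MRCA, so this back-link may be overwritten.
    mirror.mirror_node = node

for node in source.get_nonterminals():
    print(leaf_names(node), "->", leaf_names(node.mirror_node))
```

Here the source clade (C, D) has no exact counterpart in the destination tree, so it maps to the destination root.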

label_nodes

Description

Sets labels for specific nodes in the tree based on a provided dictionary.

Inputs

Name | Type | Description
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object.
seqs_to_label | dict | A dictionary mapping node names to their desired labels.

Internal Logic

  1. Clears any existing labels for all nodes.
  2. Iterates through nodes and sets labels based on the provided seqs_to_label dictionary.

node_label_func

Description

A simple function that attempts to return the name attribute of a tree node. Used as a default node label function for tree plotting.

Inputs

Name | Type | Description
node | Bio.Phylo.BaseTree.Clade | The tree node.

Outputs

Name | Type | Description
label | str | The name attribute of the node, or None if the attribute is not found.

erase_color

Description

Resets the color of all nodes in the tree to None.

Inputs

Name | Type | Description
tree | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object.

Internal Logic

Iterates through all nodes and sets the color attribute to None.

plot_prediction_tree

Description

Visualizes a phylogenetic tree with nodes colored according to a prediction.

Inputs

Name | Type | Description
prediction | object | An object containing prediction data and methods for coloring the tree.
method | str | The method used for coloring the tree (default: ‘mean_fitness’).
internal | bool | Flag indicating whether to color internal nodes (default: False).
fig | matplotlib.figure.Figure | Optional matplotlib figure object.
axes | matplotlib.axes.Axes | Optional matplotlib axes object.
cb | bool | Flag indicating whether to display a colorbar (default: True).
scalebar | bool | Flag indicating whether to display a scale bar (default: True).
offset | float | Offset value for color scaling (default: 0.0001).
node_label_func | function | Function to generate node labels (default: a function that returns None).

Internal Logic

  1. Colors the tree using the provided prediction object and specified method.
  2. Draws the tree using the draw_tree function.

draw_tree

Description

Draws a phylogenetic tree using matplotlib.

Inputs

Name | Type | Description
tree | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object.
node_label | function | Function to generate node labels (default: node_label_func).
branch_label | function | Function to generate branch labels (default: node_label_func).
cmap | matplotlib.colors.Colormap | The colormap to use for coloring branches (default: cm.jet).
fig | matplotlib.figure.Figure | Optional matplotlib figure object.
axes | matplotlib.axes.Axes | Optional matplotlib axes object.
cb | bool | Flag indicating whether to display a colorbar (default: True).
scalebar | bool | Flag indicating whether to display a scale bar (default: True).
Internal Logic

  1. Creates a matplotlib figure and axes if not provided.
  2. Uses Biopython’s Phylo.draw function to render the tree.
  3. Adds a scale bar and colorbar if enabled.
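
The three steps can be sketched with Phylo.draw and plain matplotlib calls. The tree, the scale bar length and position, and the 0 to 1 color range are all invented for illustration; how the file itself places these elements is not documented here.

```python
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
from Bio import Phylo
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Made-up three-leaf tree.
tree = Phylo.read(StringIO("((A:0.1,B:0.2):0.05,C:0.3);"), "newick")

# Step 1: create the figure and axes.
fig, axes = plt.subplots(figsize=(6, 4))

# Step 2: let Biopython render the tree onto those axes without showing it.
Phylo.draw(tree, label_func=lambda clade: clade.name or "", axes=axes, do_show=False)

# Step 3a: a scale bar of length 0.1, drawn just below the last leaf.
axes.plot([0.0, 0.1], [3.4, 3.4], color="black", linewidth=2)

# Step 3b: a colorbar for an assumed 0..1 value range using the jet colormap.
mappable = ScalarMappable(norm=Normalize(vmin=0.0, vmax=1.0), cmap=plt.get_cmap("jet"))
fig.colorbar(mappable, ax=axes)

fig.savefig("tree.png")
```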

plot_combined_tree

Description

Plots a combined tree with nodes colored based on predictions from a separate dataset.

Inputs

Name | Type | Description
prediction | object | An object containing prediction data and methods for coloring the tree.
combined_data | object | An object containing the combined tree and data.
method | str | The method used for coloring the tree (default: ‘mean_fitness’).
internal | bool | Flag indicating whether to color internal nodes (default: False).
axes | matplotlib.axes.Axes | Optional matplotlib axes object.
cb | bool | Flag indicating whether to display a colorbar (default: True).
offset | float | Offset value for color scaling (default: 0.0001).

Internal Logic

  1. Colors the combined tree using predictions from the prediction object.
  2. Draws the tree using the draw_tree function.

Dependencies

Dependency | Purpose
biopython | Handling sequences, alignments, and phylogenetic trees.
matplotlib | Visualization of phylogenetic trees.

TODOs

There are no TODOs or other notes left in the code.