tree_utils.py
High-level description
The tree_utils.py
file provides a collection of functions for building, manipulating, annotating, and visualizing phylogenetic trees. These functions are primarily used for analyzing and understanding the evolutionary relationships between sequences, particularly in the context of fitness prediction.
Code Structure
This file defines a set of independent utility functions that operate on phylogenetic trees represented using the Biopython library’s Phylo
objects. Some functions utilize external tools like fasttree
for tree building.
References
This code references the Biopython library (Bio
) extensively for handling sequences, alignments, and phylogenetic trees. It also uses the matplotlib
library for visualization purposes.
Symbols
calculate_tree
Description
Builds a phylogenetic tree from a given sequence alignment and outgroup sequence using the fasttree
program. It infers ancestral sequences and performs basic tree manipulation like rooting and ladderizing.
Inputs
Name | Type | Description |
---|---|---|
aln | Bio.Align.MultipleSeqAlignment | A Biopython alignment object containing the sequences. |
outgroup | Bio.SeqRecord.SeqRecord | A Biopython SeqRecord object representing the outgroup sequence. |
ancestral | bool | Flag indicating whether to infer ancestral sequences (default: True). |
Outputs
Name | Type | Description |
---|---|---|
biopython_tree | Bio.Phylo.BaseTree.Tree | A rooted and ladderized Biopython tree object constructed from the input alignment. |
Internal Logic
- Writes the alignment and outgroup sequence to a temporary FASTA file.
- Executes
fasttree
with the temporary file as input. - Parses the
fasttree
output into a BiopythonPhylo.Tree
object. - Roots the tree using the provided outgroup.
- Ladderizes the tree for visual clarity.
- If
ancestral
is True, infers ancestral sequences for internal nodes. - Removes the temporary file.
branch_label
Description
Generates a label for a tree branch based on the mutations occurring along that branch.
Inputs
Name | Type | Description |
---|---|---|
node | Bio.Phylo.Clade.Clade | The tree node representing the end of the branch to be labeled. |
aa | bool | Flag indicating whether to use amino acid or nucleotide sequences for labeling (default: True). |
display_positions | list | Optional list of positions to consider for labeling. If provided, only mutations at these positions will be included in the label. |
Outputs
Name | Type | Description |
---|---|---|
label | str | A string representing the mutations along the branch, formatted as a series of underscore-separated mutation descriptions (e.g., “A123T_G456C”). |
Internal Logic
- Determines whether to use amino acid (
node.aa_mutations
) or nucleotide (node.mutations
) mutations based on theaa
flag. - If
display_positions
is provided, filters the mutations to include only those occurring at the specified positions. - Formats the selected mutations into a string representation.
collapse_zero_branches
Description
Collapses branches in the tree where the sequences at the parent and child nodes are identical. This simplifies the tree by merging nodes with no evolutionary changes.
Inputs
Name | Type | Description |
---|---|---|
clade | Bio.Phylo.Clade.Clade | The root node of the subtree to process. |
Internal Logic
- Recursively traverses the tree.
- For each internal node, compares its sequence to its children’s sequences.
- If a child node has the same sequence as its parent, collapses the child into the parent.
annotate_leaf
Description
Adds annotations to a leaf node in the tree.
Inputs
Name | Type | Description |
---|---|---|
leaf | Bio.Phylo.Clade.Clade | The leaf node to annotate. |
annotation | dict | A dictionary where keys represent annotation attributes and values are the corresponding values to be assigned to the leaf node. |
Internal Logic
Iterates through the annotation
dictionary and sets the attributes of the leaf
node using the provided key-value pairs.
translate_sequences_on_tree
Description
Translates nucleotide sequences to amino acid sequences for all nodes in the tree.
Inputs
Name | Type | Description |
---|---|---|
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object. |
cds | dict | A dictionary specifying the coding sequence region with keys: ‘begin’ (start position), ‘pad’ (length of padding with ‘X’ at the beginning), and ‘end’ (end position). |
Internal Logic
- Iterates through all nodes (internal and terminal) in the tree.
- Extracts the nucleotide sequence corresponding to the coding region specified by
cds
. - Translates the nucleotide sequence to an amino acid sequence, handling potential translation errors by inserting ‘X’ characters.
- Stores the translated amino acid sequence in the
aa_seq
attribute of each node.
mutations_on_branches
Description
Identifies and annotates mutations occurring along each branch of the tree.
Inputs
Name | Type | Description |
---|---|---|
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object. |
aa | bool | Flag indicating whether to analyze mutations at the amino acid level (default: False). |
Internal Logic
- Initializes an empty list of mutations for the root node.
- Calls the recursive function
mutations_on_branches_subtree
to traverse the tree. - For each branch, compares the sequences of the parent and child nodes to identify mutations.
- Stores the identified mutations in the
mutations
(andaa_mutations
ifaa
is True) attribute of the child node.
mutations_on_branches_subtree
Description
Recursive helper function for mutations_on_branches
that traverses the tree and identifies mutations along each branch.
Inputs
Name | Type | Description |
---|---|---|
subtree | Bio.Phylo.Clade.Clade | The current subtree being processed. |
aa | bool | Flag indicating whether to analyze mutations at the amino acid level. |
Internal Logic
- Iterates through the child nodes of the current subtree.
- Compares the nucleotide sequences of the parent and child nodes to identify mutations.
- If
aa
is True, also compares the amino acid sequences. - Stores the identified mutations in the
mutations
andaa_mutations
attributes of the child node. - Recursively calls itself for each child node that is not a terminal node.
find_internal_nodes
Description
Finds corresponding internal nodes between two trees (sourceT
and destT
) based on the most recent common ancestor (MRCA) of their leaf nodes.
Inputs
Name | Type | Description |
---|---|---|
sourceT | Bio.Phylo.BaseTree.Tree | The source tree. |
destT | Bio.Phylo.BaseTree.Tree | The destination tree. |
Internal Logic
- Creates a lookup table for leaf nodes in
destT
. - Calculates the path from the root to each leaf node in
sourceT
within thedestT
tree. - Iterates through internal nodes in
sourceT
. - For each internal node, finds the MRCA of its leaf nodes in
destT
using the calculated paths. - Links the internal nodes in
sourceT
anddestT
by setting theirmirror_node
attributes.
label_nodes
Description
Sets labels for specific nodes in the tree based on a provided dictionary.
Inputs
Name | Type | Description |
---|---|---|
T | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object. |
seqs_to_label | dict | A dictionary mapping node names to their desired labels. |
Internal Logic
- Clears any existing labels for all nodes.
- Iterates through nodes and sets labels based on the provided
seqs_to_label
dictionary.
node_label_func
Description
A simple function that attempts to return the name
attribute of a tree node. Used as a default node label function for tree plotting.
Inputs
Name | Type | Description |
---|---|---|
node | Bio.Phylo.Clade.Clade | The tree node. |
Outputs
Name | Type | Description |
---|---|---|
label | str | The name attribute of the node, or None if the attribute is not found. |
erase_color
Description
Resets the color of all nodes in the tree to None.
Inputs
Name | Type | Description |
---|---|---|
tree | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object. |
Internal Logic
Iterates through all nodes and sets the color
attribute to None.
plot_prediction_tree
Description
Visualizes a phylogenetic tree with nodes colored according to a prediction.
Inputs
Name | Type | Description |
---|---|---|
prediction | object | An object containing prediction data and methods for coloring the tree. |
method | str | The method used for coloring the tree (default: ‘mean_fitness’). |
internal | bool | Flag indicating whether to color internal nodes (default: False). |
fig | matplotlib.figure.Figure | Optional matplotlib figure object. |
axes | matplotlib.axes.Axes | Optional matplotlib axes object. |
cb | bool | Flag indicating whether to display a colorbar (default: True). |
scalebar | bool | Flag indicating whether to display a scale bar (default: True). |
offset | float | Offset value for color scaling (default: 0.0001). |
node_label_func | function | Function to generate node labels (default: a function that returns None). |
Internal Logic
- Colors the tree using the provided
prediction
object and specifiedmethod
. - Draws the tree using the
draw_tree
function.
draw_tree
Description
Draws a phylogenetic tree using matplotlib.
Inputs
Name | Type | Description |
---|---|---|
tree | Bio.Phylo.BaseTree.Tree | The phylogenetic tree object. |
node_label | function | Function to generate node labels (default: node_label_func ). |
branch_label | function | Function to generate branch labels (default: node_label_func ). |
cmap | matplotlib.colors.Colormap | The colormap to use for coloring branches (default: cm.jet ). |
fig | matplotlib.figure.Figure | Optional matplotlib figure object. |
axes | matplotlib.axes.Axes | Optional matplotlib axes object. |
cb | bool | Flag indicating whether to display a colorbar (default: True). |
scalebar | bool | Flag indicating whether to display a scale bar (default: True). |
Internal Logic
- Creates a matplotlib figure and axes if not provided.
- Uses Biopython’s
Phylo.draw
function to render the tree. - Adds a scale bar and colorbar if enabled.
plot_combined_tree
Description
Plots a combined tree with nodes colored based on predictions from a separate dataset.
Inputs
Name | Type | Description |
---|---|---|
prediction | object | An object containing prediction data and methods for coloring the tree. |
combined_data | object | An object containing the combined tree and data. |
method | str | The method used for coloring the tree (default: ‘mean_fitness’). |
internal | bool | Flag indicating whether to color internal nodes (default: False). |
axes | matplotlib.axes.Axes | Optional matplotlib axes object. |
cb | bool | Flag indicating whether to display a colorbar (default: True). |
offset | float | Offset value for color scaling (default: 0.0001). |
Internal Logic
- Colors the combined tree using predictions from the
prediction
object. - Draws the tree using the
draw_tree
function.
Dependencies
Dependency | Purpose |
---|---|
biopython | Handling sequences, alignments, and phylogenetic trees. |
matplotlib | Visualization of phylogenetic trees. |
TODOs
There are no TODOs or other notes left in the code.