itol_utilities.py
High-level description
This file provides utility functions for generating visualizations of phylogenetic trees using the iTOL (Interactive Tree Of Life) web interface. It includes functions for uploading trees and associated data to iTOL, exporting the visualizations, and creating various data files that iTOL uses for customization, such as gradient files, colorbars, and indel heatmaps.
Code Structure
The code consists of several functions that are used to create files for iTOL and one main function, upload_and_export_itol
, which orchestrates the process of uploading a tree and associated data to iTOL and exporting the visualization. The other functions are called by upload_and_export_itol
to create the necessary data files.
References
This code references the CassiopeiaTree
class from cassiopeia.data
and the Itol
and ItolExport
classes from the itolapi
package. It also uses utility functions from cassiopeia.plotting.utilities
for color manipulation and data preparation.
Symbols
upload_and_export_itol
Description
Uploads a phylogenetic tree to iTOL and exports the visualization to a local file. This function handles the entire process of preparing the tree and associated data for iTOL, uploading it, and exporting the visualization in the desired format.
Inputs
Name | Type | Description |
---|---|---|
cassiopeia_tree | CassiopeiaTree | A CassiopeiaTree object containing the phylogenetic tree. |
tree_name | str | Name of the tree to be used in iTOL. |
export_filepath | str | Path to the file where the exported visualization will be saved. |
itol_config | str | Path to the iTOL configuration file containing API key and project name. Defaults to ”~/.itolconfig”. |
api_key | Optional[str] | iTOL API key. If not provided, it will be read from the itol_config file. |
project_name | Optional[str] | iTOL project name. If not provided, it will be read from the itol_config file. |
meta_data | List[str] | List of cell metadata columns to be plotted alongside the tree. |
allele_table | Optional[pd.DataFrame] | Allele table to be plotted as an indel heatmap. |
indel_colors | Optional[pd.DataFrame] | Color mapping for indels in the heatmap. |
indel_priors | Optional[pd.DataFrame] | Prior probabilities for each indel, used for color mapping if indel_colors is not provided. |
rect | bool | Whether to export the tree as a rectangle (True) or a circle (False). Defaults to False. |
include_legend | bool | Whether to include a legend for the colorbar. Defaults to False. |
use_branch_lengths | bool | Whether to use branch lengths when exporting the tree. Defaults to False. |
palette | List[str] | List of colors in hex format to be used for categorical metadata. Defaults to bokeh.palettes.Category20[20] . |
random_state | Optional[np.random.RandomState] | Random state for reproducibility. |
user_dataset_files | Optional[List[str]] | List of paths to additional dataset files to upload. |
verbose | bool | Whether to print verbose output. Defaults to True. |
**kwargs | dict | Additional keyword arguments passed as iTOL export parameters. |
Outputs
This function does not return any values. It uploads the tree and associated data to iTOL and exports the visualization to the specified file.
Internal Logic
- Credentials: The function first checks for iTOL API key and project name, either provided as arguments or read from the
itol_config
file. - Temporary Directory: A temporary directory is created to store the files that will be uploaded to iTOL.
- Tree File: The tree is written to a Newick format file in the temporary directory.
- Data Files: Additional data files are created based on the provided arguments:
- Indel Heatmap: If
allele_table
is provided, a set of files for the indel heatmap is created usingcreate_indel_heatmap
. - Metadata: For each metadata column in
meta_data
, a gradient file or a colorbar file is created depending on the data type, usingcreate_gradient_from_df
orcreate_colorbar
respectively.
- Indel Heatmap: If
- Upload: The tree file and all data files are uploaded to iTOL using the
itolapi
package. - Export: The visualization is exported to the specified file using the
itolapi
package, with the specified format and parameters. - Cleanup: The temporary directory is removed.
Side Effects
- Creates a temporary directory and files within it.
- Uploads data to the user’s iTOL account.
- Creates a local file containing the exported visualization.
create_gradient_from_df
Description
Creates a gradient file for iTOL from a pandas Series containing numerical data. The gradient file defines a color gradient that iTOL uses to color leaves based on the numerical values.
Inputs
Name | Type | Description |
---|---|---|
df | pd.Series | Pandas Series containing numerical data. |
tree | CassiopeiaTree | CassiopeiaTree object. |
dataset_name | str | Name of the dataset. |
output_directory | str | Directory where the gradient file will be saved. Defaults to ”./tmp/“. |
color_min | str | Hex color code for the minimum value. Defaults to “#ffffff”. |
color_max | str | Hex color code for the maximum value. Defaults to “#000000”. |
Outputs
Name | Type | Description |
---|---|---|
outfp | str | Path to the created gradient file. |
Internal Logic
- Leaf Data: The function extracts the numerical data for the leaves of the tree from the input Series.
- File Content: It creates a pandas DataFrame with two columns: “cellBC” (leaf names) and “gradient” (numerical values).
- Header: It constructs a header for the gradient file according to the iTOL format.
- File Writing: It writes the header, followed by the DataFrame content in tab-separated format, to the output file.
create_colorbar
Description
Creates a colorbar file for iTOL from a pandas Series containing categorical data. The colorbar file defines a color mapping for each unique category, which iTOL uses to color leaves.
Inputs
Name | Type | Description |
---|---|---|
labels | pd.DataFrame | Pandas Series containing categorical data. |
tree | CassiopeiaTree | CassiopeiaTree object. |
colormap | Dict[str, Tuple[int, int, int]] | Dictionary mapping categories to RGB color tuples. |
dataset_name | str | Name of the dataset. |
output_directory | str | Directory where the colorbar file will be saved. Defaults to “.tmp/“. |
create_legend | bool | Whether to include a legend for the colorbar in the iTOL visualization. Defaults to False. |
Outputs
Name | Type | Description |
---|---|---|
outfp | str | Path to the created colorbar file. |
Internal Logic
- Leaf Data: The function extracts the categorical data for the leaves of the tree from the input Series.
- Color Mapping: It converts the RGB color tuples in the
colormap
to iTOL’s “rgb(R,G,B)” format. - File Content: It creates a pandas DataFrame with two columns: “cellBC” (leaf names) and “color” (color strings).
- Header: It constructs a header for the colorbar file according to the iTOL format.
- Legend: If
create_legend
is True, it adds legend information to the header, including legend title, shapes, colors, and labels. - File Writing: It writes the header, followed by the DataFrame content in tab-separated format, to the output file.
create_indel_heatmap
Description
Creates a set of files for displaying an indel heatmap alongside the tree in iTOL. Each file corresponds to a character (cut site) in the allele table and defines a color mapping for the indels observed at that site.
Inputs
Name | Type | Description |
---|---|---|
alleletable | pd.DataFrame | Allele table containing indel information. |
tree | CassiopeiaTree | CassiopeiaTree object. |
dataset_name | str | Name of the dataset. |
output_directory | str | Directory where the heatmap files will be saved. |
indel_colors | Optional[pd.DataFrame] | DataFrame mapping indels to HSV color tuples. If not provided, random colors are generated. |
indel_priors | Optional[pd.DataFrame] | DataFrame containing prior probabilities for each indel. Used for color mapping if indel_colors is not provided. |
random_state | Optional[np.random.RandomState] | Random state for reproducibility. |
Outputs
Name | Type | Description |
---|---|---|
allele_files | List[str] | List of paths to the created heatmap files. |
Internal Logic
- Lineage Profile: The function converts the allele table to a lineage profile using
utilities.prepare_alleletable
. - Color Mapping: If
indel_colors
is not provided, it generates random colors for each unique indel usingutilities.get_random_indel_colors
orutilities.get_indel_colors
based onindel_priors
. - Heatmap Data: It creates a numpy array representing the heatmap, where each row corresponds to a leaf and each column corresponds to a character. The array values are RGB color tuples based on the indel observed at each position.
- File Writing: For each character, it creates a colorbar file similar to
create_colorbar
, using the corresponding column from the heatmap array as the color data. The file names are formatted as “indelColors_0.txt”.
Dependencies
This code depends on the following external libraries:
Dependency | Purpose |
---|---|
itolapi | Interacting with the iTOL API for uploading and exporting trees. |
bokeh | Providing color palettes for categorical metadata. |
matplotlib | Color manipulation and conversion between color spaces. |
numpy | Numerical operations and array manipulation. |
pandas | Data manipulation and handling dataframes. |
networkx | Working with tree structures and graph algorithms. |
tqdm | Displaying progress bars for long-running operations. |
Error Handling
The code raises iTOLError
exceptions for various error conditions, such as invalid iTOL credentials, unsupported output formats, missing metadata, or errors encountered during interaction with the iTOL API.