High-level description

This file provides utility functions for generating visualizations of phylogenetic trees using the iTOL (Interactive Tree Of Life) web interface. It includes functions for uploading trees and associated data to iTOL, exporting the visualizations, and creating various data files that iTOL uses for customization, such as gradient files, colorbars, and indel heatmaps.

Code Structure

The code consists of several functions that are used to create files for iTOL and one main function, upload_and_export_itol, which orchestrates the process of uploading a tree and associated data to iTOL and exporting the visualization. The other functions are called by upload_and_export_itol to create the necessary data files.

References

This code references the CassiopeiaTree class from cassiopeia.data and the Itol and ItolExport classes from the itolapi package. It also uses utility functions from cassiopeia.plotting.utilities for color manipulation and data preparation.

Symbols

upload_and_export_itol

Description

Uploads a phylogenetic tree to iTOL and exports the visualization to a local file. This function handles the entire process of preparing the tree and associated data for iTOL, uploading it, and exporting the visualization in the desired format.

Inputs

NameTypeDescription
cassiopeia_treeCassiopeiaTreeA CassiopeiaTree object containing the phylogenetic tree.
tree_namestrName of the tree to be used in iTOL.
export_filepathstrPath to the file where the exported visualization will be saved.
itol_configstrPath to the iTOL configuration file containing API key and project name. Defaults to ”~/.itolconfig”.
api_keyOptional[str]iTOL API key. If not provided, it will be read from the itol_config file.
project_nameOptional[str]iTOL project name. If not provided, it will be read from the itol_config file.
meta_dataList[str]List of cell metadata columns to be plotted alongside the tree.
allele_tableOptional[pd.DataFrame]Allele table to be plotted as an indel heatmap.
indel_colorsOptional[pd.DataFrame]Color mapping for indels in the heatmap.
indel_priorsOptional[pd.DataFrame]Prior probabilities for each indel, used for color mapping if indel_colors is not provided.
rectboolWhether to export the tree as a rectangle (True) or a circle (False). Defaults to False.
include_legendboolWhether to include a legend for the colorbar. Defaults to False.
use_branch_lengthsboolWhether to use branch lengths when exporting the tree. Defaults to False.
paletteList[str]List of colors in hex format to be used for categorical metadata. Defaults to bokeh.palettes.Category20[20].
random_stateOptional[np.random.RandomState]Random state for reproducibility.
user_dataset_filesOptional[List[str]]List of paths to additional dataset files to upload.
verboseboolWhether to print verbose output. Defaults to True.
**kwargsdictAdditional keyword arguments passed as iTOL export parameters.

Outputs

This function does not return any values. It uploads the tree and associated data to iTOL and exports the visualization to the specified file.

Internal Logic

  1. Credentials: The function first checks for iTOL API key and project name, either provided as arguments or read from the itol_config file.
  2. Temporary Directory: A temporary directory is created to store the files that will be uploaded to iTOL.
  3. Tree File: The tree is written to a Newick format file in the temporary directory.
  4. Data Files: Additional data files are created based on the provided arguments:
    • Indel Heatmap: If allele_table is provided, a set of files for the indel heatmap is created using create_indel_heatmap.
    • Metadata: For each metadata column in meta_data, a gradient file or a colorbar file is created depending on the data type, using create_gradient_from_df or create_colorbar respectively.
  5. Upload: The tree file and all data files are uploaded to iTOL using the itolapi package.
  6. Export: The visualization is exported to the specified file using the itolapi package, with the specified format and parameters.
  7. Cleanup: The temporary directory is removed.

Side Effects

  • Creates a temporary directory and files within it.
  • Uploads data to the user’s iTOL account.
  • Creates a local file containing the exported visualization.

create_gradient_from_df

Description

Creates a gradient file for iTOL from a pandas Series containing numerical data. The gradient file defines a color gradient that iTOL uses to color leaves based on the numerical values.

Inputs

NameTypeDescription
dfpd.SeriesPandas Series containing numerical data.
treeCassiopeiaTreeCassiopeiaTree object.
dataset_namestrName of the dataset.
output_directorystrDirectory where the gradient file will be saved. Defaults to “./tmp/“.
color_minstrHex color code for the minimum value. Defaults to “#ffffff”.
color_maxstrHex color code for the maximum value. Defaults to “#000000”.

Outputs

NameTypeDescription
outfpstrPath to the created gradient file.

Internal Logic

  1. Leaf Data: The function extracts the numerical data for the leaves of the tree from the input Series.
  2. File Content: It creates a pandas DataFrame with two columns: “cellBC” (leaf names) and “gradient” (numerical values).
  3. Header: It constructs a header for the gradient file according to the iTOL format.
  4. File Writing: It writes the header, followed by the DataFrame content in tab-separated format, to the output file.

create_colorbar

Description

Creates a colorbar file for iTOL from a pandas Series containing categorical data. The colorbar file defines a color mapping for each unique category, which iTOL uses to color leaves.

Inputs

NameTypeDescription
labelspd.DataFramePandas Series containing categorical data.
treeCassiopeiaTreeCassiopeiaTree object.
colormapDict[str, Tuple[int, int, int]]Dictionary mapping categories to RGB color tuples.
dataset_namestrName of the dataset.
output_directorystrDirectory where the colorbar file will be saved. Defaults to “.tmp/“.
create_legendboolWhether to include a legend for the colorbar in the iTOL visualization. Defaults to False.

Outputs

NameTypeDescription
outfpstrPath to the created colorbar file.

Internal Logic

  1. Leaf Data: The function extracts the categorical data for the leaves of the tree from the input Series.
  2. Color Mapping: It converts the RGB color tuples in the colormap to iTOL’s “rgb(R,G,B)” format.
  3. File Content: It creates a pandas DataFrame with two columns: “cellBC” (leaf names) and “color” (color strings).
  4. Header: It constructs a header for the colorbar file according to the iTOL format.
  5. Legend: If create_legend is True, it adds legend information to the header, including legend title, shapes, colors, and labels.
  6. File Writing: It writes the header, followed by the DataFrame content in tab-separated format, to the output file.

create_indel_heatmap

Description

Creates a set of files for displaying an indel heatmap alongside the tree in iTOL. Each file corresponds to a character (cut site) in the allele table and defines a color mapping for the indels observed at that site.

Inputs

NameTypeDescription
alleletablepd.DataFrameAllele table containing indel information.
treeCassiopeiaTreeCassiopeiaTree object.
dataset_namestrName of the dataset.
output_directorystrDirectory where the heatmap files will be saved.
indel_colorsOptional[pd.DataFrame]DataFrame mapping indels to HSV color tuples. If not provided, random colors are generated.
indel_priorsOptional[pd.DataFrame]DataFrame containing prior probabilities for each indel. Used for color mapping if indel_colors is not provided.
random_stateOptional[np.random.RandomState]Random state for reproducibility.

Outputs

NameTypeDescription
allele_filesList[str]List of paths to the created heatmap files.

Internal Logic

  1. Lineage Profile: The function converts the allele table to a lineage profile using utilities.prepare_alleletable.
  2. Color Mapping: If indel_colors is not provided, it generates random colors for each unique indel using utilities.get_random_indel_colors or utilities.get_indel_colors based on indel_priors.
  3. Heatmap Data: It creates a numpy array representing the heatmap, where each row corresponds to a leaf and each column corresponds to a character. The array values are RGB color tuples based on the indel observed at each position.
  4. File Writing: For each character, it creates a colorbar file similar to create_colorbar, using the corresponding column from the heatmap array as the color data. The file names are formatted as “indelColors_0.txt”.

Dependencies

This code depends on the following external libraries:

DependencyPurpose
itolapiInteracting with the iTOL API for uploading and exporting trees.
bokehProviding color palettes for categorical metadata.
matplotlibColor manipulation and conversion between color spaces.
numpyNumerical operations and array manipulation.
pandasData manipulation and handling dataframes.
networkxWorking with tree structures and graph algorithms.
tqdmDisplaying progress bars for long-running operations.

Error Handling

The code raises iTOLError exceptions for various error conditions, such as invalid iTOL credentials, unsupported output formats, missing metadata, or errors encountered during interaction with the iTOL API.