solver_utilities.py - Stanza Demo

High-level description

This file provides utility functions for the solver module in Cassiopeia. These functions are used for tasks such as generating unique node names, collapsing unifurcations in a tree, transforming prior probabilities into weights, and converting sample names to indices.

Code Structure

The file contains several independent functions that are used by other modules in the solver package. These functions are not directly interconnected but serve as helper functions for various solver algorithms.

References

This file references the ete3, numpy, pandas, and time libraries. It also references the PriorTransformationError exception from the cassiopeia.mixins module.

Symbols

`node_name_generator`

Description

This function creates a generator that yields unique node names by hashing timestamps. This is used to ensure that each node in the reconstructed tree has a unique identifier.

Inputs

None

Outputs

Name	Type	Description
generator	Generator[str, None, None]	A generator object that yields unique node names

Internal Logic

The function uses the blake2b hash function from the hashlib library to generate a 12-byte hash of the current timestamp. This hash is then converted to a hexadecimal string and appended to the prefix “cassiopeia_internal_node” to create a unique node name.

`collapse_unifurcations`

Description

This function collapses all unifurcations in a given ete3.Tree. Unifurcations are nodes with only one child. Collapsing them simplifies the tree structure by removing unnecessary nodes.

Inputs

Name	Type	Description
tree	ete3.Tree	The tree to collapse unifurcations in

Outputs

Name	Type	Description
collapsed_tree	ete3.Tree	A copy of the input tree with all unifurcations collapsed

Internal Logic

The function iterates through all nodes in the tree and identifies those with only one child. For each unifurcation, it removes the node and connects its child directly to its parent.

`transform_priors`

Description

This function transforms prior probabilities into weights for use in greedy solver algorithms. It supports three transformations: negative log, inverse, and square root inverse.

Inputs

Name	Type	Description
priors	Optional[Dict[int, Dict[int, float]]]	A dictionary of prior probabilities for each character/state pair
prior_transformation	str	The transformation to apply to the priors. Defaults to “negative_log”

Outputs

Name	Type	Description
weights	Dict[int, Dict[int, float]]	A dictionary of weights for each character/state pair

Internal Logic

The function first checks if the specified transformation is supported. Then, it applies the chosen transformation to each prior probability and stores the resulting weight in a new dictionary.

`convert_sample_names_to_indices`

Description

This function maps sample names to their corresponding integer indices in a given list of names. This is used for efficient indexing operations in the solver algorithms.

Inputs

Name	Type	Description
names	List[str]	A list of sample names
samples	List[str]	A subset of sample names to be mapped to indices

Outputs

Name	Type	Description
indices	List[int]	A list of integer indices corresponding to the input samples

Internal Logic

The function creates a dictionary mapping each sample name to its index in the names list. Then, it uses this dictionary to map the input samples to their corresponding indices.

`save_dissimilarity_as_phylip`

Description

This function saves a dissimilarity map as a PHYLIP file. The PHYLIP format is a standard format for representing phylogenetic trees and distance matrices.

Inputs

Name	Type	Description
dissimilarity_map	pd.DataFrame	A dissimilarity map
path	str	The path to save the PHYLIP file

Outputs

None

Internal Logic

The function converts the dissimilarity map to a NumPy array and writes it to a file in the PHYLIP format. The file includes the number of samples and a lower triangular matrix of dissimilarities.

Project Root

​High-level description

​Code Structure

​References

​Symbols

​node_name_generator

​Description

​Inputs

​Outputs

​Internal Logic

​collapse_unifurcations

​Description

​Inputs

​Outputs

​Internal Logic

​transform_priors

​Description

​Inputs

​Outputs

​Internal Logic

​convert_sample_names_to_indices

​Description

​Inputs

​Outputs

​Internal Logic

​save_dissimilarity_as_phylip

​Description

​Inputs

​Outputs

​Internal Logic

High-level description

Code Structure

References

Symbols

`node_name_generator`

Description

Inputs

Outputs

Internal Logic

`collapse_unifurcations`

Description

Inputs

Outputs

Internal Logic

`transform_priors`

Description

Inputs

Outputs

Internal Logic

`convert_sample_names_to_indices`

Description

Inputs

Outputs

Internal Logic

`save_dissimilarity_as_phylip`

Description

Inputs

Outputs

Internal Logic