High-level description

This file contains a set of utility functions for working with character matrices and trees in Cassiopeia. These functions include importing modules, checking for ambiguous states, unraveling ambiguous states, finding duplicate groups in a character matrix, and other helper functions.

Code Structure

The code consists of a set of independent utility functions that can be used in various parts of the Cassiopeia codebase. Some functions are used by other functions, such as is_ambiguous_state being used by unravel_ambiguous_states and find_duplicate_groups.

References

This file references the ngs_tools and functools libraries. It also references the is_ambiguous_state function from the cassiopeia.mixins module.

Symbols

is_ambiguous_state

Description

Determines if a given state is ambiguous. An ambiguous state is represented as a tuple of integers.

Inputs

NameTypeDescription
stateUnion[int, Tuple[int, …]]A single character state, which can be an integer or a tuple of integers.

Outputs

NameTypeDescription
boolTrue if the state is ambiguous (i.e., a tuple), False otherwise.

Internal Logic

The function simply checks if the input state is an instance of a tuple. If it is, it returns True, indicating an ambiguous state. Otherwise, it returns False.

try_import

Description

Attempts to import a module and returns it if successful, otherwise returns None. This is useful for handling optional dependencies.

Inputs

NameTypeDescription
modulestrThe name of the module to import.

Outputs

NameTypeDescription
Optional[ModuleType]The imported module if successful, otherwise None.

Internal Logic

The function uses a try-except block to attempt importing the module using importlib.import_module. If a ModuleNotFoundError is raised, it returns None. Otherwise, it returns the imported module.

unravel_ambiguous_states

Description

Unravels a list of potentially ambiguous states into a list of unique states.

Inputs

NameTypeDescription
state_arrayList[Union[int, Tuple[int, …]]]A list of character states, potentially containing ambiguous states represented as tuples.

Outputs

NameTypeDescription
List[int]A list of unique integer states present in the input state_array.

Internal Logic

The function iterates through the state_array. For each state, if it’s ambiguous (a tuple), it converts it to a list. Otherwise, it creates a list containing the single state. Finally, it uses functools.reduce to concatenate all the lists into a single list of unique states.

find_duplicate_groups

Description

Identifies groups of samples in a character matrix that have identical character states and maps them to a dictionary.

Inputs

NameTypeDescription
character_matrixpd.DataFrameA character matrix potentially containing ambiguous states.

Outputs

NameTypeDescription
Dict[str, Tuple[str, …]]A dictionary mapping a single sample name to a tuple of sample names that share the same character states.

Internal Logic

  1. Sets the index name of the character matrix to “index”.
  2. Creates a copy of the character matrix and converts each element to a set to handle ambiguous states.
  3. Identifies duplicated rows in the converted matrix using pd.DataFrame.duplicated.
  4. Finds unique duplicated states.
  5. Groups sample names based on unique duplicated states.
  6. Creates a dictionary mapping the first sample name in each group to a tuple of all sample names in that group.

Dependencies

This file depends on the following external libraries:

DependencyPurpose
functoolsUsed for reducing a list of lists into a single list in unravel_ambiguous_states.
importlibUsed for importing modules in try_import.
numpyUsed for array operations in find_duplicate_groups.
pandasUsed for data manipulation in find_duplicate_groups.