High-level description
This file contains a set of utility functions for working with character matrices and trees in Cassiopeia. They cover importing optional modules, checking for ambiguous states, unraveling ambiguous states, finding duplicate groups in a character matrix, and other helper tasks.
Code Structure
The code consists of independent utility functions that can be used in various parts of the Cassiopeia codebase. Some functions are used by others: for example, is_ambiguous_state is used by unravel_ambiguous_states and find_duplicate_groups.
References
This file references the ngs_tools and functools libraries. It also references the is_ambiguous_state function from the cassiopeia.mixins module.
Symbols
is_ambiguous_state
Description
Determines if a given state is ambiguous. An ambiguous state is represented as a tuple of integers.
Inputs
| Name | Type | Description |
|---|---|---|
| state | Union[int, Tuple[int, …]] | A single character state, which can be an integer or a tuple of integers. |
Outputs
| Name | Type | Description |
|---|---|---|
|  | bool | True if the state is ambiguous (i.e., a tuple), False otherwise. |
Internal Logic
The function checks whether the input state is an instance of tuple. If it is, it returns True, indicating an ambiguous state. Otherwise, it returns False.
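The check described above can be sketched as follows. This is a minimal illustration based only on the description here, not a copy of the Cassiopeia source; the sample values are made up for the example.

```python
from typing import Tuple, Union


def is_ambiguous_state(state: Union[int, Tuple[int, ...]]) -> bool:
    """Return True when the state is a tuple, which marks it as ambiguous."""
    return isinstance(state, tuple)


# Illustrative values only: a plain integer state and a tuple state.
print(is_ambiguous_state(3))       # False
print(is_ambiguous_state((1, 2)))  # True
```

Note that a list such as `[1, 2]` would not count as ambiguous under this check, because only tuples are tested for.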
try_import
Description
Attempts to import a module and returns it if successful, otherwise returns None. This is useful for handling optional dependencies.
Inputs
| Name | Type | Description |
|---|---|---|
| module | str | The name of the module to import. |
Outputs
| Name | Type | Description |
|---|---|---|
|  | Optional[ModuleType] | The imported module if successful, otherwise None. |
Internal Logic
The function uses a try-except block to attempt importing the module using importlib.import_module. If a ModuleNotFoundError is raised, it returns None. Otherwise, it returns the imported module.
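A minimal sketch of this pattern, assuming the behavior is exactly as described above (the module names in the usage lines are placeholders chosen for the example):

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module: str) -> Optional[ModuleType]:
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError:
        return None


# "json" ships with Python; the second name is a deliberately absent module.
json_module = try_import("json")
missing = try_import("no_such_module_abc123")
print(json_module is not None)  # True
print(missing is None)          # True
```

Callers would typically test the result against None before using the optional dependency. Because only ModuleNotFoundError is caught, other errors raised while the module is being imported would still propagate.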
unravel_ambiguous_states
Description
Unravels a list of potentially ambiguous states into a list of unique states.
Inputs
| Name | Type | Description |
|---|---|---|
| state_array | List[Union[int, Tuple[int, …]]] | A list of character states, potentially containing ambiguous states represented as tuples. |
Outputs
| Name | Type | Description |
|---|---|---|
|  | List[int] | A list of unique integer states present in the input state_array. |
Internal Logic
The function iterates through the state_array. For each state, if it is ambiguous (a tuple), it converts the tuple to a list. Otherwise, it wraps the single state in a one-element list. Finally, it uses functools.reduce to concatenate all the lists into a single list of unique states.
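The flattening step can be sketched as below. This is an illustration of the described logic under two assumptions that are not stated in the description: an empty list is passed to functools.reduce as the initial value so that empty input does not raise, and the tuple check is written inline rather than calling is_ambiguous_state.

```python
import functools
from typing import List, Tuple, Union


def unravel_ambiguous_states(
    state_array: List[Union[int, Tuple[int, ...]]]
) -> List[int]:
    """Flatten a list of integer and tuple states into one list of integers."""
    # Turn every entry into a list: tuples are expanded, integers are wrapped.
    all_states = [
        list(state) if isinstance(state, tuple) else [state]
        for state in state_array
    ]
    # Concatenate the per-entry lists; the [] initial value is an assumption.
    return functools.reduce(lambda a, b: a + b, all_states, [])


print(unravel_ambiguous_states([1, (2, 3), 4]))  # [1, 2, 3, 4]
```

In this sketch, concatenation keeps repeated values, so `[1, (1, 2)]` yields `[1, 1, 2]`. A caller that needs distinct values could wrap the result in `set`.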
find_duplicate_groups
Description
Identifies groups of samples in a character matrix that have identical character states and maps them to a dictionary.
Inputs
| Name | Type | Description |
|---|---|---|
| character_matrix | pd.DataFrame | A character matrix potentially containing ambiguous states. |
Outputs
| Name | Type | Description |
|---|---|---|
|  | Dict[str, Tuple[str, …]] | A dictionary mapping a single sample name to a tuple of sample names that share the same character states. |
Internal Logic
- Sets the index name of the character matrix to “index”.
- Creates a copy of the character matrix and converts each element to a set to handle ambiguous states.
- Identifies duplicated rows in the converted matrix using pd.DataFrame.duplicated.
- Finds unique duplicated states.
- Groups sample names based on unique duplicated states.
- Creates a dictionary mapping the first sample name in each group to a tuple of all sample names in that group.
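The grouping idea behind these steps can be illustrated without pandas. The sketch below is a simplified stand-in, not the library's implementation: the hypothetical `find_duplicate_groups_sketch` takes a plain dict of sample name to list of states in place of the DataFrame, uses frozenset so that rows can serve as dictionary keys, and replaces pd.DataFrame.duplicated with ordinary dictionary grouping.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

State = Union[int, Tuple[int, ...]]


def find_duplicate_groups_sketch(
    rows: Dict[str, List[State]]
) -> Dict[str, Tuple[str, ...]]:
    """Map the first sample of each duplicate group to all samples in it."""
    groups: Dict[Tuple[FrozenSet[int], ...], List[str]] = {}
    for name, states in rows.items():
        # Convert every state to a set so (2, 3) and (3, 2) compare equal.
        key = tuple(
            frozenset(s) if isinstance(s, tuple) else frozenset([s])
            for s in states
        )
        groups.setdefault(key, []).append(name)
    # Keep only rows that occur more than once, keyed by the first sample.
    return {
        names[0]: tuple(names) for names in groups.values() if len(names) > 1
    }


# Made-up sample names and states for illustration.
rows = {
    "c1": [1, (2, 3), 0],
    "c2": [1, (3, 2), 0],
    "c3": [1, 2, 0],
}
print(find_duplicate_groups_sketch(rows))  # {'c1': ('c1', 'c2')}
```

Here `c1` and `c2` are grouped because their ambiguous states contain the same values in a different order, while `c3` is left out because no other sample shares its states.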
Dependencies
This file depends on the following external libraries:

| Dependency | Purpose |
|---|---|
| functools | Used for reducing a list of lists into a single list in unravel_ambiguous_states. |
| importlib | Used for importing modules in try_import. |
| numpy | Used for array operations in find_duplicate_groups. |
| pandas | Used for data manipulation in find_duplicate_groups. |
