> ## Documentation Index
> Fetch the complete documentation index at: https://demo.agenticlabs.com/llms.txt
> Use this file to discover all available pages before exploring further.

# GreedySolver.py

## High-level description

The `GreedySolver` class in Cassiopeia is an abstract base class for implementing top-down greedy algorithms for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves. Subclasses of `GreedySolver` implement the specific logic for performing the split.

## Code Structure

The `GreedySolver` class defines the overall structure of a greedy solver, including methods for performing splits, computing mutation frequencies, and adding duplicate samples to the tree. The `perform_split` method is an abstract method that must be implemented by subclasses to define the specific split criterion.

## References

* `CassiopeiaSolver`: The `GreedySolver` class inherits from the `CassiopeiaSolver` abstract base class.
* `solver_utilities`: The `GreedySolver` class uses utility functions from the `solver_utilities` module, such as `node_name_generator` and `transform_priors`.
* `missing_data_methods`: The `GreedySolver` class uses functions from the `missing_data_methods` module to handle missing data during character splits.
* `find_duplicate_groups`: The `GreedySolver` class uses the `find_duplicate_groups` function from the `cassiopeia.mixins` module to identify duplicate samples.

## Symbols

### `GreedySolver`

#### Description

This class represents a greedy solver for phylogenetic tree reconstruction. It provides a framework for recursively splitting a set of samples based on a split criterion, building the tree from the root down to the leaves.

#### Inputs

| Name                  | Type | Description                                                                         |
| :-------------------- | :--- | :---------------------------------------------------------------------------------- |
| prior\_transformation | str  | Function to use when transforming priors into weights. Defaults to "negative\_log". |

#### Outputs

None

#### Internal Logic

1. Transform the priors into weights using the specified `prior_transformation`.
2. Recursively split the set of samples using the `perform_split` method:
   a. Create an ancestral node for each split.
   b. Place the resulting partitions as daughter clades.
3. Continue splitting until each partition contains only a single sample.
4. If a split cannot be performed, create a polytomy.
5. Add duplicate samples to the tree.
6. Collapse mutationless edges if specified.

### `perform_split`

#### Description

This abstract method defines the logic for partitioning a set of samples. It must be implemented by subclasses of `GreedySolver`.

#### Inputs

| Name                      | Type                                     | Description                                                                          |
| :------------------------ | :--------------------------------------- | :----------------------------------------------------------------------------------- |
| character\_matrix         | pd.DataFrame                             | Character matrix                                                                     |
| samples                   | List\[int]                               | A list of samples to partition                                                       |
| weights                   | Optional\[Dict\[int, Dict\[int, float]]] | Weighting of each (character, state) pair. Typically a transformation of the priors. |
| missing\_state\_indicator | int                                      | Character representing missing data.                                                 |

#### Outputs

| Name             | Type                    | Description               |
| :--------------- | :---------------------- | :------------------------ |
| left\_partition  | List\[Union\[int, str]] | The left partition group  |
| right\_partition | List\[Union\[int, str]] | The right partition group |

### `compute_mutation_frequencies`

#### Description

This method computes the frequencies of each character/state pair in a given set of samples.

#### Inputs

| Name                      | Type         | Description                          |
| :------------------------ | :----------- | :----------------------------------- |
| samples                   | List\[str]   | The set of samples to consider       |
| unique\_character\_matrix | pd.DataFrame | The character matrix                 |
| missing\_state\_indicator | int          | Character representing missing data. |

#### Outputs

| Name       | Type                        | Description                                                                         |
| :--------- | :-------------------------- | :---------------------------------------------------------------------------------- |
| freq\_dict | Dict\[int, Dict\[int, int]] | A dictionary mapping each character to a dictionary of state/sample frequency pairs |

#### Internal Logic

1. Iterate over each character in the character matrix.
2. For each character, count the occurrences of each state for the given samples.
3. Store the frequencies in a dictionary:
   * Keys are characters
   * Values are dictionaries mapping states to their frequencies

### `__add_duplicates_to_tree`

#### Description

This private method adds duplicate samples to the tree.

#### Inputs

| Name                  | Type                        | Description                              |
| :-------------------- | :-------------------------- | :--------------------------------------- |
| tree                  | nx.DiGraph                  | The tree to add duplicates to            |
| character\_matrix     | pd.DataFrame                | Character matrix                         |
| node\_name\_generator | Generator\[str, None, None] | Generator for creating unique node names |

#### Outputs

| Name | Type       | Description                    |
| :--- | :--------- | :----------------------------- |
| tree | nx.DiGraph | The tree with duplicates added |

#### Internal Logic

1. Identify duplicate samples using the `find_duplicate_groups` function.
2. For each group of duplicates:
   a. Create a new internal node.
   b. Connect the new node to the original sample.
   c. Connect the new node to all duplicates of the original sample.

## Side Effects

The `solve` method modifies the input `CassiopeiaTree` object by populating its tree attribute.

## Dependencies

* `networkx`: Used for representing and manipulating the tree.
* `numpy`: Used for numerical operations.
* `pandas`: Used for representing the character matrix.
