ILPSolver.py
High-level description
The ILPSolver
class implements the Cassiopeia-ILP algorithm for inferring a maximum parsimony phylogenetic tree. It constructs a potential graph of possible evolutionary states and uses Gurobi, an ILP solver, to find the maximum parsimony Steiner Tree within this graph. This tree represents the proposed phylogenetic relationship between the input samples.
Code Structure
The ILPSolver
class inherits from the CassiopeiaSolver
abstract class. It overrides the solve
method to implement the Cassiopeia-ILP algorithm. The main steps involve inferring a potential graph, generating a Steiner Tree ILP model, solving the ILP problem, post-processing the solution, and populating the resulting tree in a CassiopeiaTree
object.
References
ilp_solver_utilities
: This module provides utility functions for inferring the potential graph and other ILP-related operations.dissimilarity_functions
: This module provides functions for calculating dissimilarity between character states, including weighted Hamming distance.solver_utilities
: This module provides general utility functions for tree solvers, such as transforming priors into weights.gurobipy
: This module provides the interface to the Gurobi optimizer.
Symbols
ILPSolver
Description
This class implements the Cassiopeia-ILP algorithm for phylogenetic tree inference.
Inputs
Name | Type | Description |
---|---|---|
convergence_time_limit | int | Amount of time allotted to the ILP for convergence (default: 12600 seconds). |
convergence_iteration_limit | int | Number of iterations allowed for ILP convergence (default: 0, meaning no limit). |
maximum_potential_graph_layer_size | int | Maximum size allowed for an iteration of the potential graph inference procedure (default: 10000). |
maximum_potential_graph_lca_distance | Optional[int] | Maximum height of LCA to add to the potential graph (default: None, meaning the maximum distance between any pair of samples is used). |
weighted | bool | Whether to weight edges on the potential graph by the negative log likelihood of the mutations (default: False). |
seed | Optional[int] | Random seed to use during ILP optimization (default: None). |
mip_gap | float | Objective gap for mixed integer linear programming problem (default: 0.01). |
prior_transformation | str | Function to use when transforming priors into weights (default: “negative_log”). |
Outputs
The solve
method does not return any value but populates the provided CassiopeiaTree
object with the inferred tree.
Internal Logic
The solve
method performs the following steps:
- Preprocessing: Checks for ambiguous states, prepares the character matrix, and transforms priors into weights if necessary.
- Root Identification: Identifies the root of the tree based on the least common ancestor of all samples.
- Potential Graph Inference: Infers the potential graph using the
infer_potential_graph
method. - Steiner Tree Model Generation: Generates a Gurobi ILP model for the Steiner Tree problem using the
generate_steiner_model
method. - ILP Solution: Solves the ILP problem using the
solve_steiner_instance
method. - Solution Post-processing: Post-processes the solution to remove self-loops and enforce a single parent for each node using the
post_process_steiner_solution
method. - Tree Population: Appends sample names to the solution, removes spurious leaves, and populates the
CassiopeiaTree
object with the resulting tree.
infer_potential_graph
Description
This method infers the potential graph, a network of possible evolutionary intermediates, from the character matrix.
Inputs
Name | Type | Description |
---|---|---|
character_matrix | pd.DataFrame | Character matrix of mutation observations. |
pid | int | Process ID for future reference. |
lca_height | int | Maximum LCA height to consider for connecting nodes to an LCA. |
weights | Optional[Dict[int, Dict[int, str]]] | Weights for character-state pairs, derived from the priors if available. |
missing_state_indicator | int | Indicator for missing data (default: -1). |
Outputs
Name | Type | Description |
---|---|---|
potential_graph | nx.DiGraph | A potential graph represented by a directed graph. |
Internal Logic
- Edge Inference: Calls the
ilp_solver_utilities.infer_potential_graph_cython
function to obtain the edges of the potential graph in character string format. - Edge Decoding: Decodes the character strings into tuples representing character states.
- Graph Construction: Creates a Networkx directed graph and adds the decoded edges.
- Edge Weighting: Adds edge weights based on the provided weights or the number of mutations along each edge using the
add_edge_weights
method.
add_edge_weights
Description
This method annotates edges in the potential graph with weights.
Inputs
Name | Type | Description |
---|---|---|
potential_graph | nx.DiGraph | Potential graph to annotate with edge weights. |
weights | Optional[Dict[int, Dict[int, str]]] | Weights for character-state pairs. |
missing_state_indicator | int | Indicator for missing data (default: -1). |
Outputs
Name | Type | Description |
---|---|---|
weighted_graph | nx.DiGraph | The potential graph with edge weights added, stored in the weight attribute. |
Internal Logic
Iterates through the edges of the potential graph and calculates the weight for each edge using either the provided weights or the weighted Hamming distance between the character states of the connected nodes.
generate_steiner_model
Description
This method generates a Gurobi ILP model for the Steiner Tree problem.
Inputs
Name | Type | Description |
---|---|---|
potential_graph | nx.DiGraph | Potential graph representing the evolutionary space. |
root | List[int] | A node in the graph to treat as the source. |
targets | List[List[int]] | A list of nodes in the tree that serve as targets. |
Outputs
Name | Type | Description |
---|---|---|
model | gurobipy.Model | A Gurobipy Model instance. |
edge_variables | Dict[Tuple[int, int], gurobipy.Var] | Edge variables used in the model. |
Internal Logic
- Model Initialization: Creates a Gurobi model instance.
- Variable Addition: Adds flow variables for each edge and binary variables indicating edge usage.
- Constraint Addition: Adds constraints for flow conservation and edge usage.
- Objective Definition: Sets the objective function to minimize the total weight of the used edges.
solve_steiner_instance
Description
This method solves the Steiner Tree problem using the generated Gurobi model.
Inputs
Name | Type | Description |
---|---|---|
model | gurobipy.Model | A Gurobi model instance. |
edge_variables | Dict[Tuple[int, int], gurobipy.Var] | Edge variables used in the model. |
potential_graph | nx.DiGraph | Potential graph used as input to the Steiner Tree problem. |
pid | int | Process ID. |
logfile | str | Location to store standard out. |
Outputs
Name | Type | Description |
---|---|---|
solutions | List[nx.DiGraph] | A list of solutions, each represented as a directed graph. |
Internal Logic
- Parameter Configuration: Sets Gurobi parameters, including convergence criteria, MIP gap, and logging options.
- Model Optimization: Calls the
model.optimize()
method to solve the ILP problem. - Solution Recovery: Extracts the solutions from the Gurobi model and converts them into Networkx directed graphs.
post_process_steiner_solution
Description
This method post-processes the Steiner Tree solution to remove self-loops and enforce a single parent for each node.
Inputs
Name | Type | Description |
---|---|---|
solution | nx.DiGraph | The Gurobi solution. |
root | List[int] | The root node. |
Outputs
Name | Type | Description |
---|---|---|
processed_solution | nx.DiGraph | A cleaned up Networkx solution. |
Internal Logic
- Self-loop Removal: Removes any self-loops from the solution.
- Spurious Root Removal: Removes any spurious roots, ensuring only one root remains.
- Single Parent Enforcement: For nodes with multiple parents, selects the parent with the highest edge weight and removes edges to other parents.
__append_sample_names_and_remove_spurious_leaves
Description
This private method appends sample names to the character states in the tree and prunes spurious leaves.
Inputs
Name | Type | Description |
---|---|---|
solution | nx.DiGraph | A Steiner Tree solution. |
character_matrix | pd.DataFrame | Character matrix of mutation observations. |
Outputs
Name | Type | Description |
---|---|---|
solution | nx.DiGraph | A solution with extra leaves corresponding to sample names. |
Internal Logic
- Sample Appending: Appends sample names to the deepest nodes in the tree that correspond to their character states.
- Spurious Leaf Removal: Removes any extant nodes that do not have samples appended to them and prunes their lineages.
Side Effects
The solve
method modifies the provided CassiopeiaTree
object by populating it with the inferred tree.
Dependencies
Dependency | Purpose |
---|---|
gurobipy | Used for solving the ILP problem. |
networkx | Used for representing the potential graph and the inferred tree. |
pandas | Used for handling the character matrix. |
numpy | Used for numerical operations. |
Error Handling
The ILPSolver
class raises ILPSolverError
exceptions in the following cases:
- If the input character matrix contains ambiguous states.
- If the root node is not found in the potential graph.
- If any target node is not found in the potential graph.
- If Gurobi is not installed.
- If the potential graph cannot be inferred with the specified parameters.
Logging
The solve
method logs progress and information about the potential graph inference and Steiner Tree optimization to the specified log file.