High-level description

The ILPSolver class implements the Cassiopeia-ILP algorithm for inferring a maximum parsimony phylogenetic tree. It constructs a potential graph of possible evolutionary states and uses Gurobi, an ILP solver, to find the maximum parsimony Steiner Tree within this graph. This tree represents the proposed phylogenetic relationship between the input samples.

Code Structure

The ILPSolver class inherits from the CassiopeiaSolver abstract class. It overrides the solve method to implement the Cassiopeia-ILP algorithm. The main steps involve inferring a potential graph, generating a Steiner Tree ILP model, solving the ILP problem, post-processing the solution, and populating the resulting tree in a CassiopeiaTree object.

References

  • ilp_solver_utilities: This module provides utility functions for inferring the potential graph and other ILP-related operations.
  • dissimilarity_functions: This module provides functions for calculating dissimilarity between character states, including weighted Hamming distance.
  • solver_utilities: This module provides general utility functions for tree solvers, such as transforming priors into weights.
  • gurobipy: This module provides the interface to the Gurobi optimizer.

Symbols

ILPSolver

Description

This class implements the Cassiopeia-ILP algorithm for phylogenetic tree inference.

Inputs

NameTypeDescription
convergence_time_limitintAmount of time allotted to the ILP for convergence (default: 12600 seconds).
convergence_iteration_limitintNumber of iterations allowed for ILP convergence (default: 0, meaning no limit).
maximum_potential_graph_layer_sizeintMaximum size allowed for an iteration of the potential graph inference procedure (default: 10000).
maximum_potential_graph_lca_distanceOptional[int]Maximum height of LCA to add to the potential graph (default: None, meaning the maximum distance between any pair of samples is used).
weightedboolWhether to weight edges on the potential graph by the negative log likelihood of the mutations (default: False).
seedOptional[int]Random seed to use during ILP optimization (default: None).
mip_gapfloatObjective gap for mixed integer linear programming problem (default: 0.01).
prior_transformationstrFunction to use when transforming priors into weights (default: “negative_log”).

Outputs

The solve method does not return any value but populates the provided CassiopeiaTree object with the inferred tree.

Internal Logic

The solve method performs the following steps:

  1. Preprocessing: Checks for ambiguous states, prepares the character matrix, and transforms priors into weights if necessary.
  2. Root Identification: Identifies the root of the tree based on the least common ancestor of all samples.
  3. Potential Graph Inference: Infers the potential graph using the infer_potential_graph method.
  4. Steiner Tree Model Generation: Generates a Gurobi ILP model for the Steiner Tree problem using the generate_steiner_model method.
  5. ILP Solution: Solves the ILP problem using the solve_steiner_instance method.
  6. Solution Post-processing: Post-processes the solution to remove self-loops and enforce a single parent for each node using the post_process_steiner_solution method.
  7. Tree Population: Appends sample names to the solution, removes spurious leaves, and populates the CassiopeiaTree object with the resulting tree.

infer_potential_graph

Description

This method infers the potential graph, a network of possible evolutionary intermediates, from the character matrix.

Inputs

NameTypeDescription
character_matrixpd.DataFrameCharacter matrix of mutation observations.
pidintProcess ID for future reference.
lca_heightintMaximum LCA height to consider for connecting nodes to an LCA.
weightsOptional[Dict[int, Dict[int, str]]]Weights for character-state pairs, derived from the priors if available.
missing_state_indicatorintIndicator for missing data (default: -1).

Outputs

NameTypeDescription
potential_graphnx.DiGraphA potential graph represented by a directed graph.

Internal Logic

  1. Edge Inference: Calls the ilp_solver_utilities.infer_potential_graph_cython function to obtain the edges of the potential graph in character string format.
  2. Edge Decoding: Decodes the character strings into tuples representing character states.
  3. Graph Construction: Creates a Networkx directed graph and adds the decoded edges.
  4. Edge Weighting: Adds edge weights based on the provided weights or the number of mutations along each edge using the add_edge_weights method.

add_edge_weights

Description

This method annotates edges in the potential graph with weights.

Inputs

NameTypeDescription
potential_graphnx.DiGraphPotential graph to annotate with edge weights.
weightsOptional[Dict[int, Dict[int, str]]]Weights for character-state pairs.
missing_state_indicatorintIndicator for missing data (default: -1).

Outputs

NameTypeDescription
weighted_graphnx.DiGraphThe potential graph with edge weights added, stored in the weight attribute.

Internal Logic

Iterates through the edges of the potential graph and calculates the weight for each edge using either the provided weights or the weighted Hamming distance between the character states of the connected nodes.

generate_steiner_model

Description

This method generates a Gurobi ILP model for the Steiner Tree problem.

Inputs

NameTypeDescription
potential_graphnx.DiGraphPotential graph representing the evolutionary space.
rootList[int]A node in the graph to treat as the source.
targetsList[List[int]]A list of nodes in the tree that serve as targets.

Outputs

NameTypeDescription
modelgurobipy.ModelA Gurobipy Model instance.
edge_variablesDict[Tuple[int, int], gurobipy.Var]Edge variables used in the model.

Internal Logic

  1. Model Initialization: Creates a Gurobi model instance.
  2. Variable Addition: Adds flow variables for each edge and binary variables indicating edge usage.
  3. Constraint Addition: Adds constraints for flow conservation and edge usage.
  4. Objective Definition: Sets the objective function to minimize the total weight of the used edges.

solve_steiner_instance

Description

This method solves the Steiner Tree problem using the generated Gurobi model.

Inputs

NameTypeDescription
modelgurobipy.ModelA Gurobi model instance.
edge_variablesDict[Tuple[int, int], gurobipy.Var]Edge variables used in the model.
potential_graphnx.DiGraphPotential graph used as input to the Steiner Tree problem.
pidintProcess ID.
logfilestrLocation to store standard out.

Outputs

NameTypeDescription
solutionsList[nx.DiGraph]A list of solutions, each represented as a directed graph.

Internal Logic

  1. Parameter Configuration: Sets Gurobi parameters, including convergence criteria, MIP gap, and logging options.
  2. Model Optimization: Calls the model.optimize() method to solve the ILP problem.
  3. Solution Recovery: Extracts the solutions from the Gurobi model and converts them into Networkx directed graphs.

post_process_steiner_solution

Description

This method post-processes the Steiner Tree solution to remove self-loops and enforce a single parent for each node.

Inputs

NameTypeDescription
solutionnx.DiGraphThe Gurobi solution.
rootList[int]The root node.

Outputs

NameTypeDescription
processed_solutionnx.DiGraphA cleaned up Networkx solution.

Internal Logic

  1. Self-loop Removal: Removes any self-loops from the solution.
  2. Spurious Root Removal: Removes any spurious roots, ensuring only one root remains.
  3. Single Parent Enforcement: For nodes with multiple parents, selects the parent with the highest edge weight and removes edges to other parents.

__append_sample_names_and_remove_spurious_leaves

Description

This private method appends sample names to the character states in the tree and prunes spurious leaves.

Inputs

NameTypeDescription
solutionnx.DiGraphA Steiner Tree solution.
character_matrixpd.DataFrameCharacter matrix of mutation observations.

Outputs

NameTypeDescription
solutionnx.DiGraphA solution with extra leaves corresponding to sample names.

Internal Logic

  1. Sample Appending: Appends sample names to the deepest nodes in the tree that correspond to their character states.
  2. Spurious Leaf Removal: Removes any extant nodes that do not have samples appended to them and prunes their lineages.

Side Effects

The solve method modifies the provided CassiopeiaTree object by populating it with the inferred tree.

Dependencies

DependencyPurpose
gurobipyUsed for solving the ILP problem.
networkxUsed for representing the potential graph and the inferred tree.
pandasUsed for handling the character matrix.
numpyUsed for numerical operations.

Error Handling

The ILPSolver class raises ILPSolverError exceptions in the following cases:

  • If the input character matrix contains ambiguous states.
  • If the root node is not found in the potential graph.
  • If any target node is not found in the potential graph.
  • If Gurobi is not installed.
  • If the potential graph cannot be inferred with the specified parameters.

Logging

The solve method logs progress and information about the potential graph inference and Steiner Tree optimization to the specified log file.