High-level description

This script performs fitness inference on sequences in a multiple sequence alignment (MSA) to rank them based on their inferred fitness. It uses a phylogenetic approach, reconstructing the ancestral relationships between sequences and estimating the fitness of each sequence based on its position in the tree.

The script takes an alignment file in FASTA format and the name of an outgroup sequence as input. It then infers the phylogenetic tree, estimates fitness values for each sequence (including ancestral sequences), and outputs the tree, ancestral sequences, and sequence rankings to files.

Code Structure

The code first parses command-line arguments, then reads the input alignment and identifies the outgroup sequence. It then sets up a sequence_ranking object from the sequence_ranking.py module (see References) and uses it to perform the fitness inference. Finally, it outputs the results to files and optionally plots the tree.

References

This script references the following code symbols:

  • sequence_ranking (from sequence_ranking.py): This class handles the core logic of sequence ranking based on phylogenetic analysis.

Symbols

ofunc

Description

This function determines the appropriate file opening method based on the file extension. If the filename ends with ‘.gz’, it assumes a gzipped file and uses gzip.open. Otherwise, it uses the standard open function.

Inputs

NameTypeDescription
fnamestrThe name of the file to open.
modestrThe mode in which to open the file (e.g., ‘r’ for reading, ‘w’ for writing).

Outputs

NameTypeDescription
file objectfileA file object representing the opened file.

main execution block

Description

This block of code executes the main logic of the script. It reads the alignment file, identifies the outgroup sequence, sets up the sequence data, performs the fitness inference, and outputs the results.

Internal Logic

  1. Read alignment and outgroup: Reads the input alignment file in FASTA format and identifies the outgroup sequence.
  2. Set up sequence data: Creates a sequence_ranking object from the alignment module, providing it with the alignment and outgroup information.
  3. Perform fitness inference: Uses the sequence_ranking object to perform the fitness inference, including tree reconstruction and fitness estimation.
  4. Output results: Writes the reconstructed tree, inferred ancestral sequences, and sequence rankings to files.
  5. Optional plotting: If the --plot flag is set, plots the tree with fitness information.

Dependencies

DependencyPurpose
argparseParsing command-line arguments.
matplotlibPlotting the phylogenetic tree (optional).
BioHandling biological sequence data and phylogenetic trees.
tree_utilsUtility functions for working with phylogenetic trees.
sequence_rankingCustom module containing the sequence_ranking class for fitness inference.

Configuration

This script uses command-line arguments for configuration. See the “parse the command line arguments” section for details on each argument.

Error Handling

The script checks if the specified outgroup sequence is present in the alignment. If not, it prints an error message and exits.

Logging

The script prints the time scale (tau) used for local tree length estimation.