rank_sequences.py
High-level description
This script performs fitness inference on sequences in a multiple sequence alignment (MSA) to rank them based on their inferred fitness. It uses a phylogenetic approach, reconstructing the ancestral relationships between sequences and estimating the fitness of each sequence based on its position in the tree.
The script takes an alignment file in FASTA format and the name of an outgroup sequence as input. It then infers the phylogenetic tree, estimates fitness values for each sequence (including ancestral sequences), and outputs the tree, ancestral sequences, and sequence rankings to files.
Code Structure
The code first parses command-line arguments, then reads the input alignment and identifies the outgroup sequence. It then sets up a sequence_ranking object from the sequence_ranking.py module (see References) and uses it to perform the fitness inference. Finally, it outputs the results to files and optionally plots the tree.
References
This script references the following code symbols:
sequence_ranking (from sequence_ranking.py): This class handles the core logic of sequence ranking based on phylogenetic analysis.
Symbols
ofunc
Description
This function determines the appropriate file opening method based on the file extension. If the filename ends with ‘.gz’, it assumes a gzipped file and uses gzip.open. Otherwise, it uses the standard open function.
Inputs
Name | Type | Description |
---|---|---|
fname | str | The name of the file to open. |
mode | str | The mode in which to open the file (e.g., ‘r’ for reading, ‘w’ for writing). |
Outputs
Name | Type | Description |
---|---|---|
file object | file | A file object representing the opened file. |
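The behavior described above can be sketched in a few lines. This is a minimal reconstruction from the description, not necessarily the script's exact code; only the name ofunc and its two parameters come from the tables above.

```python
import gzip


def ofunc(fname, mode):
    """Open fname with gzip.open if it ends in '.gz', otherwise with open.

    Sketch based on the description above; the real implementation may differ.
    """
    if fname.endswith('.gz'):
        # Compressed input or output: delegate to the gzip module.
        return gzip.open(fname, mode)
    # Plain file: use the built-in open.
    return open(fname, mode)
```

Note that gzip.open treats 'r' and 'w' as binary modes, so a caller that wants text from a gzipped file has to pass 'rt' or 'wt' explicitly.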
main execution block
Description
This block of code executes the main logic of the script. It reads the alignment file, identifies the outgroup sequence, sets up the sequence data, performs the fitness inference, and outputs the results.
Internal Logic
- Read alignment and outgroup: Reads the input alignment file in FASTA format and identifies the outgroup sequence.
- Set up sequence data: Creates a sequence_ranking object from the sequence_ranking module, providing it with the alignment and outgroup information.
- Perform fitness inference: Uses the sequence_ranking object to perform the fitness inference, including tree reconstruction and fitness estimation.
- Output results: Writes the reconstructed tree, inferred ancestral sequences, and sequence rankings to files.
- Optional plotting: If the --plot flag is set, plots the tree with fitness information.
Dependencies
Dependency | Purpose |
---|---|
argparse | Parsing command-line arguments. |
matplotlib | Plotting the phylogenetic tree (optional). |
Bio | Handling biological sequence data and phylogenetic trees. |
tree_utils | Utility functions for working with phylogenetic trees. |
sequence_ranking | Custom module containing the sequence_ranking class for fitness inference. |
Configuration
This script uses command-line arguments for configuration. See the “parse the command line arguments” section for details on each argument.
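As a rough illustration of this kind of configuration, the sketch below builds an argparse parser for the inputs this document mentions. Only the --plot flag is named in the text; the option names --aln and --outgroup are assumptions chosen for the example, and the real script may define different or additional arguments.

```python
import argparse


def build_parser():
    """Build a parser for the inputs described in this document (sketch)."""
    parser = argparse.ArgumentParser(
        description="Rank sequences in an alignment by inferred fitness")
    # Hypothetical option name: path to the FASTA alignment.
    parser.add_argument('--aln', required=True, type=str,
                        help="alignment file in FASTA format")
    # Hypothetical option name: name of the outgroup sequence.
    parser.add_argument('--outgroup', required=True, type=str,
                        help="name of the outgroup sequence in the alignment")
    # Named in this document: optional tree plotting.
    parser.add_argument('--plot', action='store_true', default=False,
                        help="plot the tree with fitness information")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is
# reproducible; the file and sequence names are placeholders.
args = build_parser().parse_args(
    ['--aln', 'example.fasta.gz', '--outgroup', 'seq0', '--plot'])
```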
Error Handling
The script checks if the specified outgroup sequence is present in the alignment. If not, it prints an error message and exits.
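A check of this kind might look like the sketch below. It works on a plain list of sequence names rather than on Biopython records, and the helper name find_outgroup, the message wording, the use of stderr, and the exit status 1 are all assumptions made for illustration.

```python
import sys


def find_outgroup(names, outgroup_name):
    """Return the index of outgroup_name in names, or exit with an error.

    Hypothetical helper: the script itself may perform this check inline.
    """
    if outgroup_name not in names:
        # Assumed message and exit status; the source only says the script
        # prints an error message and exits.
        print("outgroup '%s' not found in alignment" % outgroup_name,
              file=sys.stderr)
        sys.exit(1)
    return names.index(outgroup_name)
```

For example, find_outgroup(['seqA', 'seqB'], 'seqB') returns the position of seqB, while an unknown name terminates the program before any tree reconstruction starts.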
Logging
The script prints the time scale (tau) used for local tree length estimation.