High-level description

This file contains unit tests for the utilities defined in cassiopeia/data/utilities.py. It tests functions for bootstrapping character matrices and allele tables, converting trees to Newick format, computing lowest common ancestor (LCA) characters, resolving ambiguous character states, computing phylogenetic weight matrices and the Net Relatedness Index, and calculating inter-cluster distances.

Code Structure

The code defines a TestDataUtilities class that inherits from unittest.TestCase. This class contains multiple test methods, each testing a specific functionality of the data utilities module.

Symbols

TestDataUtilities

Description

A test class containing various unit tests for the data utilities module.

Internal Logic

The class sets up test data in the setUp method and defines multiple test methods to verify the functionality of different utility functions.
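The setUp-plus-test-methods layout described above can be sketched as follows. This is a minimal illustration, not the actual test class: the class name TestExampleUtilities, the fixture contents, and both test methods are hypothetical, and a plain list of lists stands in for the real character matrix.

```python
import unittest


class TestExampleUtilities(unittest.TestCase):
    """Hypothetical test class mirroring the setUp/test_* layout."""

    def setUp(self):
        # unittest calls setUp before every test method, so each test
        # starts from a fresh copy of the fixture.
        # Rows are cells, columns are characters; -1 marks missing data.
        self.character_matrix = [
            [1, 0, 2],
            [1, 2, 0],
            [0, 0, -1],
        ]

    def test_matrix_is_rectangular(self):
        widths = {len(row) for row in self.character_matrix}
        self.assertEqual(widths, {3})

    def test_fixture_is_fresh_for_each_test(self):
        # Mutating the fixture here does not leak into other tests.
        self.character_matrix[0][0] = 99
        self.assertEqual(self.character_matrix[0][0], 99)
```

Such a class is usually run with `python -m unittest` from the repository root.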

test_bootstrap_character_matrices_no_priors

Description

Tests the sample_bootstrap_character_matrices function without providing priors.

test_bootstrap_character_matrices_with_priors

Description

Tests the sample_bootstrap_character_matrices function with provided priors.
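The idea behind the two bootstrap tests above can be sketched as resampling the columns (characters) of a character matrix with replacement and, when priors are supplied, carrying each sampled character's priors along to its new position. This is a simplified stand-in under stated assumptions, not the signature or return type of sample_bootstrap_character_matrices: the function name bootstrap_character_matrix, the list-of-lists matrix, and the index-keyed priors dictionary are all hypothetical.

```python
import random


def bootstrap_character_matrix(matrix, priors=None, seed=None):
    """Resample the columns of `matrix` with replacement (illustrative only).

    matrix: list of rows, one row per cell, one column per character.
    priors: optional dict mapping character index -> {state: probability}.
    Returns (resampled matrix, resampled priors or None, sampled indices).
    """
    rng = random.Random(seed)
    num_characters = len(matrix[0])
    # Draw as many column indices as there are characters, with replacement.
    sampled = [rng.randrange(num_characters) for _ in range(num_characters)]
    resampled = [[row[j] for j in sampled] for row in matrix]

    resampled_priors = None
    if priors is not None:
        # Re-key the priors so position `new` carries the priors of the
        # original character `old` that was drawn for it.
        resampled_priors = {new: priors[old] for new, old in enumerate(sampled)}
    return resampled, resampled_priors, sampled
```

A test of this kind of function typically checks that the resampled matrix keeps the original shape and that every resampled column, and its priors, can be traced back to an original character.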

test_bootstrap_allele_tables

Description

Tests the sample_bootstrap_allele_tables function for standard allele tables.

test_bootstrap_allele_tables_non_cassiopeia_allele_table

Description

Tests the sample_bootstrap_allele_tables function for non-standard allele tables.

test_bootstrap_allele_tables_priors

Description

Tests the sample_bootstrap_allele_tables function with provided indel priors.

test_to_newick_no_branch_lengths

Description

Tests the to_newick function without including branch lengths.

test_to_newick_branch_lengths

Description

Tests the to_newick function including branch lengths.
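To make the two Newick tests above concrete, here is a minimal sketch of serialising a rooted tree to a Newick string, with and without branch lengths. It is an assumption-laden illustration rather than the to_newick implementation: the function name newick_from_children, the children dictionary, and the (parent, child)-keyed lengths dictionary are hypothetical, and internal nodes are written without names.

```python
def newick_from_children(children, root, lengths=None):
    """Serialise a rooted tree to a Newick string (illustrative only).

    children: dict mapping each internal node to a list of its children.
    lengths: optional dict mapping (parent, child) -> branch length.
    """

    def render(node, parent=None):
        kids = children.get(node, [])
        if kids:
            # Internal node: parenthesised, comma-separated subtrees.
            text = "(" + ",".join(render(kid, node) for kid in kids) + ")"
        else:
            # Leaf: just its name.
            text = str(node)
        if lengths is not None and parent is not None:
            # The root has no incoming edge, so it gets no length.
            text += f":{lengths[(parent, node)]}"
        return text

    return render(root) + ";"
```

For a tree whose root has a leaf A and an internal child with leaves B and C, the sketch gives "(A,(B,C));" without lengths and a string of the form "(A:1.5,(B:1.0,C:2.0):0.5);" with them.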

test_lca_characters

Description

Tests the get_lca_characters function, which computes the character states of the lowest common ancestor (LCA) of a set of character vectors.
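The non-ambiguous case can be sketched as follows, under the assumption that the LCA keeps a state only when every non-missing observation agrees on it, falls back to the uncut state 0 otherwise, and stays missing when all observations are missing. The function name lca_characters and the default missing indicator of -1 are illustrative choices; ambiguous (tuple-valued) states are deliberately left out.

```python
def lca_characters(vectors, missing=-1):
    """Infer LCA character states from child vectors (illustrative only)."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all character vectors must have the same length")

    lca = []
    for states in zip(*vectors):
        # Ignore missing observations at this character.
        observed = {s for s in states if s != missing}
        if not observed:
            lca.append(missing)          # nothing observed anywhere
        elif len(observed) == 1:
            lca.append(observed.pop())   # all observations agree
        else:
            lca.append(0)                # conflict: assume the uncut state
    return lca
```

For example, the vectors [1, 0, 3, -1] and [1, 2, -1, -1] give [1, 0, 3, -1] under these assumptions.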

test_lca_characters_ambiguous

Description

Tests the get_lca_characters function with ambiguous character states.

test_lca_characters_ambiguous2

Description

Tests the get_lca_characters function with a second configuration of ambiguous character states.

test_lca_characters_ambiguous_and_missing

Description

Tests the get_lca_characters function with both ambiguous and missing character states.

test_resolve_most_abundant

Description

Tests the resolve_most_abundant function, which resolves an ambiguous character state to its most abundant state.
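A minimal sketch of this kind of resolution, assuming an ambiguous state is represented as a tuple of candidate states and that ties are broken at random. The function name most_abundant_state and the optional rng parameter are hypothetical and only stand in for the tested function.

```python
import random
from collections import Counter


def most_abundant_state(ambiguous_state, rng=None):
    """Pick the most frequent state in a tuple of states (illustrative only)."""
    counts = Counter(ambiguous_state)
    top = max(counts.values())
    # Several states may share the top count; sort them for a stable
    # candidate order, then break the tie at random.
    candidates = sorted(s for s, c in counts.items() if c == top)
    return (rng or random).choice(candidates)
```

With the input (1, 2, 2, 3) the sketch always returns 2; with (1, 1, 2, 2) it returns either 1 or 2, so a test of the tie case should only assert membership.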

test_simple_phylogenetic_weights_matrix

Description

Tests the compute_phylogenetic_weight_matrix function for a simple tree.

test_simple_phylogenetic_weights_matrix_inverse

Description

Tests the compute_phylogenetic_weight_matrix function with inverse weights.

test_phylogenetic_weights_matrix_inverse_fn

Description

Tests the compute_phylogenetic_weight_matrix function with a custom inverse function.
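The three weight-matrix tests above can be pictured with the sketch below, which assumes that the weight between two leaves is their path distance through the tree and that an optional function transforms that distance (for example into its reciprocal). The function names, the child -> (parent, branch length) dictionary, and the nested-dictionary result are hypothetical; this is not the signature of compute_phylogenetic_weight_matrix.

```python
from itertools import combinations


def path_to_root(parent, node):
    """Map each ancestor of `node` (and `node` itself) to its distance."""
    distance, out = 0.0, {node: 0.0}
    while node in parent:
        node, length = parent[node]
        distance += length
        out[node] = distance
    return out


def leaf_weight_matrix(parent, leaves, inverse_fn=None):
    """Pairwise leaf distances on a rooted tree (illustrative only).

    parent: dict mapping child -> (parent, branch length).
    inverse_fn: optional function applied to each off-diagonal distance.
    """
    matrix = {a: {b: 0.0 for b in leaves} for a in leaves}
    paths = {leaf: path_to_root(parent, leaf) for leaf in leaves}
    for a, b in combinations(leaves, 2):
        # With non-negative branch lengths, the shared ancestor that
        # minimises the summed distance is the lowest common ancestor.
        d = min(paths[a][n] + paths[b][n] for n in paths[a] if n in paths[b])
        if inverse_fn is not None:
            d = inverse_fn(d)
        matrix[a][b] = matrix[b][a] = d
    return matrix
```

Passing inverse_fn=lambda x: 1 / x turns a distance of 4.0 into a weight of 0.25, and any other callable (such as an exponential decay) can be swapped in the same way. The diagonal is left at zero in this sketch.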

test_net_relatedness_index

Description

Tests the net_relatedness_index function for calculating the Net Relatedness Index.

test_inter_cluster_distance_basic

Description

Tests the compute_inter_cluster_distances function for basic inter-cluster distance calculation.

test_inter_cluster_distance_custom_input

Description

Tests the compute_inter_cluster_distances function with custom input data.
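As a rough picture of what the two inter-cluster tests above exercise, the sketch below aggregates pairwise distances between the members of every pair of clusters. It assumes the distances come from a nested dictionary and that the mean is the aggregate; the function name inter_cluster_distances and all parameter names are hypothetical and do not reflect the signature of compute_inter_cluster_distances.

```python
from statistics import mean


def inter_cluster_distances(distances, clusters, aggregate=mean):
    """Aggregate pairwise distances between clusters (illustrative only).

    distances: nested dict, distances[x][y] is the distance from x to y.
    clusters: dict mapping each cell to its cluster label.
    Returns a dict mapping (label_a, label_b) -> aggregated distance.
    """
    labels = sorted(set(clusters.values()))
    members = {
        label: [cell for cell, assigned in clusters.items() if assigned == label]
        for label in labels
    }
    result = {}
    for a in labels:
        for b in labels:
            # All ordered pairs are used, so within-cluster values
            # include each cell's zero distance to itself.
            pairs = [distances[x][y] for x in members[a] for y in members[b]]
            result[(a, b)] = aggregate(pairs)
    return result
```

The "custom input" case then corresponds to supplying a different distance table or a different aggregate, such as min or max, in place of the defaults.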

Dependencies

  • unittest
  • networkx
  • numpy
  • pandas
  • cassiopeia.data.CassiopeiaTree
  • cassiopeia.data.utilities
  • cassiopeia.mixins.errors.CassiopeiaError
  • cassiopeia.preprocess.utilities

This test suite covers the standard code paths of the data utility functions in the Cassiopeia package, along with edge cases such as missing and ambiguous character states, non-standard allele tables, and custom inverse functions.