High-level description

This file contains unit tests for the data utilities defined in cassiopeia/data/utilities.py. It tests functions for character matrix operations, bootstrap sampling, dissimilarity map computations, Newick serialization, LCA character reconstruction, phylogenetic weight matrices, and cluster-level distance statistics used in the Cassiopeia lineage tracing framework.

Code Structure

The code consists of a single test class, TestDataUtilities, which inherits from unittest.TestCase. Each test method in the class exercises one function or feature of the data utilities module.

Symbols

TestDataUtilities

Description

A test class containing various unit tests for the data utilities module.

Internal Logic

  1. Sets up test data in the setUp method, including character matrices, priors, and allele tables.
  2. Defines multiple test methods, each focusing on a specific utility function or feature.
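
The setUp-plus-test-methods layout described above can be sketched as follows. This is a minimal illustration, not the actual test file: the class name TestDataUtilitiesSketch, the fixture values, and the test method are all hypothetical, and the conventions that 0 marks an unmutated state and -1 marks missing data are assumed.

```python
import unittest

import pandas as pd


class TestDataUtilitiesSketch(unittest.TestCase):
    """Hypothetical stand-in that mirrors the setUp/test-method layout."""

    def setUp(self):
        # Hypothetical character matrix: rows are cells, columns are characters.
        # Assumed conventions: 0 is the unmutated state, -1 is missing data.
        self.character_matrix = pd.DataFrame(
            [[1, 0, 2], [1, -1, 2], [0, 3, 2]],
            index=["cell_a", "cell_b", "cell_c"],
            columns=["r1", "r2", "r3"],
        )
        # Hypothetical priors: character position -> {state: probability}.
        self.priors = {0: {1: 1.0}, 1: {3: 1.0}, 2: {2: 1.0}}

    def test_fixture_shape(self):
        # Each test method reads the fixtures that setUp rebuilt for it.
        self.assertEqual(self.character_matrix.shape, (3, 3))
        self.assertEqual(set(self.priors), {0, 1, 2})
```

Because unittest calls setUp before every test method, each test starts from freshly built fixtures and cannot be affected by mutations made in another test.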

test_bootstrap_character_matrices_no_priors

Description

Tests the bootstrap sampling of character matrices without priors.
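
As a rough illustration of what bootstrap sampling of a character matrix involves, the sketch below resamples characters (columns) with replacement. The helper name bootstrap_columns, its parameters, and the example matrix are hypothetical and are not taken from the Cassiopeia API.

```python
import numpy as np
import pandas as pd


def bootstrap_columns(character_matrix, num_bootstraps, seed=None):
    """Resample characters (columns) with replacement; hypothetical helper."""
    rng = np.random.default_rng(seed)
    n_characters = character_matrix.shape[1]
    samples = []
    for _ in range(num_bootstraps):
        # Draw as many column positions as there are characters, with replacement.
        picked = rng.integers(0, n_characters, size=n_characters)
        sample = character_matrix.iloc[:, picked].copy()
        # Renumber the columns so repeated characters get distinct labels.
        sample.columns = range(n_characters)
        samples.append(sample)
    return samples


# Hypothetical input: 3 cells by 3 characters, with -1 assumed to mean missing.
character_matrix = pd.DataFrame(
    [[1, 0, 2], [1, -1, 2], [0, 3, 2]],
    index=["cell_a", "cell_b", "cell_c"],
    columns=["r1", "r2", "r3"],
)
bootstraps = bootstrap_columns(character_matrix, num_bootstraps=5, seed=0)
```

A test of such a routine would typically check that every sample keeps the original shape and row index, and that each resampled column matches one of the original columns.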

test_bootstrap_character_matrices_with_priors

Description

Tests the bootstrap sampling of character matrices with priors.

test_bootstrap_allele_tables

Description

Tests the bootstrap sampling of allele tables.

test_bootstrap_allele_tables_non_cassiopeia_allele_table

Description

Tests the bootstrap sampling of allele tables with non-standard column names.

test_bootstrap_allele_tables_priors

Description

Tests the bootstrap sampling of allele tables with priors.

test_to_newick_no_branch_lengths

Description

Tests the conversion of a tree to Newick format without branch lengths.

test_to_newick_branch_lengths

Description

Tests the conversion of a tree to Newick format with branch lengths.
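
To make the two Newick tests concrete, here is a minimal sketch of serializing a rooted networkx.DiGraph with and without branch lengths. The function to_newick_sketch, the record_branch_lengths flag, and the "length" edge attribute are assumptions made for illustration; the utility under test may name and structure these differently.

```python
import networkx as nx


def to_newick_sketch(tree, record_branch_lengths=False):
    """Serialize a rooted nx.DiGraph to Newick; hypothetical helper."""
    # The root is the only node without an incoming edge.
    root = next(node for node in tree.nodes if tree.in_degree(node) == 0)

    def subtree(node, parent=None):
        suffix = ""
        if record_branch_lengths and parent is not None:
            # Branch lengths are assumed to live in a "length" edge attribute.
            suffix = ":" + str(tree[parent][node]["length"])
        children = list(tree.successors(node))
        if not children:
            return str(node) + suffix
        # Internal node names are omitted; only leaves are labeled.
        inner = ",".join(subtree(child, node) for child in children)
        return "(" + inner + ")" + suffix

    return subtree(root) + ";"


tree = nx.DiGraph()
tree.add_edge("root", "internal", length=1.5)
tree.add_edge("root", "c", length=2.0)
tree.add_edge("internal", "a", length=0.5)
tree.add_edge("internal", "b", length=0.5)

print(to_newick_sketch(tree))  # ((a,b),c);
print(to_newick_sketch(tree, record_branch_lengths=True))  # ((a:0.5,b:0.5):1.5,c:2.0);
```

The recursive form keeps the sketch short; a very deep tree would need an iterative traversal to avoid Python's recursion limit.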

test_lca_characters

Description

Tests the computation of least common ancestor (LCA) characters.
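
One plausible reading of LCA character computation, sketched under stated assumptions: mutations are irreversible, 0 denotes the unmutated state, and -1 denotes missing data. Under those assumptions an ancestor carries a mutation only if every descendant with data agrees on it. The function lca_characters_sketch is hypothetical and does not handle ambiguous (tuple-valued) states.

```python
def lca_characters_sketch(vectors, missing_state=-1):
    """Combine descendant character vectors into an ancestor vector (hypothetical)."""
    lca = []
    for states in zip(*vectors):
        # Ignore missing entries when deciding the ancestral state.
        observed = {s for s in states if s != missing_state}
        if not observed:
            # No descendant has data, so the ancestor is missing as well.
            lca.append(missing_state)
        elif len(observed) == 1:
            # All descendants with data agree, so the ancestor shares the state.
            lca.append(observed.pop())
        else:
            # Descendants disagree, so the ancestor is assumed unmutated.
            lca.append(0)
    return lca


children = [[1, 0, 3, -1], [1, 2, -1, -1], [1, 2, 3, -1]]
print(lca_characters_sketch(children))  # [1, 0, 3, -1]
```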

test_lca_characters_ambiguous

Description

Tests the computation of LCA characters with ambiguous states.

test_lca_characters_ambiguous2

Description

Tests another case of LCA character computation with ambiguous states.

test_lca_characters_ambiguous_and_missing

Description

Tests the computation of LCA characters with both ambiguous and missing states.

test_resolve_most_abundant

Description

Tests the resolution of ambiguous states by selecting the most abundant state.
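
Selecting the most abundant state can be sketched with collections.Counter. The function name resolve_most_abundant_sketch is hypothetical, and the tie-breaking rule is an assumption: this sketch keeps the tied state that appears first, whereas the utility under test may break ties differently.

```python
from collections import Counter


def resolve_most_abundant_sketch(state):
    """Pick the most frequent entry of an ambiguous state tuple (hypothetical)."""
    counts = Counter(state)
    # most_common lists tied entries in order of first appearance,
    # so ties are resolved deterministically in this sketch.
    return counts.most_common(1)[0][0]


print(resolve_most_abundant_sketch((1, 2, 2, 3)))  # 2
```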

test_simple_phylogenetic_weights_matrix

Description

Tests the computation of a simple phylogenetic weight matrix.

test_simple_phylogenetic_weights_matrix_inverse

Description

Tests the computation of an inverse simple phylogenetic weight matrix.

test_phylogenetic_weights_matrix_inverse_fn

Description

Tests the computation of a phylogenetic weight matrix with a custom inverse function.
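
The three weight-matrix tests above can be pictured with the following sketch, which assumes that the weight between two leaves is the branch-length distance along the tree and that the inverse variants apply a function to each off-diagonal distance. The helper leaf_distance_matrix, its inverse and inverse_fn parameters, and the "length" edge attribute are hypothetical and chosen only for illustration.

```python
import networkx as nx


def leaf_distance_matrix(tree, inverse=False, inverse_fn=lambda d: 1.0 / d):
    """Pairwise path lengths between leaves of a rooted nx.DiGraph (hypothetical)."""
    leaves = [n for n in tree.nodes if tree.out_degree(n) == 0]
    # Paths between leaves go up and down the tree, so edge direction is dropped.
    undirected = tree.to_undirected()
    weights = {}
    for a in leaves:
        # Dijkstra from each leaf, summing the assumed "length" edge attribute.
        dist = nx.single_source_dijkstra_path_length(undirected, a, weight="length")
        for b in leaves:
            d = dist[b]
            # The diagonal stays 0; only off-diagonal distances are transformed.
            weights[(a, b)] = inverse_fn(d) if (inverse and a != b) else d
    return weights


tree = nx.DiGraph()
tree.add_edge("root", "internal", length=1.5)
tree.add_edge("root", "c", length=2.0)
tree.add_edge("internal", "a", length=0.5)
tree.add_edge("internal", "b", length=0.5)

plain = leaf_distance_matrix(tree)
inverted = leaf_distance_matrix(tree, inverse=True)
squared_inverse = leaf_distance_matrix(tree, inverse=True, inverse_fn=lambda d: 1.0 / d**2)
```

In this example the distance between a and b is 1.0 and the distance between a and c is 4.0, so the default inverse gives 0.25 for the pair (a, c) and the custom squared inverse gives 0.0625.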

test_net_relatedness_index

Description

Tests the computation of the net relatedness index.

test_inter_cluster_distance_basic

Description

Tests the computation of inter-cluster distances.

test_inter_cluster_distance_custom_input

Description

Tests the computation of inter-cluster distances with custom input.

Dependencies

  • unittest
  • networkx
  • numpy
  • pandas
  • cassiopeia.data
  • cassiopeia.mixins.errors
  • cassiopeia.preprocess.utilities

The test suite can be run by executing this file directly using Python.