bazingagin/npc_gzip

Test occasionally fails using top_k of 2 with just 1 sample


I've noticed that test_predict in test_knn_classifier.py occasionally fails. It does not happen often; I've only seen it a few times. One example is this test run: https://github.com/bazingagin/npc_gzip/actions/runs/5753719296/job/15597516762?pr=43 (a CI run on #43, though the failure is in no way specific to #43, which only changes metadata in pyproject.toml). Re-running the tests passes.

It looks like the problem has to do with how the test uses random numbers. Here's the most relevant code from the test:

test_set_size = random.randint(1, 50)
test_set = [self.sample_input for _ in range(test_set_size)]
top_k = 2
(distance, labels, similar_samples) = self.model.predict(test_set, top_k)
assert distance.shape == (test_set_size, self.model.training_inputs.shape[0])
assert labels.shape == (test_set_size,)
assert similar_samples.shape == (test_set_size, top_k)

Note that test_set_size is chosen randomly and can be as small as 1, but the test uses a top_k of 2. This seems to be the only problem, and I've proposed a fix in #46.
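
As a minimal sketch of one possible fix (illustration only; #46 may take a different approach), the test could keep the randomized size but use top_k as the lower bound of the range, so the precondition asserted in predict always holds. self.sample_input is the fixture the test already uses.

import random

top_k = 2

# Sketch only: not necessarily the change made in #46.
# Drawing the size from [top_k, 50] keeps the size random while guaranteeing
# top_k <= len(test_set), which is what KnnClassifier.predict asserts.
test_set_size = random.randint(top_k, 50)
test_set = [self.sample_input for _ in range(test_set_size)]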

For convenience, here is the relevant output when the test fails:

>       assert (
            top_k <= x.shape[0]
        ), f"""
        top_k ({top_k}) must be less or equal to than the number of
        samples provided to be predicted on ({x.shape[0]})
    
        """
E       AssertionError: 
E               top_k (2) must be less or equal to than the number of
E               samples provided to be predicted on (1)

npc_gzip/knn_classifier.py:309: AssertionError
----------------------------- Captured stderr call -----------------------------

Compressing input...:   0%|          | 0/1 [00:00<?, ?it/s]
Compressing input...: 100%|██████████| 1/1 [00:00<00:00, 121.54it/s]
- generated xml file: /Users/runner/work/npc_gzip/npc_gzip/junit/test-results-macos-3.9.xml -
=========================== short test summary info ============================
FAILED tests/test_knn_classifier.py::TestKnnClassifier::test_predict - AssertionError: 
        top_k (2) must be less or equal to than the number of
        samples provided to be predicted on (1)
========================= 1 failed, 45 passed in 9.95s =========================

I have only shown the end of the output; the full log is available in the failing test run linked above. The code that pytest quotes in the output is from the npc_gzip.knn_classifier.KnnClassifier.predict method.
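
As an aside, the flakiness can be reproduced deterministically rather than by re-running CI. The snippet below is a hypothetical helper (not part of the repository) that searches for a seed under which random.randint(1, 50) returns 1, the draw that produces a single-sample test set:

import random

# Hypothetical helper: find a seed for which the test would draw
# test_set_size == 1, the case that trips the top_k <= x.shape[0] assertion.
for seed in range(1000):
    random.seed(seed)
    if random.randint(1, 50) == 1:
        print(f"random.seed({seed}) yields test_set_size == 1")
        break
else:
    print("no seed in the first 1000 produced a size of 1")

Seeding the global random module with that value right before test_predict runs should force the failing path, assuming nothing else consumes random numbers between the seed call and the randint call in the test.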