Flaky unit test(s): time window
jayaddison opened this issue · 20 comments
Hi @alisaifee!
I've been looking at limits in relation to Python 3.11 upgrades, and noticed a flaky unit test while running the suite on a fairly old computer.
It looks like the problem occurs when the setup time for the first round of simulated requests takes longer than 0.1s. When that happens, the limit-cleared assertions can fail (for example, if we are at time start + 1.05s while the 1s rate limit window is still in effect).
As a workaround, recording the time of the most recent simulated request and then waiting until 1s after that resolves the problem. (I do recognize that this subtly changes the meaning of the test, though.)
last = None
while time.time() - start < 0.5 and count < 10:
    assert await limiter.hit(per_second)
    last = time.time() - start  # keep track of the time of the most recent request
    count += 1
assert not await limiter.hit(per_second)
while time.time() - start <= last + 1:  # wait until one second after the most recent request
    await asyncio.sleep(0.1)
for _ in range(10):
    assert await limiter.hit(per_second)
Possibly not worth the effort to resolve given that it's probably fairly rare (so feel free to close this issue) - I thought it might be worth reporting though.
Thanks @jayaddison. Most of the window-related tests are in fact painfully fragile.
@jayaddison I've done a round of refactoring on master which in fact removes the flaky test case that you encountered altogether. There are still some flaky tests that I'm trying to track down but if you have a moment I'd appreciate if you could run some tests in the environment where you encountered the issue.
Thanks @alisaifee - yep, glad to help test those out - I should be able to share some results within the next day or so.
Oddly I seem to remember that it was one particular test only where the failure appeared - not two similar-duplicates, as might be expected. I had a sense that async-related setup (perhaps an imported infrastructure module) was introducing some of the time delay. I'll try to confirm that during the same update.
Thank you!
When running the tests, you can isolate them by using pytest markers, such as:

pytest -m 'asyncio and redis'
pytest -m 'not asyncio and redis'

etc., where the storage marker can be one of {redis, redis_sentinel, redis_cluster, memcached, mongodb, etcd}.
You can also skip the integration tests (which might not be worth your while) by adding: and not integration
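For instance (assuming the markers above are registered in the project's pytest configuration), the filters can be combined into a single expression:

# run only the async mongodb tests, excluding integration tests
pytest -m 'asyncio and mongodb and not integration'

# run everything except the etcd and integration tests
pytest -m 'not etcd and not integration'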
Ok, the results from testing v3.0.0 and v3.1.0 show that:
- test_fixed_window now passes for mongodb (both aio and non-aio)
- test_fixed_window now fails for etcd with aio (in other words: there is again a difference between aio and non-aio, although this time the failure appears for etcd and not mongodb)
Thanks @jayaddison. I too am seeing intermittent failures with etcd (to be fair the implementation for etcd is just a couple of days old). Weirdly, I see it more often with non-aio.
Could you share the exact error you receive and also just out of curiosity: what version of python are you using?
Also, out of curiosity - what are you actually upgrading that requires you to run the limits tests?
> Could you share the exact error you receive and also just out of curiosity: what version of python are you using?
Yep, I'll respond with that early next week, I expect. That was using Python 3.10 (I'll check the precise revision soon).
> what are you actually upgrading that requires you to run the limits tests?
I've been following along with the Debian project's migration to (potentially) make Python 3.11 the default in one of their upcoming stable releases. It's not a requirement, but I'd like to perform some testing (and provide feedback) alongside that process, to help understand the status of various packages and to use what experience I have to help.
> I see it more often with non-aio
Hmm, OK - odd. I'll try some of your suggestions about isolating to specific test markers to see if that helps figure out the cause on this side (and perhaps that'll similarly explain the difference on your side).
> I've been following along with the Debian project's migration to (potentially) make Python 3.11 the default in one of their upcoming stable releases. It's not a requirement, but I'd like to perform some testing (and provide feedback) alongside that process, to help understand the status of various packages and to use what experience I have to help.
I had a feeling it was related to Debian packaging, as I'm aware that this project is available as a deb package (though I have no idea why).
> Could you share the exact error you receive
Ok: this might not be a lot of information to work from, but the output (from tests running on version 3.0.0) is:
tests/aio/storage/test_mongodb.py:46: in test_fixed_window
assert await limiter.hit(per_second)
E assert False
(some empty space abbreviated)
@jayaddison as of 3.1.0 that test does not exist.
> as of 3.1.0 that test does not exist.
Ok, yep - although it does seem that the test continues to fail when restored alongside 3.1.5.
I'd be OK with closing this issue (since it was about that test failing), but it's probably worthwhile to determine the cause if we can?
@jayaddison yeah I think we can close this now. Nothing has really changed in the code under test and the failure was (afaik) caused by the fragility of the test setup. I do still see some intermittent failures in CI but they are significantly less frequent than before.
If-and-when restoring test_fixed_window, an alternative pattern that seems to pass reliably is:
@pytest.mark.asyncio
async def test_fixed_window(self):
    limiter = FixedWindowRateLimiter(self.storage)
    per_second = RateLimitItemPerSecond(10)
    end = time.time() + 1
    # Check that no more than ten requests are accepted within the time window
    assert all([await limiter.hit(per_second) for _ in range(10)])
    assert not await limiter.hit(per_second)
    assert time.time() <= end
    # Check that the next accepted request occurs after the time window has elapsed
    while not await limiter.hit(per_second):
        await asyncio.sleep(0.1)
    assert time.time() > end
The refactoring / cleanup showed a duplication of effort in the tests: the storage tests were re-testing the strategies when that was already being done in the strategy tests. The fixed window strategy test here is less fragile since it uses the window context manager to manage the bounds.
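For illustration, here's a minimal sketch of what such a time-window context manager could look like; the name window, its signature, and the async flavour are assumptions for this example, not the project's actual helper:

import asyncio
import time
from contextlib import asynccontextmanager

@asynccontextmanager
async def window(duration: float):
    # Hypothetical helper: yield immediately, then sleep out whatever
    # remains of `duration`, so everything inside the block happens
    # within a single, fully elapsed time window.
    start = time.time()
    yield start
    remaining = (start + duration) - time.time()
    if remaining > 0:
        await asyncio.sleep(remaining)

With a helper along these lines, a test can hit the limiter inside async with window(1): and make its reset assertions after the block, without hand-rolled sleep loops.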
Sorry, I linked the sync version of the strategy test; it should have been this one.
Nice! I didn't think to look at the refactored tests. Looks like we developed those with some clear similarities :)
A part of the code that I find a bit confusing / unclear: what are the array index accesses ([1] and [0]) retrieving, and is there a more Pythonic way to express that?
I'm not sure why I chose to return a tuple instead of an object from get_window_stats (this is probably 10-year-old code) but the return is (reset time, remaining).
Now that you've brought it up, I think I can clean it up to use a named tuple while maintaining backward compatibility!
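A minimal sketch of how that could look; the class and field names here are hypothetical, but the key property is real: a typing.NamedTuple is still a tuple, so existing index-based callers keep working.

from typing import NamedTuple

class WindowStats(NamedTuple):
    # Hypothetical names for the (reset time, remaining) pair
    reset_time: float
    remaining: int

stats = WindowStats(reset_time=1700000000.0, remaining=3)
assert stats[0] == stats.reset_time  # old positional access still works
assert stats[1] == stats.remaining   # ...so backward compatibility holds
assert isinstance(stats, tuple)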
Nice :)
(I think I've taken up enough of your time asking questions about unit tests now, so I'll continue code adventures elsewhere. Thanks for responding to my questions and for the refactorings!)
Not at all... the code is (hopefully) better and easier to understand due to the questions!