Flaky unit test(s): time window
jayaddison opened this issue · 20 comments
Hi @alisaifee!
I've been looking at limits in relation to Python 3.11 upgrades, and noticed a flaky unit test while running the suite on a fairly old computer.
It looks like the problem occurs when the setup time for the first round of simulated requests takes longer than 0.1s. When that happens, the limit-cleared assertions can fail (for example, if we are at time start + 1.05s while the 1s rate limit window is still in effect).
As a workaround, recording the time of the most recent simulated request and then waiting until 1s after that resolves the problem. (I do recognize that this subtly changes the meaning of the test, though.)
last = None
while time.time() - start < 0.5 and count < 10:
    assert await limiter.hit(per_second)
    last = time.time() - start  # keep track of the time of the most recent request
    count += 1
assert not await limiter.hit(per_second)
while time.time() - start <= last + 1:  # wait until one second after the most recent request
    await asyncio.sleep(0.1)
for _ in range(10):
    assert await limiter.hit(per_second)
Possibly not worth the effort to resolve given that it's probably fairly rare (so feel free to close this issue) - I thought it might be worth reporting though.
Thanks @jayaddison. Most of the window-related tests are in fact painfully fragile.
@jayaddison I've done a round of refactoring on master which in fact removes the flaky test case that you encountered altogether. There are still some flaky tests that I'm trying to track down but if you have a moment I'd appreciate if you could run some tests in the environment where you encountered the issue.
Thanks @alisaifee - yep, glad to help test those out - I should be able to share some results within the next day or so.
Oddly I seem to remember that it was one particular test only where the failure appeared - not two similar-duplicates, as might be expected. I had a sense that async-related setup (perhaps an imported infrastructure module) was introducing some of the time delay. I'll try to confirm that during the same update.
Thank you!
When running the tests, you can isolate them by using pytest markers, such as:

pytest -m 'asyncio and redis'
pytest -m 'not asyncio and redis'

etc., where the storage marker can be one of {redis, redis_sentinel, redis_cluster, memcached, mongodb, etcd}.
You can also skip the integration tests (which might not be worth your while) by adding: and not integration
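For instance (assuming the markers above are registered in the project's pytest configuration), the filters can be combined into a single expression:

# run only the async mongodb tests, excluding integration tests
pytest -m 'asyncio and mongodb and not integration'

# run everything except the etcd and integration tests
pytest -m 'not etcd and not integration'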
Ok, the results from testing v3.0.0 and v3.1.0 show that:
- test_fixed_window now passes for mongodb (both aio and non-aio)
- test_fixed_window now fails for etcd with aio (in other words: there is again a difference between aio and non-aio, although this time the failure appears for etcd and not mongodb)
Thanks @jayaddison. I too am seeing intermittent failures with etcd (to be fair the implementation for etcd is just a couple of days old). Weirdly, I see it more often with non-aio.
Could you share the exact error you receive and also just out of curiosity: what version of python are you using?
Also, out of curiosity - what are you actually upgrading that requires you to run the limits tests?
> Could you share the exact error you receive and also just out of curiosity: what version of python are you using?
Yep, I'll respond with that early next week, I expect. That was using Python 3.10 (I'll check the precise revision soon).
> what are you actually upgrading that requires you to run the limits tests?
I've been following along with the Debian project's migration to (potentially) make Python 3.11 the default in one of their upcoming stable releases. It's not a requirement, but I'd like to perform some testing (and provide feedback) alongside that process, to help understand the status of various packages and to use what experience I have to help.
> I see it more often with non-aio
Hmm, OK - odd. I'll try some of your suggestions about isolating to specific test markers to see if that helps figure out the cause on this side (and perhaps that'll similarly explain the difference on your side).
> I've been following along with the Debian project's migration to (potentially) make Python 3.11 the default in one of their upcoming stable releases. It's not a requirement, but I'd like to perform some testing (and provide feedback) alongside that process, to help understand the status of various packages and to use what experience I have to help.
I had a feeling it was related to Debian packaging, as I'm aware that this project is available as a deb package (though I have no idea why).
> Could you share the exact error you receive
Ok: this might not be a lot of information to work from, but the output (from tests running on version 3.0.0) is:
tests/aio/storage/test_mongodb.py:46: in test_fixed_window
assert await limiter.hit(per_second)
E assert False
(some empty space abbreviated)
@jayaddison as of 3.1.0 that test does not exist.
> as of 3.1.0 that test does not exist.
Ok, yep - although it does seem that the test continues to fail when restored alongside 3.1.5.
I'd be OK with closing this issue (since it was about that test failing), but it's probably worthwhile to determine the cause if we can?
@jayaddison yeah I think we can close this now. Nothing has really changed in the code under test and the failure was (afaik) caused by the fragility of the test setup. I do still see some intermittent failures in CI but they are significantly less frequent than before.
If-and-when restoring test_fixed_window, an alternative pattern that seems to pass reliably is:
@pytest.mark.asyncio
async def test_fixed_window(self):
    limiter = FixedWindowRateLimiter(self.storage)
    per_second = RateLimitItemPerSecond(10)
    end = time.time() + 1
    # Check that no more than ten requests are accepted within the time window
    assert all([await limiter.hit(per_second) for _ in range(10)])
    assert not await limiter.hit(per_second)
    assert time.time() <= end
    # Check that the next accepted request occurs after the time window has elapsed
    while not await limiter.hit(per_second):
        await asyncio.sleep(0.1)
    assert time.time() > end
The refactoring / cleanup showed a duplication of effort in the tests: the storage tests were re-testing the strategies when that was already being done in the strategy tests. The fixed window strategy test here is less fragile since it uses the window context manager to manage the bounds.
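For illustration, here's a minimal sketch of what such a time-window context manager could look like; the name window, its signature, and the async flavour are assumptions for this example, not the project's actual helper:

import asyncio
import time
from contextlib import asynccontextmanager

@asynccontextmanager
async def window(duration: float):
    # Hypothetical helper: yield immediately, then sleep out whatever
    # remains of `duration`, so everything inside the block happens
    # within a single, fully elapsed time window.
    start = time.time()
    yield start
    remaining = (start + duration) - time.time()
    if remaining > 0:
        await asyncio.sleep(remaining)

With a helper along these lines, a test can hit the limiter inside async with window(1): and make its reset assertions after the block, without hand-rolled sleep loops.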
Sorry, I linked the sync version of the strategy test; it should have been this one.
Nice! I didn't think to look at the refactored tests. Looks like we developed those with some clear similarities :)
A part of the code that I find a bit confusing / unclear: what are the array index accesses ([1] and [0]) retrieving, and is there a more Pythonic way to express that?
I'm not sure why I chose to return a tuple instead of an object from get_window_stats (this is probably 10-year-old code) but the return is (reset time, remaining).
Now that you've brought it up, I think I can clean it up to use a named tuple while maintaining backward compatibility!
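A minimal sketch of how that could look; the class and field names here are hypothetical, but the key property is real: a typing.NamedTuple is still a tuple, so existing index-based callers keep working.

from typing import NamedTuple

class WindowStats(NamedTuple):
    # Hypothetical names for the (reset time, remaining) pair
    reset_time: float
    remaining: int

stats = WindowStats(reset_time=1700000000.0, remaining=3)
assert stats[0] == stats.reset_time  # old positional access still works
assert stats[1] == stats.remaining   # ...so backward compatibility holds
assert isinstance(stats, tuple)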
Nice :)
(I think I've taken up enough of your time asking questions about unit tests now, so I'll continue code adventures elsewhere. Thanks for responding to my questions and for the refactorings!)
Not at all... the code is (hopefully) better and easier to understand due to the questions!