Possible leakage using rstest or tokio::test

Question

Possible leakage using rstest or tokio::test

Opened this issue 8 months ago · 13 comments

Greetings,
Love this crate, thanks for this work. I've been using it and noticed odd errors with my tests failing then passing on re-runs of the tests. I'm wondering if there's some leakage of the mutex clock between tests due to these wrappers? Have you tested with either of these testing architectures?
Appreciate your thoughts!

Answer 1 · 2024-05-08T21:01:01.000Z

I have not tested it with rstest, but I can see how it could be problematic.

With tokio::test, one should be using tokio's own Instant wrapper to pause/resume time. Tokio has its own Duration/Instant/SystemTime types that integrates with its timer subsystem.

https://docs.rs/tokio/1.37.0/tokio/time/fn.advance.html
https://docs.rs/tokio/1.37.0/tokio/time/fn.pause.html
https://docs.rs/tokio/1.37.0/tokio/time/fn.resume.html

and notably:
https://docs.rs/tokio/1.37.0/tokio/attr.test.html#configure-the-runtime-to-start-with-time-paused

Answer 2 · 2024-05-08T21:02:47.000Z

As for rstest I don't know if you can sequence tests, if so, you can just reset time between setup and teardowns

Answer 3 · 2024-05-08T21:26:20.000Z

I've elated to just putting the reset in every test that uses the mock. I was really hoping i could just make it a fixture, but it def works better when i put it in the test body.
I missed tokio::test's time features, I'll have another look at that!

Answer 4 · 2024-05-08T21:30:13.000Z

I don't actually see SystemTime anywhere in Tokio, looks like they only wrap Instant and Duration. We need to use SystemTime due to some requirements.

Answer 5 · 2024-05-08T22:30:54.000Z

I see. you are correct. Originally this crate didn't have a mockable SystemTime but a user requested that it was added recently.

Without knowing more context, the only thing I can suggest is to either make tests that use SystemTime more fine-grained (where you can delineate time advances and resets more precisely) or to ensure you reset the clock between tests (tasks/futures). Because this uses a global variable, multiple threads or tasks will logically conflict. The rust test runner (again, I don't know what rstest does, I'm familiar with nextest which has its own test runner. But I assume rstest just generates a bunch of boilerplate for the Rust's libtest runner. In the near future libtest2 (being highly influenced by the design of nextest) will allow more user-pluggable logic and perhaps there'll be places where you can inject setup/teardown cycles.

I don't have a good solution for this. the rust std (which I try to emulate) implementations use globals from an external source. My source is just a Duration (basically a u64) inside of a Mutex that can be added/subtracted/set to a specific value.

One idea would be to move away from global state, e.g. threading in a impl TimeSource to things that use time, which'll let you have a per-task/thread/unit clock. But this would require creating a new library that allows for the classic style of dependency injection with a checked source (atomic fences w/ conditionals (when to reset, what the 'start epoch would be' etc, around SystemTime/Instant clocks).

A while back, cargo test (the frontend for libtest) removed single-threaded tests. I originally provided the optional for a thread-local global (each MockClock was bound to their own thread, as the thread was created) but this became a problem when the test runner always ran tests in a N+1 thread configuration. Originally, it had the option to run all of the tests in the main thread, so when a user spawned a thread they can set up their time source for that new thread.

I don't know what I could change on my end to help with your problem, but I can offer suggestions/help. I understand how tokio does time pretty well and I'm rather familiar with libtest (and libtest2).

One could try cargo test my_serialize_tests* -- --test-threads=1 (where my_serialized_tests* is some filter for those few tests that should be ran sequentially). But I know this is a pain and definitely not ideal.

Answer 6 · 2024-05-08T22:33:31.000Z

If you can give a few simple examples of where SystemTime gets into invalid state, I can perhaps point out where you should reset the clock.

One idea would be to have a simple barrier (or using a semaphore) w/ a drop guard to either block the test thread until the unit-of-work is done, or ensure things have a re-initialized clock (back to some epoch) before they run and that they reset when they are finished.

And another would be a bigger change that'll require passing around a time source/context to things that use time. Like how one would unit test a RNG where you can seed it with a stack-local value

(This is that impl TimeSource idea mentioned above. but its definitely not drop-in like this crate is intended to be used.)

Answer 7 · 2024-05-10T17:33:00.000Z

After further investigation, it seems that I just needed to adjust "where" I reset the clock. I had originally created an rstest #[fixture] fn _mock_clock() -> () to provide an easy way to reset the reset in tests (and avoid clippy's wrath). Even though this worked in some tests, it didn't seem to always work. I conceded to just have a set_mocked_clock() fn that i called at the top of my tests (or at the top of any #[fixture] that needed a timestamp to init).
Also had noticed some of the code base was using the mock and others weren't, so I updated that and ensured everyone was using mock_instant during tests and updated tests to increment time appropriately. So ultimately user error.
Thank you for you're detailed response they are helpful if I do run into further errors and need to ensure I run time tests sequentially. I think (so far) what you have built does work as long as you've ensured all code uses the mock during tests.

Answer 8 · 2024-05-29T22:29:22.000Z

Just FYI. I'm still having a great deal of issues with concurrent tests and I've decided I have to lock any test that uses MockClock since its not thread safe, at least not with #[rstest] and #[tokio::test]. I really wanted it to work with #[rstest] out of the box because #[fixtures] that required time could just reset the clock when used. But I'm going to have to build extra infrastructure around it to lock it for each test.
As you mentioned earlier, this crate would benefit from each test spooling up its own "fake" clock that is used by the code, but this is difficult since the only way to use it globally is to make it static... I think the best addition would be to create a set_time_and_lock() -> Mutex such that when you set the time you also capture the lock, in this way when the tests starts and you reset the clock it prevents other tests from resetting the clock mid test, thus once the test is done the next text in line wanting the lock grabs it.
I'll have to keep digging which is the culprit (tokio or rstest) and let you know.

Answer 9 · 2024-05-31T23:28:35.000Z

I could perhaps support an api like that.

But acquiring a global mutex in multiple tests could lead to interesting side effects. (ordering)

I would suggest using a DI-style to things that use a clock so you can pass in frame-local clocks to the things that use it. I know its not a drop-in but it'd give you the flexibility of when things are constructed and used.

a theory of how the design could look:

trait Clock {
  fn now(&self) -> Instant;
}

struct RealClock {} // TODO

impl Clock for RealClock {
  fn now(&self) -> Instant { Instant::now() }
}

struct MockClock {} // TODO

impl Clock for MockClock {
   fn now(&self) -> Instant {
     // swap out Instant with a mocked one, like this crate does
   }
}

struct MyThing;

impl MyThing {
  fn something(&self, clock: &impl Clock) {
    let now = clock.now();
  }
}

#[test]
fn foo() {
  let clock = MockClock::new();
  let thing = MyThing;
  thing.something(&clock);
}

fn real() {
  let clock = RealClock::new();
  let thing = MyThing;
  thing.something(&clock);
}

A new crate could probably be designed to support this. Instead of global state, it'd use monomorphization to provide stack-local timing.

On a side note, this is one of the reasons rust makes global state (and singletons) difficult because it always leads to hard to reason about states in multi-threaded code.

Answer 10 · 2024-05-31T23:39:41.000Z

I'm basing this pattern on crates like rand which abstract rand sources via a trait:
https://docs.rs/rand/latest/rand/trait.RngCore.html

and using this for a deterministic rng:
https://docs.rs/rand/latest/rand/rngs/mock/struct.StepRng.html

(rand uses a trait for trait implementation. things impl RngCore and there is a blanket impl for the main Rng trait:
https://docs.rs/rand/latest/rand/trait.Rng.html )

But notably, the time implementation generally would be rather simple. In tests it'd be a monotonic counter, like this crate does. So the core trait (or blanket trait) wouldn't really be needed.

Answer 11 · 2024-06-02T13:40:41.000Z

As reported by #15 people have a use for thread-local sources.

You may want to try it to see if it solves your problem:
https://docs.rs/mock_instant/latest/mock_instant/

(it moves the types into a global and thread_local module)

@dr-kernel to describe when one would use aglobal source over a thread_local source: using a global source would persist the clock state across all threads. Cargo runs each test in a new thread and catches the panics to report failures. So using a thread local source would create a new clock per test. But there's a problem with this, cargo actually reuses threads via a thread pool. So even with a thread local design one would get a new source up until it hits the test threads limit and it'll reuse the state from some other run. This is hard to work around, but ensuring the clock gets reset at the start (or end if a drop guard (cleanup handler) can be run) would side step this.

I've basically duplicated all of the types, where one set will use a mutex and the other a thread local cell.

Even though using a thread local source will work, Tokio also has a thread pool when using the multi-thread runtime. So ideally you'd configure the test to use the single threaded runtime so the source isn't created for the moved context when a task moves threads. By default Tokio uses a single thread executor https://docs.rs/tokio/latest/tokio/attr.test.html#current-thread-runtime but one can change the flavor.

To help with your problem, I would suggest using the new thread local System time, resetting the clock before or after each test and ensuring the Tokio executor is single threaded (current_thread)

I'm sorry this has been so much trouble for you, but this swapping out types and using a singleton to mimic the OS source will always be hacky :(

If the above doesn't resolve the issue and If you could provide a simple demo test application that has the sometimes faulty logic I can maybe figure out a more tailored to you solution. (Using retest, Tokio, SystemTime)

Answer 12 · 2024-06-03T14:52:42.000Z

And to expand on some patterns, I could probably provide a guard type that resets the clock after the test runs.

Something like

#[test]
fn foo() { 
    let _guard = MockClock::reset_guard();
    // Stuff
}

I haven't thought about the implications of resetting both the Instant source and SystemTime source. Maybe 3 functions: reset_instant, reset_system_time and reset (both)

(Of course names are provisional, better ones would be chosen once the behavior is decided upon)

The guard would also allow scoped states, naturally

Answer 13 · 2024-06-03T15:37:32.000Z

But there's a problem with this, cargo actually reuses threads via a thread pool. So even with a thread local design one would get a new source up until it hits the test threads limit and it'll reuse the state from some other run. This is hard to work around, but ensuring the clock gets reset at the start (or end if a drop guard (cleanup handler) can be run) would side step this.

I just checked the libtest source in rustc and this is incorrect. They aren't using a thread pool any more. So, under the current behavior a thread local won't be reused. But one should not rely on internal details. The prior advice should still be considered