Intermittent test failure
nateberkopec opened this issue ยท 7 comments
This is a race condition caused by timestamps in the json body of the stubbed request
The specific code is
compressed = compress(json)
stub_request(:post, url)
.with(
body: compressed,
headers: {
'Mini-Profiler-Transport-Auth' => 'somepasswordhere',
'Content-Encoding' => 'gzip'
}
)
.to_return(status: 200, body: "", headers: {})
The json being compressed is
"{\"snapshots\":[{\"id\":\"sn941cm9oe408b11kbiw\",\"name\":null,\"started\":1473611003,\"started_at\":1681659363573,\"machine_name\":null,\"level\":0,\"user\":\"unknown user\",\"has_user_viewed\":false,\"client_timings\":null,\"duration_milliseconds\":0,\"has_trivial_timings\":true,\"has_all_trivial_timings\":false,\"trivial_duration_threshold_milliseconds\":2,\"head\":null,\"sql_count\":0,\"duration_milliseconds_in_sql\":0,\"has_sql_timings\":true,\"has_duplicate_sql_timings\":false,\"executed_readers\":0,\"executed_scalars\":0,\"executed_non_queries\":0,\"custom_timing_names\":[],\"custom_timing_stats\":{},\"custom_fields\":{},\"has_flamegraph\":false,\"request_method\":null,\"request_path\":null,\"root\":{\"id\":\"np0okbnejcsug41u99ta\",\"name\":\" http://:\",\"duration_milliseconds\":0,\"duration_without_children_milliseconds\":0,\"start_milliseconds\":0,\"parent_timing_id\":null,\"children\":[],\"has_children\":false,\"key_values\":null,\"has_sql_timings\":false,\"has_duplicate_sql_timings\":false,\"trivial_duration_threshold_milliseconds\":2,\"sql_timings\":[],\"sql_timings_duration_milliseconds\":0,\"is_trivial\":false,\"is_root\":true,\"depth\":0,\"executed_readers\":0,\"executed_scalars\":0,\"executed_non_queries\":0,\"custom_timing_stats\":{},\"custom_timings\":{}},\"started_formatted\":\"/Date(1681659363573)/\"}]}"
I can reliably reproduce the error by inserting sleep 1
between compressed = compress(json)
and stub_request(:post, url)...
I think there's a straightforward fix using ActiveSupport::Testing::TimeHelpers
I'll look into this and open a PR later today. Wanted to post my finding here in the meantime.
Quick update: It's still a race condition and adding the sleep
reproduces it consistently. But... the timestamps in the json body of the stubbed request aren't causing the problem. The json body is generated once and is passed to both the stub and to gzip_transporter
. So those timestamps will always be the same.
I'm currently exploring Zlib::GzipWriter.new(io)
. That appears to have some sort of time dependency. If I sleep before calling #write(json)
, It generates a different binary than compressed = compress(json)
in the test setup.
BAD
body = if @gzip_requests
require 'zlib'
io = StringIO.new
gzip_writer = Zlib::GzipWriter.new(io)
sleep 1 # Sleeping here results in a different zip binary than in the test setup
gzip_writer.write(json)
gzip_writer.close
headers['Content-Encoding'] = 'gzip'
io.string
OK
body = if @gzip_requests
require 'zlib'
io = StringIO.new
gzip_writer = Zlib::GzipWriter.new(io)
gzip_writer.write(json)
sleep 1 # OK to sleep here. Zip binaries are the same
gzip_writer.close
headers['Content-Encoding'] = 'gzip'
io.string
Update 2: Yep. The "local file header" in the Zip file format has a time and date in it. https://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers
Those are changing between the test setup and the call to gzip_writer.write(json)
. For some reason TimeHelpers#freeze_time
doesn't affect the time used by Zlib::GzipWriter
I'm looking for a different fix.
Update 3: Looking at Zlib source, the time is set here https://github.com/ruby/zlib/blob/a68a1f7a9acaea2ce33bc933e625f71ca0d8fcd6/ext/zlib/zlib.c#L2660-L2662 It's a direct call to the C standard library time_t time( time_t *second )
function. I don't see an easy way to fake that.
I think the best strategy will be to mock Zlib::GzipWriter
. We don't care about the contents of the zipped body, as long as it's being zipped. I'm going to try a PR that does this and see how it looks.
I built a fake Zlib::GzipWriter
and wired it in.
class FakeGzipWriter
def initialize(io)
@io = io
end
def write(body)
@io.write("x" * body.length)
end
def close
end
end
It fixed the spec. However I had second thoughts because SnapshotsTransporter#flush_buffer
would be using FakeGzipWriter which is not a verifying mock and is not guaranteed to behave the same as Zlib::GzipWriter
and could mask a future bug.
I then discovered that Timecop https://github.com/travisjeffery/timecop mocks time at a lower system level and works on the C standard library function that Zlib::GzipWriter
uses. Wrapping the spec in Timecop.freeze do
also fixes the intermittent failures ๐ฅณ
This feels like a better solution. I'm going to sleep on it and open a PR in the AM.
Hey @willkoehler ๐ Thanks for looking into this!
I feel like this problem is an indication of a code smell in the test itself:
rack-mini-profiler/spec/lib/snapshots_transporter_spec.rb
Lines 138 to 157 in 0809d9c
It's kind of testing itself, especially the last assertion (plz don't git blame
the test ๐๐
)...
I think there's a better way to write this test which would also make this problem go away.
So right now we're passing the binary of the gzipped JSON that we expect in the request body to the stub_request(...).with(body: ...)
call. This is what's causing our test to be flakey because the binary that we compute during the test can sometimes be slightly different from the binary that the transporter sends in the HTTP request (as you already figured out).
We can rewrite the stub_request().with
call and pass to it a block where we can do custom matching. So instead of passing the body/binary that we expect, we should be able to do something like this:
stub_request(:post, url)
.with { |req| req.headers['Content-Encoding'] == 'gzip' && decompress(req.body) == json }
This should eliminate the test flakiness because we no longer compute the gzipped binary during the test. I also think this will make our test cleaner and more efficient. Let me know what you think!
Thanks for the clear analysis and suggestion @OsamaSayegh I like the way you're thinking ๐
I took your idea one step further by making the stub_request
permissive and running the assertion after the fact with assert_requested
. This results in a test that is more clearly grouped into Arrange / Act / Assert.
I opened a PR #577