leak_check.flush.01 and leak_check.clear.01 tests fail
euspectre opened this issue · 4 comments
After I have merged "several_targets" branch, the 2 test mentioned above report failures.
The test script is generated from sources/tests/leak_check/test_several_targets.sh.in.
From the test log:
57/136 Testing: leak_check.flush.01 57/136 Test: leak_check.flush.01 Command: "/bin/bash" "test_flush.sh" Directory: /var/opt/kedr/tests/4.1.16-nrj-desktop-1rosa-x86_64/leak_check "leak_check.flush.01" start time: Mar 08 21:58 MSK Output: ---------------------------------------------------------- Starting KEDR... /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr.ko target_name=kedr_sample_target mkdir -p "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" mount debugfs -t debugfs "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_leak_check.ko /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_lc_common_mm.ko KEDR started. Loaded the target. Total number of allocations is 0 but it should be 1 The data in report_flush/01_target_loaded.log differs from what has been expected (expected: allocs=1, leaks=1, bad_frees=0) Stopping KEDR... /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_lc_common_mm.ko /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_leak_check.ko umount "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr.ko KEDR stopped. --------------------------- 58/136 Testing: leak_check.clear.01 58/136 Test: leak_check.clear.01 Command: "/bin/bash" "test_clear.sh" Directory: /var/opt/kedr/tests/4.1.16-nrj-desktop-1rosa-x86_64/leak_check "leak_check.clear.01" start time: Mar 08 21:58 MSK Output: ---------------------------------------------------------- Starting KEDR... /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr.ko target_name=kedr_sample_target mkdir -p "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" mount debugfs -t debugfs "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_leak_check.ko /sbin/insmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_lc_common_mm.ko KEDR started. Loaded the target. Writing to /dev/cfake0. Total number of allocations is 0 but it should be 1 The data in report_clear/02_write1.log differs from what has been expected (expected: allocs=1, leaks=1, bad_frees=0) Stopping KEDR... /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_lc_common_mm.ko /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr_leak_check.ko umount "/var/tmp/kedr/test/4.1.16-nrj-desktop-1rosa-x86_64/leak_check/debugfs" /sbin/rmmod /lib/modules/4.1.16-nrj-desktop-1rosa-x86_64/extra/kedr.ko KEDR stopped.
The files report_flush/01_target_loaded.log and report_clear/02_write1.log are empty.
OS: ROSA R7 x86_64, kernel 4.1.16.
There is a race between echo 1 > flush
and cat info
(simiar situation to clear test case).
Looks like the work queued in klc_do_flush is not finished yet when we trying to read 'info' file.
We could get rid of separate worker and move the code from work_func_flush to klc_do_flush, but I'm not sure if this is a correct solution (and I suppose this worker was introduced intentionally).
Do you have any ideas how to fix it?
Nice catch! Yes, that could be the reason. Need to take a closer look though.
Yes, the worker thread itself is there for a reason: it keeps the events ordered. As for flush, it might be done directly now, if I am not mistaken. We already do flush_workqueue(lc_object->wq);
before it to make sure all the previous events have been processed by then.
Still, it is needed to review the code once more: I cannot say right now if the direct flushing might break something else.
I see it now. The problem is in the accesses to the tables of allocations and deallocations.
klc_flush_*() functions do not use locks for such accesses, that is why they are expected to be called from a work item or on unload. IOW, - when nothing else can access these tables simultaneously. We need a single-threaded ordered wq in LeakCheck anyway, so it was used here as well.
On the other hand, flush_workqueue(lc->wq);
is not that necessary before the calls to klc_do_flush(lc);
. Placing it after such calls instead might be a better idea: after writing to "flush" file returns, it will be guaranteed that flushing is complete.
Will do a bit more testing and commit the fix.
Again, thanks for pointing this out!
Fixed in KEDR 0.6.