data loss: need to flush logFile
Closed this issue · 1 comments
Hi,
I have a unit test that inserts 40,000+ zip codes. I hacked psys.exe() to raise a ValueError on the 30,000th transaction. Log file with the existing code:
$ ls -l dat/0000000001.log
-rw-rw-rw- 1 markbucciarelli staff 5337088 Sep 28 08:24 dat/0000000001.log
Then I ran it again with flush():
$ ls -l dat/0000000001.log
-rw-rw-rw- 1 markbucciarelli staff 5340077 Sep 28 08:29 dat/0000000001.log
With the ASCII pickle protocol, you can also diff the two log file versions and see that, without the flush(), the log is incomplete.
diff -r c5bc24a68c12 topics/tornado/pv/core.py
--- a/topics/tornado/pv/core.py Wed Sep 28 08:16:01 2011 -0400
+++ b/topics/tornado/pv/core.py Wed Sep 28 08:34:34 2011 -0400
@@ -86,6 +86,7 @@
def put(self, value):
self.serialId += 1
pickle.dump(value, self.logFile, PICKLE_PROTOCOL)
+ self.logFile.flush()
def putSnapshot(self, root):
# TODO refine error handling
I observe a 5% decrease in performance with flush() for this simple test: from 7,000 inserts/sec to 6,700 inserts/sec :).
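A rough way to reproduce this kind of measurement is a micro-benchmark that times repeated pickle.dump() calls with and without a flush() after each one. This is only a sketch (the record shape, count, and PICKLE_PROTOCOL value are made up for illustration), not the project's actual test:

```python
import os
import pickle
import tempfile
import time

PICKLE_PROTOCOL = 0  # ASCII protocol, as mentioned above (assumed value)


def bench(n, flush):
    """Time n pickle.dump() calls to a temp file; return inserts/sec."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        start = time.perf_counter()
        for i in range(n):
            # A made-up record standing in for one zip-code transaction.
            pickle.dump({"serialId": i, "zip": "01234"}, f, PICKLE_PROTOCOL)
            if flush:
                f.flush()
        elapsed = time.perf_counter() - start
    os.unlink(f.name)
    return n / elapsed


print("no flush: %.0f inserts/sec" % bench(10000, False))
print("flush:    %.0f inserts/sec" % bench(10000, True))
```

Absolute numbers will differ by machine; the point is only to compare the two rates.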
This is actually a trade-off between performance and durability.
I'll apply this, but be aware that it does not guarantee that the flushed data is physically on disk. AFAIK flush() only empties the application's write buffer; the OS may still be holding the data in its own cache.
http://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file
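For true on-disk durability, flush() would have to be paired with os.fsync(), which asks the kernel to push its cached pages to the device. A minimal sketch of that variant, using a hypothetical standalone helper rather than the actual put() method from core.py (the name durable_put and the protocol value are assumptions):

```python
import os
import pickle

PICKLE_PROTOCOL = 0  # assumed; the project may use a different protocol


def durable_put(log_file, value):
    """Append one pickled record and force it to physical storage."""
    pickle.dump(value, log_file, PICKLE_PROTOCOL)
    log_file.flush()                 # empty Python's userspace buffer
    os.fsync(log_file.fileno())      # ask the OS to write its cache to disk
```

The extra fsync() per record would cost considerably more than flush() alone, so it may make sense only for workloads where losing even one committed transaction is unacceptable.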