DB, Archive, Cache
hupili opened this issue · 0 comments
hupili commented
The current DB backing SRFE is just a single SQLite file. This was good for the initial deployment, since SQLite is supported on many devices.
After a month of trial use, it has gradually become slower. The following stats show how the file size grows on my side:
```
-rw-r-----+ 1 plhu plhu 120M Nov 17 11:54 srfe_queue.db.20121117-115421
-rw-r-----+ 1 plhu plhu 180M Nov 22 16:01 srfe_queue.db.20121122-160149
-rw-r-----+ 1 plhu plhu 203M Nov 24 22:18 srfe_queue.db.20121124-221805
-rw-r-----+ 1 plhu plhu 221M Nov 26 19:33 srfe_queue.db.20121126-193341
-rw-r-----+ 1 plhu plhu 242M Nov 28 14:11 srfe_queue.db.20121128-141126
-rw-r-----+ 1 plhu plhu 263M Nov 30 13:31 srfe_queue.db.20121130-133146
-rw-r-----+ 1 plhu plhu 266M Nov 30 17:38 srfe_queue.db.20121130-173835
-rw-r-----+ 1 plhu plhu 272M Dec  1 12:30 srfe_queue.db.20121201-123026
-rw-r-----+ 1 plhu plhu 273M Dec  1 15:37 srfe_queue.db.20121201-153738
-rw-r-----+ 1 plhu plhu 317M Dec  5 19:36 srfe_queue.db.20121205-193651
-rw-r-----+ 1 plhu plhu 322M Dec  7 11:28 srfe_queue.db.20121207-112820
-rw-r-----+ 1 plhu plhu 368M Dec 11 08:42 srfe_queue.db.20121211-084244
-rw-r-----+ 1 plhu plhu 384M Dec 12 13:58 srfe_queue.db.20121212-135810
```
We may want to improve the DB solution:
- Use a NoSQL DB to accelerate transactions.
- Add a cache in front of the DB. When we want to do online training for RPR-SGD, a cache is essential.
- Archive old messages periodically, e.g. those that are untagged and older than 1 week. We could even design a layered archive mechanism, e.g. daily, weekly, monthly.
- Compress Python objects before storing them in the DB. Pickle output is highly compressible:
```
$ ll message.pickle*
-rw-rw----+ 1 plhu plhu 165M Dec  1 13:03 message.pickle
-rw-rw----+ 1 plhu plhu  40M Dec 12 14:02 message.pickle.tar.gz
```
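For the cache idea, a minimal sketch using `functools.lru_cache` in front of SQLite reads (the `messages` table and `get_message` function are hypothetical, not the actual SRFE schema):

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'hello')")

@lru_cache(maxsize=4096)
def get_message(msg_id):
    # Hits the DB only on a cache miss; repeated reads of the same
    # message (e.g. during online training passes) are served from memory.
    row = conn.execute(
        "SELECT body FROM messages WHERE id = ?", (msg_id,)
    ).fetchone()
    return row[0] if row else None

get_message(1)  # miss: goes to the DB
get_message(1)  # hit: served from the cache
```

A real deployment would need invalidation when messages are updated; `lru_cache` is only suitable for read-mostly access.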
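The weekly archive step could look like the sketch below: move untagged rows older than one week into an archive table, then `VACUUM` to reclaim file space. Table and column names (`messages`, `tagged`, `created_at`) are assumptions, not the actual SRFE schema:

```python
import sqlite3
import time

ONE_WEEK = 7 * 24 * 3600

def archive_old_messages(conn):
    """Move untagged messages older than one week into messages_archive."""
    cutoff = time.time() - ONE_WEEK
    # Create the archive table with the same columns, if it doesn't exist yet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages_archive AS "
        "SELECT * FROM messages WHERE 0"
    )
    conn.execute(
        "INSERT INTO messages_archive "
        "SELECT * FROM messages WHERE tagged = 0 AND created_at < ?",
        (cutoff,),
    )
    conn.execute(
        "DELETE FROM messages WHERE tagged = 0 AND created_at < ?",
        (cutoff,),
    )
    conn.commit()
    conn.execute("VACUUM")  # reclaim file space freed by the delete

# Demo on an in-memory DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, tagged INTEGER, created_at REAL)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, 0, 0.0),          # old and untagged -> archived
        (2, 1, 0.0),          # old but tagged   -> kept
        (3, 0, time.time()),  # fresh            -> kept
    ],
)
archive_old_messages(conn)
```

Layered daily/weekly/monthly archives would just run the same move with different cutoffs and target tables.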
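The compression point can be done transparently at write time with stdlib `zlib`: compress the pickle bytes before inserting, decompress on read. The `messages` table and sample payload below are hypothetical:

```python
import pickle
import sqlite3
import zlib

# Hypothetical message payload; the real SRFE message objects are assumed
# to pickle the same way.
message = {"id": 1, "text": "hello " * 100, "tags": []}

raw = pickle.dumps(message)
packed = zlib.compress(raw, 6)  # level 6: good ratio/speed trade-off

# Store the compressed blob in a (hypothetical) messages table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, blob BLOB)")
conn.execute("INSERT INTO messages VALUES (?, ?)", (1, packed))

# Round-trip: decompress and unpickle on read.
(blob,) = conn.execute("SELECT blob FROM messages WHERE id = 1").fetchone()
restored = pickle.loads(zlib.decompress(blob))
```

The tar.gz numbers above (165M down to 40M) suggest roughly a 4x saving is realistic for this data.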