hupili/sns-router

DB, Archive, Cache

hupili opened this issue · 0 comments

The current DB backing SRFE is just a SQLite file. This is fine for an initial deployment, and SQLite is supported on many devices.

After a month-long trial, it has become gradually slower. The following listing shows how the file size has grown on my side:

-rw-r-----+ 1 plhu plhu 120M Nov 17 11:54 srfe_queue.db.20121117-115421
-rw-r-----+ 1 plhu plhu 180M Nov 22 16:01 srfe_queue.db.20121122-160149
-rw-r-----+ 1 plhu plhu 203M Nov 24 22:18 srfe_queue.db.20121124-221805
-rw-r-----+ 1 plhu plhu 221M Nov 26 19:33 srfe_queue.db.20121126-193341
-rw-r-----+ 1 plhu plhu 242M Nov 28 14:11 srfe_queue.db.20121128-141126
-rw-r-----+ 1 plhu plhu 263M Nov 30 13:31 srfe_queue.db.20121130-133146
-rw-r-----+ 1 plhu plhu 266M Nov 30 17:38 srfe_queue.db.20121130-173835
-rw-r-----+ 1 plhu plhu 272M Dec  1 12:30 srfe_queue.db.20121201-123026
-rw-r-----+ 1 plhu plhu 273M Dec  1 15:37 srfe_queue.db.20121201-153738
-rw-r-----+ 1 plhu plhu 317M Dec  5 19:36 srfe_queue.db.20121205-193651
-rw-r-----+ 1 plhu plhu 322M Dec  7 11:28 srfe_queue.db.20121207-112820
-rw-r-----+ 1 plhu plhu 368M Dec 11 08:42 srfe_queue.db.20121211-084244
-rw-r-----+ 1 plhu plhu 384M Dec 12 13:58 srfe_queue.db.20121212-135810
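To put a number on the growth, here is a rough estimate from the first and last snapshots in the listing. It assumes the sizes printed by `ls` are accurate to the megabyte and that growth is roughly linear, which the listing only loosely supports.

```python
from datetime import datetime

# (timestamp suffix of the snapshot file, size in MB as printed in the listing)
first = ("20121117-115421", 120)
last = ("20121212-135810", 384)


def parse(stamp):
    """Parse the YYYYmmdd-HHMMSS suffix used in the snapshot file names."""
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")


# Elapsed time between the two snapshots, in days.
days = (parse(last[0]) - parse(first[0])).total_seconds() / 86400
# Average growth in MB per day over that period.
rate = (last[1] - first[1]) / days
print(f"{rate:.1f} MB/day over {days:.1f} days")
```

That works out to roughly 10 MB per day on this deployment.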

We may want to improve the DB solution:

  • Use a NoSQL DB to accelerate transactions.
  • Add a cache in front of the DB. When we want to do online training for RPR-SGD, a cache is essential.
  • Archive old messages periodically, e.g. those that are not tagged and are older than 1 week. We can even design a layered archive mechanism, e.g. daily, weekly, monthly.
  • Compress Python objects before storing them in the DB. Pickle output is further compressible:
$ ll message.pickle*
-rw-rw----+ 1 plhu plhu 165M Dec  1 13:03 message.pickle
-rw-rw----+ 1 plhu plhu  40M Dec 12 14:02 message.pickle.tar.gz
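The compression and archiving ideas could be sketched as follows. This is only an illustration, not the current SRFE code: the `msg` and `msg_archive` tables, their columns, and the helper names `pack`, `unpack`, `init_db` and `archive_old` are all hypothetical, and `zlib` stands in for whatever compressor we end up choosing.

```python
import pickle
import sqlite3
import time
import zlib

WEEK = 7 * 24 * 3600  # archive threshold suggested above, in seconds


def pack(obj):
    """Pickle an object and compress the result with zlib."""
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def unpack(blob):
    """Inverse of pack()."""
    return pickle.loads(zlib.decompress(blob))


def init_db(conn):
    # Hypothetical schema; the real SRFE tables may look different.
    cols = "(id INTEGER PRIMARY KEY, time REAL, tagged INTEGER, body BLOB)"
    conn.execute("CREATE TABLE IF NOT EXISTS msg " + cols)
    conn.execute("CREATE TABLE IF NOT EXISTS msg_archive " + cols)


def archive_old(conn, now=None, max_age=WEEK):
    """Move untagged messages older than max_age into msg_archive.

    Returns the number of rows moved.
    """
    cutoff = (time.time() if now is None else now) - max_age
    where = "WHERE tagged = 0 AND time < ?"
    with conn:  # single transaction: copy first, then delete
        conn.execute(
            "INSERT INTO msg_archive SELECT * FROM msg " + where, (cutoff,)
        )
        cur = conn.execute("DELETE FROM msg " + where, (cutoff,))
    return cur.rowcount
```

Messages would be written with `pack(message)` as the `body` and read back with `unpack(body)`. Note that deleting rows does not shrink a SQLite file by itself; the space is only returned after a `VACUUM`, or the archive table could live in a separate DB file.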