performance of mongodb queries to retrieve logs in admin
Closed this issue · 36 comments
It happens again. Probably also related to write queries happening at the same time?
mongos> db.cm_log.count()
4969165
mongos> db.cm_log.count({level:400})
20805
mongos> db.cm_log.count({level:300})
4193617
mongos> db.cm_log.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "example.cm_log"
},
{
"v" : 1,
"key" : {
"level" : NumberLong(1),
"createdAt" : NumberLong(1)
},
"name" : "level_1_createdAt_1",
"ns" : "example.cm_log",
"background" : true
},
{
"v" : 1,
"key" : {
"context.extra.type" : NumberLong(1),
"level" : NumberLong(1),
"createdAt" : NumberLong(1)
},
"name" : "context_extra_type_1_level_1_createdAt_1",
"ns" : "example.cm_log",
"background" : true
},
{
"v" : 1,
"key" : {
"message" : NumberLong(1)
},
"name" : "message_1",
"ns" : "example.cm_log",
"background" : true
}
]
It never happens for "/log/stats" but only for "/log?level=X".
@fvovan can you investigate?
Example long running query:
{
"opid" : 166492112,
"active" : true,
"secs_running" : 1182,
"microsecs_running" : NumberLong(1182201490),
"op" : "query",
"ns" : "example.cm_log",
"query" : {
"count" : "cm_log",
"query" : {
"message" : {
"$exists" : true
},
"level" : {
"$in" : [
NumberLong(300)
]
}
}
},
"planSummary" : "IXSCAN { level: 1, createdAt: 1 }",
"client" : "10.55.40.160:59820",
"desc" : "conn217970",
"threadId" : "0x7f3eb6b16700",
"connectionId" : 217970,
"locks" : {
"^example" : "R"
},
"waitingForLock" : true,
"numYields" : 44658,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(1319777562),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(370976651),
"w" : NumberLong(0)
}
}
},
I saw that too. It's only the config node. I guess it's an anomaly and we can ignore it.
MongoDB 3 upgrade is independent of Debian 8. You can track it here: cargomedia/puppet-packages#1255
So the problem is {"message" : {$exists: true}}
I simulated mongo load appr. 300-400 inserts per sec.
and starting from some point of collection size (2M), query
db.cm_log.count({"message" : {$exists: true}, "level" : 300})
takes more than 30 secs and locks the collection, while just {"level": 300}
query
executes almost instantly.
Unfortunately there is no explain
option for count()
in mongo version we have now. But i tried to recon index usage, and the result is mongo doesn't use indexes for $exists
operator.
Let's just modify the query, anyway as checked, there are no documents without message
sgtm, can you open a separate PR, so we can close this only once we verified the impact on prod?
Cleanup could also be optimized. Currently every type is cleaned up individually in Maintenance_Cli instead of simply deleting all entries that are too old in one go.
Cleanup could also be optimized. Currently every type is cleaned up individually in Maintenance_Cli instead of simply deleting all entries that are too old in one go.
Yes probably it could, but the thing is cleanUp()
can be overridden in certain types to preserve important log records, that's why all "typed" documents are processed separately.
Yes they also have different values for maximum age I see.
Wouldn't it be better to use mongo's ttl-mechanism for this? It's null on production, but records are still cleared through the cleanup-mechanism. Different ttls could be achieved by configuring different loggers for specific log-types. I expect ttl-based removal to be much more efficient than this manual cleanup.
maybe,
it could be done without using several loggers but rather with injecting expireAt
(or something like that) property to each document in mongo handler, depending on record type
It didn't help :/
Had a timeout on /log?level=400
just now:
MongoCursorTimeoutException: www3.example.com:27017: Read timed out after reading 0 bytes, waited for 30.000000 seconds in /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/PagingSource/MongoDb.php on line 89
0. {main} at /home/example/serve/public/index.php line 0
1. CM_Bootloader->execute(Closure) at /home/example/releases/20160902122350/public/index.php line 11
2. {closure}() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Bootloader.php line 163
3. CM_Http_Handler->processRequest(CM_Http_Request_Post) at /home/example/releases/20160902122350/public/index.php line 9
4. CM_Http_Response_Abstract->process() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Handler.php line 28
5. CM_Http_Response_View_Abstract->_process() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Abstract.php line 48
6. CM_Http_Response_Abstract->_runWithCatching(Closure, Closure) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/View/Abstract.php line 170
7. CM_Http_Response_View_Abstract->{closure}() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Abstract.php line 273
8. CM_Http_Response_View_Ajax->_processView([]) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/View/Abstract.php line 167
9. CM_View_Abstract->ajax_loadPage(['path' => '/log?level=300'], CM_Frontend_JavascriptContainer_View, CM_Http_Response_View_Ajax) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/View/Ajax.php line 33
10. CM_Http_Response_View_Abstract->loadPage(['path' => '/log?level=300'], CM_Http_Response_View_Ajax) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/View/Abstract.php line 36
11. CM_Http_Response_Abstract->process() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/View/Abstract.php line 87
12. CM_Http_Response_Page_Embed->_process() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Abstract.php line 48
13. CM_Http_Response_Page->_processContentOrRedirect() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page/Embed.php line 30
14. CM_Http_Response_Page->_processPageLoop(CM_Http_Request_Get) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page.php line 82
15. CM_Http_Response_Page->_processPage(CM_Http_Request_Get, CM_Http_Response_Page_ProcessingResult) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page.php line 95
16. CM_Http_Response_Abstract->_runWithCatching(Closure, Closure) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page.php line 152
17. CM_Http_Response_Page->{closure}() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Abstract.php line 273
18. CM_Http_Response_Page_Embed->_renderPage(Admin_Page_Log) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page.php line 143
19. CM_RenderAdapter_Layout->fetchPage() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Http/Response/Page/Embed.php line 26
20. CM_RenderAdapter_Component->fetch() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/RenderAdapter/Layout.php line 87
21. CM_Frontend_Render->fetchViewResponse(CM_Frontend_ViewResponse) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/RenderAdapter/Component.php line 23
22. CM_Frontend_Render->fetchViewTemplate(Admin_Page_Log, 'default', []) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 487
23. CM_Frontend_Render->fetchTemplate('library/Admin/layout/default/P...', [], []) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 479
24. Smarty_Internal_TemplateBase->fetch() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 85
25. content_57c97007558fd6_26342174(Smarty_Internal_Template) at /home/example/releases/20160902122350/vendor/smarty/smarty/libs/sysplugins/smarty_internal_templatebase.php line 182
26. smarty_function_component([], Smarty_Internal_Template) at /home/example/releases/20160902122350/tmp/smarty/Admin_Page_Log_101_en^93d5e3bd10cf1a2ec1d667877804c236f1375238.file.default.tpl.php line 44
27. CM_RenderAdapter_Component->fetch() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/SmartyPlugins/function.component.php line 21
28. CM_Frontend_Render->fetchViewResponse(CM_Frontend_ViewResponse) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/RenderAdapter/Component.php line 23
29. CM_Frontend_Render->fetchViewTemplate(Admin_Component_LogList, 'default', []) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 487
30. CM_Frontend_Render->fetchTemplate('vendor/cargomedia/cm/layout/de...', [], []) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 479
31. Smarty_Internal_TemplateBase->fetch() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Frontend/Render.php line 85
32. content_57c97007a09953_61088818(Smarty_Internal_Template) at /home/example/releases/20160902122350/vendor/smarty/smarty/libs/sysplugins/smarty_internal_templatebase.php line 182
33. CM_Paging_Abstract->rewind() at /home/example/releases/20160902122350/tmp/smarty/Admin_Component_LogList_101_en^649fdec06fbf8ca1d48945985f87730b50a65993.file.default.tpl.php line 82
34. CM_Paging_Abstract->getItems() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Paging/Abstract.php line 481
35. CM_Paging_Abstract->_getItemsRaw() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Paging/Abstract.php line 31
36. CM_PagingSource_MongoDb->getItems(0, 51) at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/Paging/Abstract.php line 347
37. MongoCursor->rewind() at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/PagingSource/MongoDb.php line 89
38. {throw} at /home/example/releases/20160902122350/vendor/cargomedia/cm/library/CM/PagingSource/MongoDb.php line 89
What is the difference of:
/log/stats
: usingSK_App_Stats::getLog(start, stop, 400)
- fast/log?level=400
: usingnew CM_Paging_Log([400])
- slow
The most obvious difference is that SK_App_Stats uses an exact match for level
whereas CM_Paging_Log uses a range-match, even when a single level is requested. Mongodb is notoriously bad at range-matches, so it might help to do an exact match if only one level is requested.
oh. bad me.
It's the same {$exists: true}
but now used for context.extra.type
.
What about configuring dummy type as 0
?
It sounds stupid but seems like even in schemaless storages we have to keep schema..
even when a single level is requested. Mongodb is notoriously bad at range-matches, so it might help to do an exact match if only one level is requested.
I did few benchmarks, there is no difference. Probably query optimizer transforms it. But it's not difficult to set up on our side.
If it doesn't make a difference then I wouldn't do it.
Regarding type: Another approach, if we don't want to enforce a dummy-type, would be to make the type-index sparse and hint it in the query-options. Maybe that helps.
Regarding type: Another approach, if we don't want to enforce a dummy-type, would be to make the type-index sparse and hint it in the query-options. Maybe that helps.
it seems it's not possible to use hint
for count()
queries
As far as I can see, the paging's count is only used for the pagination so it can display a link to the last available page. How about we try to remove that?
As discussed: Vladimir will re-create a big MongoDB log collection, once it's ready let's investigate this together @alexispeter @fvovan
As discussed: let's try always setting context.extra.type
, to avoid a sparse index.
I can't say that setting mandatory type helped a lot.
Despite this trick gave some speed-up at dev environment, sometimes it still throws on production one.
last attempt was not good enough as well.
Let's try next "trick".
The only thing needs to be considered is cleaning redundant indexes in the end.
Using single index for filtering and sorting finally gave positive results.
So next step is adding index for "type only" query and cleaning obsolete indexes
last remaining "all" listing is fixed.
so everything should work
👍
Are the "experimental" indexes removed already?
Can you summarize what was the issue and how you solved it eventually?
Are the "experimental" indexes removed already?
They were removed in #2384
Can you summarize what was the issue and how you solved it eventually?
Briefly there were two problems. One is bad performance on $exists
queries that was fixed with introducing dummy type for cm_log
collection.
Another problem is relying on mongo's index Intersection which worked well on dev environment but not on production one. As it turned out it's necessary to fulfil the whole query (I mean its filtering and sorting parts ) with single index.
Now we have these indices:
- 1: _id
- 2: level, createdAt
- 3: context.extra.type, level, createdAt
- 4: message
- 5: context.extra.type, createdAt
Do we need all of them? 2, 3 and 5 cannot be combined?
Do we need all of them? 2, 3 and 5 cannot be combined?
That's the whole point..
sometimes we query by level, sometimes by type, sometimes by both of them. They probably could be combined if some of these queries didn't contain sort
modifier. Doc says
Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.
So even if we believe in index intersection 2 and 5 are necessary. The only one that theoretically could be deleted is 3. I doubt it's possible especially for count
queries.
Can be checked on live server only because as it turned out the behaviour is different due either server load, or storage topology..
Ok I get it now.
Closeable?
agree