MongoDB backend supports a maximum of 64 stages per database
In the MongoDB backend, indexes are used to speed up queries against the document metadata. Due to how the metadata (e.g. touched, fetched) is structured, one index is created per stage.
MongoDB limits the number of indexes per collection to 64. This means that running several pipelines with many stages risks hitting the index limit.
Ideally, the index on document metadata should be per field. One solution could be to add a secondary metadata field per state (touched, fetched, etc) and make that a list of stage names. That can then have an index, making the indexes scale with the number of metadata fields instead of their values.
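To make the scaling argument concrete, here is a rough sketch of the index budget under both schemes. The helper names are hypothetical, and the per-stage count assumes exactly one index per distinct stage name, as described above. The default `_id` index is counted because it also takes up one of the 64 slots.

```python
# MongoDB's limit on indexes per collection; the default _id index counts toward it.
MAX_INDEXES_PER_COLLECTION = 64

# Metadata states named in this issue.
METADATA_STATES = ("touched", "fetched", "processed", "discarded", "failed")


def indexes_needed_per_stage(stage_names):
    """Current scheme: one index per distinct stage, plus the default _id index."""
    return len(set(stage_names)) + 1


def indexes_needed_per_field(states=METADATA_STATES):
    """Proposed scheme: one index per metadata field, plus the default _id index."""
    return len(states) + 1


def fits_in_limit(index_count):
    """True if the collection stays within MongoDB's index limit."""
    return index_count <= MAX_INDEXES_PER_COLLECTION


# Hypothetical deployment: several pipelines adding up to 70 stages.
stages = [f"stage_{i}" for i in range(70)]

print(indexes_needed_per_stage(stages), fits_in_limit(indexes_needed_per_stage(stages)))
# 71 False
print(indexes_needed_per_field(), fits_in_limit(indexes_needed_per_field()))
# 6 True
```

With per-field indexes, the count stays constant no matter how many stages are added.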
This will require changing the way the metadata is queried by stages, however.
I've tested a modification to the metadata format, where `fetched` is stored as a list of stage names alongside the normal map of timestamps. This means an index can be placed on the field in MongoDB and there will be no problems with scaling beyond 64 stages (MongoDB index limit).
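As an illustration only, the modified format could look roughly like the sketch below. The `fetched_stages` key and both helper names are assumptions made for this example, not the names used in the tested change. If the metadata lives in a subdocument, the field path in the filter would need the matching prefix.

```python
from datetime import datetime, timezone


def add_stage_list(metadata, state="fetched"):
    """Return a copy of `metadata` with a list of stage names for `state`.

    The original map of stage name -> timestamp is kept untouched; the list
    is stored next to it under the hypothetical key "<state>_stages".
    """
    updated = dict(metadata)
    updated[f"{state}_stages"] = sorted(metadata.get(state, {}))
    return updated


def fetched_by_filter(stage_name):
    """Build a MongoDB query filter for documents fetched by `stage_name`.

    An equality match against an array field selects documents whose array
    contains the value, so one index on the list field can serve any stage.
    """
    return {"fetched_stages": stage_name}


now = datetime(2014, 1, 1, tzinfo=timezone.utc)
metadata = {"fetched": {"tokenizer": now, "crawler": now}}

print(add_stage_list(metadata)["fetched_stages"])
# ['crawler', 'tokenizer']
print(fetched_by_filter("crawler"))
# {'fetched_stages': 'crawler'}
```

Keeping the timestamp map means existing readers of the metadata keep working, while queries by stage can move to the list field.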
Ideally, a compound index in MongoDB should be used to create an index on `_action`, `touched`, `fetched`, `processed`, `discarded` and `failed` (aka the metadata fields).
Unfortunately, a compound index cannot contain more than one array ("multikey") field, and in MongoDB versions before 2.6 a query can generally use only one index. This means we can't use this metadata format (i.e. lists instead of timestamps) in older MongoDB versions.
In 2.6.x, queries can use multiple indexes (index intersection), so it seems it should be possible to have several separate multikey indexes instead.
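A minimal sketch of what "several multikey indexes" might look like: one single-field index per list field, rather than one compound index over all of them. The `_stages` suffix and the function names are hypothetical. The specs are plain lists of `(key, direction)` pairs, which is the shape PyMongo's `create_index` accepts. Whether the 2.6 query planner actually intersects these indexes for a given query has not been verified here.

```python
# Metadata states named in this issue.
METADATA_STATES = ("touched", "fetched", "processed", "discarded", "failed")


def single_field_index_specs(states=METADATA_STATES, suffix="_stages"):
    """One ascending single-field index spec per list field.

    Each list field gets its own index, which avoids the restriction that a
    compound index may include at most one array field.
    """
    return [[(f"{state}{suffix}", 1)] for state in states]


def pending_for_stage_filter(stage_name):
    """Filter for documents fetched by a stage but not yet processed by it.

    The filter touches two list fields, so a query planner that can combine
    indexes would have a separate index available for each condition.
    """
    return {
        "fetched_stages": stage_name,
        "processed_stages": {"$ne": stage_name},
    }


for spec in single_field_index_specs():
    print(spec)

print(pending_for_stage_filter("tokenizer"))
```

The number of specs equals the number of metadata states (five here), independent of how many stages exist.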