rwynn/monstache

monstache - problem reading from MongoDB secondaries.

We see a problem when our Monstache instance uses the MongoDB read preference "secondary". When we insert a large number of documents into MongoDB in a short time (for example, 5,000), not all of them are replicated to Elasticsearch: only about 4,950 to 4,970 arrive. When we switch Monstache back to read preference "primary", every document is replicated correctly.
All documents are also replicated correctly if the MongoDB connection string names only the secondary node and sets directConnection=true. In that case, however, Monstache cannot write its metadata to MongoDB.

Monstache uses a MongoDB view as the replication source. Here is our configuration:

elasticsearch-urls = ["${elastic_url}"]
relate-threads = 6000
relate-buffer = 15000
elasticsearch-max-seconds = 10
elasticsearch-max-bytes = 16777216
resume = true
resume-name = "bc2-${resume_id}-contacts"
change-stream-namespaces = [ "${mongodb_database}.contacts" ]
gzip = true
stats = true
elasticsearch-retry = true
prune-invalid-json = true
dropped-databases = false
dropped-collections = false
elasticsearch-client-timeout = 30
enable-http-server = true

[[mapping]]
namespace = "${mongodb_database}.contacts"
index = "contacts${es_suffix}"

[[mapping]]
namespace = "${mongodb_database}.contacts-view"
index = "contacts${es_suffix}"

[[relate]]
namespace = "${mongodb_database}.contacts"
with-namespace = "${mongodb_database}.contacts-view"
keep-src = false

Could the cause be that Monstache sees the document ID in the oplog and then queries one of the secondaries for that ID, e.g. db.getCollection("contacts-view").find({"_id": "xyz"}), before that secondary has applied the write? The entry is already in the oplog, but the document is not yet visible on the secondary, so the query returns zero documents.
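
To make the suspected race concrete, here is a minimal in-memory sketch in Python. It is not Monstache code and does not talk to MongoDB. The Primary and Secondary classes, the replicate function, and the stall_every parameter are all hypothetical names invented for this illustration, and the stall rate is arbitrary. It only models the idea above: a change event is handled before the secondary has applied the matching oplog entry, so the lookup by ID comes back empty.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: stores documents and appends every insert to an oplog."""
    docs: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.oplog.append((doc_id, doc))


@dataclass
class Secondary:
    """Toy secondary: applies oplog entries from the primary on request."""
    docs: dict = field(default_factory=dict)
    applied: int = 0  # number of oplog entries applied so far

    def apply_from(self, primary, up_to):
        up_to = max(0, up_to)
        for doc_id, doc in primary.oplog[self.applied:up_to]:
            self.docs[doc_id] = doc
        self.applied = max(self.applied, up_to)


def replicate(primary, secondary, read_from_primary, stall_every=100):
    """Handle each change event by looking the document up by ID.

    Assumption for illustration: on every `stall_every`-th event the
    secondary is exactly one oplog entry behind when the lookup runs.
    Returns (indexed_ids, missed_ids).
    """
    indexed, missed = [], []
    for position, (doc_id, _doc) in enumerate(primary.oplog, start=1):
        stalled = position % stall_every == 0
        secondary.apply_from(primary, position - 1 if stalled else position)
        node = primary if read_from_primary else secondary
        if doc_id in node.docs:
            indexed.append(doc_id)   # lookup succeeded, document is indexed
        else:
            missed.append(doc_id)    # lookup returned nothing, event is lost
    return indexed, missed
```

In this model, reading from the primary never misses a document, while reading from the lagging secondary silently drops every event whose lookup runs during a stall. The secondary catches up on the next event, but nothing retries the failed lookup, which would match documents missing from Elasticsearch even though they exist on every MongoDB node.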

We use MongoDB v6.0 and Monstache v6.7.10. The cluster's default read and write concerns are:

db.adminCommand({ getDefaultRWConcern: 1 })
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 'majority', wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1717599777, i: 6 }),
  updateWallClockTime: ISODate("2024-06-05T15:02:57.764Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'implicit',
  localUpdateWallClockTime: ISODate("2024-06-05T15:02:57.765Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1717673598, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1717673598, i: 4 })
}