rwynn/monstache

Bug: Resuming on relate uses wrong index name

Closed this issue · 49 comments

Hi @rwynn ,
Thanks for your work, really amazing.

When resuming with the following config, events in the change stream get indexed under the wrong index name:
{"index":{"_index":"5c976492cf09a2266a829a76_db.students","_id":"5c9e4143f37e264e131c84be","

Version: 4.16.0

resume=true
[[relate]]
	namespace="db.students"
	with-namespace="db.studentsView"
	src-field="refId"
	match-field="_id"

[[mapping]]
  namespace="db.studentsView"
  index="studentsView"
  type="studentsView"

This only happens on resuming; it works fine when listening to new events in the change stream.

Thanks!

rwynn commented

Hi @jodevsa

Thanks. Do you have any other configuration specified (excluding the connection and auth info)? If you have direct-read-namespaces, are they targeting the view instead of the collection?

rwynn commented

I'm not sure how the index name could start with what looks like an ID joined by an underscore, as in 5c976492cf09a2266a829a76_db.students. I'm not aware of any code in monstache that would format the index like this. Are you using any plugins?

rwynn commented

Is it possible that there is some erroneous data in the oplog (possibly from a previous bad insert) such that the ns field in the oplog is set to 5c976492cf09a2266a829a76_db.students?

rwynn commented

If so, maybe this would help:

namespace-regex = '^db\.students'

Here is the rest of the config; direct-read-namespaces is commented out:

# additional settings

# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = false
# use the following PEM file for connections to MongoDB

# disable PEM validation
mongo-validate-pem-file = false
#elasticsearch-max-bytes = 400000
# use 2 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 2
# validate connections to Elasticsearch
elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set replay to true you may see version conflict messages in the log
# if you had synced previously. This just means you are replaying old docs which
# are already in Elasticsearch with a newer version; Elasticsearch prevents the
# old docs from overwriting newer ones.
replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
# override the name under which resume state is saved
resume-name = "default"
# exclude documents whose namespace matches the following pattern
# turn on indexing of GridFS file content
index-files = false
# turn on search result highlighting of GridFS content
file-highlighting = false
# print detailed information including request traces
verbose = true

# enable clustering mode
#cluster-name = 'apollo'
# do not exit after full-sync, rather continue tailing the oplog
exit-after-direct-reads = false

direct-read-namespaces is commented out. I also searched the monstache code trying to figure out what happened; nothing adds an ID. I've also checked the related collections, and I don't have any document with that ID.

I tried to reproduce several times; the prefix ID is always "5c976492cf09a2266a829a76".

rwynn commented

It might have been a mistake that got deleted later but was recorded in the oplog. Anyway, it seems to be saying that (maybe at one point) you have/had a db named 5c976492cf09a2266a829a76_db in MongoDB. It would have been recorded in local.oplog.rs as a record with an ns field set to 5c976492cf09a2266a829a76_db.students.
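For illustration, a hypothetical shape of such an entry in local.oplog.rs (values invented, borrowing the IDs from your log):

// hypothetical oplog entry — shape only, not real data
{
  ts: Timestamp(1553879502, 4),
  op: "i",
  ns: "5c976492cf09a2266a829a76_db.students",
  o: { _id: ObjectId("5c9e4143f37e264e131c84be") }
}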

rwynn commented

It should be ignored if you explicitly set the allowed namespaces

namespace-regex = '^db\.students'

I'm interested in more than one collection, so I changed the regex to
namespace-regex= '^db'

Now resuming has stopped working; live change stream updates are still working fine.

rwynn commented

Hmm. Maybe try

namespace-regex= '^db.+'

Same result. Tried both
namespace-regex= '^db.+'
and
namespace-regex = '^db.students'

rwynn commented

Actually, I tested Go's MatchString method and you should only need ^db.
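For illustration (sketched here in JavaScript rather than Go), an unanchored prefix pattern behaves like this:

// ^db matches anything starting with "db"; no trailing .+ is needed
// because the pattern is not anchored at the end.
const re = /^db/;
console.log(re.test('db.students'));                          // true
console.log(re.test('5c976492cf09a2266a829a76_db.students')); // false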

I tried listening to the change stream directly; this is the output:

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } }

rwynn commented

Monstache gets the timestamp to resume from out of the monstache.monstache collection by default. There should be a doc from a previous run in there with an _id matching your resume-name config.

{
    "_id" : "default",
    "ts" : Timestamp(1553879502, 4)
}

?
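To verify, something like this from the mongo shell (assuming the default metadata database and resume-name):

// mongo shell: inspect the resume timestamp saved by monstache
db.getSiblingDB("monstache").monstache.findOne({ _id: "default" })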

Updates and resumes are working fine; the problem is that indexing happens against the real collection, not the view, and the index name in Elasticsearch is not right.

rwynn commented

If you are on MongoDB 4.0+ you can enable using the new change streams API with

change-stream-namespaces = [ "db.students" ]

Otherwise Monstache defaults to the old method of tailing the oplog (from before the change streams API became available).

rwynn commented

If you want to listen to the whole database use

change-stream-namespaces = [ "db" ]

Sorry, I forgot to include that line; I'm using change-stream-namespaces.

rwynn commented

Another thing you can do as a workaround is to add another mapping. I'm not entirely sure why it's needed, though.

[[relate]]
	namespace="db.students"
	with-namespace="db.studentsView"
	src-field="refId"
	match-field="_id"
	keep-src=false

[[mapping]]
  namespace="db.students"
  index="studentsView"
  type="studentsView"

Actual change stream event:

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } } 

Monstache log:
{"index":{"_index":"5c976492cf09a2266a829a76_db.studnet","_id":"5c9e48f9f37e264e131c84f1","_type":"_doc","version":6673861643014766596,"version_type":"external"}}

@rwynn same issue

rwynn commented

I don't know where that prefix could be getting added. Here is the code that sets the namespace attribute.

https://github.com/rwynn/gtm/blob/development/gtm.go#L168

It should just be joining the 2 fields with a dot.

This later flows into Monstache but I don't see how it could be getting the prefix there.
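For illustration, a rough JavaScript equivalent of what that Go line does (not the actual monstache code):

// The namespace is just the event's db and collection joined with a dot,
// so any prefix must already be present in ns.db when the event arrives.
function namespaceOf(change) {
    return `${change.ns.db}.${change.ns.coll}`;
}
console.log(namespaceOf({ ns: { db: 'db', coll: 'students' } })); // "db.students"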

rwynn commented

I just did a quick test on my end in a VM and couldn't get the prefix added with a resume set. You mentioned this was a shared instance of MongoDB? Could this be related to that? Maybe some sort of tenant namespacing?

Really weird. I'm using a shared MongoDB cluster on Atlas.

But listening to the stream using a Node.js script outputs the correct namespace:

ns: { db: 'db', coll: 'student' },

rwynn commented

Yeah, I'm not sure why it's working differently. But it seems to be consistent with that same ID.

rwynn commented

Are you able to do a find on the underlying local.oplog.rs collection? You might not be able to in a shared cluster on Atlas.
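If you can, something like this from the mongo shell would show the raw entries (regex value taken from your logs):

// mongo shell: look for the suspicious namespace in the oplog
// (requires read access to the local database)
db.getSiblingDB("local").getCollection("oplog.rs").find({ ns: /^5c976492cf09a2266a829a76_db/ }).limit(5)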

@rwynn nope

rwynn commented

You could probably work around it by adding another relate with the prefix added to the namespace (i.e. namespace="5c976492cf09a2266a829a76_db.students").

rwynn commented

Looks like you can only access the oplog on an M10 or greater in Atlas.

rwynn commented

@jodevsa I created a ticket for this with MongoDB Atlas support. Will report back if I learn anything.
Please let me know if you can recreate this outside of that environment.

rwynn commented

@jodevsa By the way, since you said this only triggers with resume on: in that case the change stream is started with the startAtOperationTime option set so that it can read previous entries. Maybe this bug only happens when that option is set?

https://docs.mongodb.com/manual/reference/method/db.collection.watch/
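A minimal sketch of that resume path (connection string is a placeholder; savedTs mirrors the Timestamp monstache loads from monstache.monstache):

// Resuming opens the change stream at a saved operation time instead of
// only watching for new events (requires MongoDB 4.0+).
const { MongoClient, Timestamp } = require('mongodb');

async function resumeWatch() {
    const client = await MongoClient.connect('mongodb+srv://user:pass@host/db', {
        useNewUrlParser: true
    });
    const savedTs = new Timestamp(4, 1553879502); // (increment, seconds)
    const stream = client.db('db').collection('students').watch([], {
        startAtOperationTime: savedTs
    });
    stream.on('change', change => console.log(change.ns));
}

resumeWatch();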

@rwynn you are right! When using the resumeAfter option, the namespace changes to "5c976492cf09a2266a829a76_db":

ns:
   { db: '5c976492cf09a2266a829a76_db', coll: 'student' },

Good news: this only occurs on shared clusters!

rwynn commented

Would you mind writing up some instructions for reproducing this with the mongo client? I have a ticket open now and they suspect it might be related to the shared-tier cluster namespacing. They asked for a code sample that reproduces it, so it would be great to have one using the mongo client to give them.

@rwynn yeah sure

// npm install mongodb
const MongoClient = require('mongodb').MongoClient;
const dbName = "db";
const url = `mongodb+srv://username:password@***********-i5mih.mongodb.net/${dbName}?retryWrites=true`;
const collectionName = "event";

const sleep = function(ms) {
    return new Promise((resolve) => {
        setTimeout(() => resolve(ms), ms);
    });
};

async function main() {
    const client = await MongoClient.connect(url, {
        useNewUrlParser: true
    });
    const db = client.db(dbName);
    // Open a first change stream with no options and capture the _id
    // (resume token) of the first event it sees.
    const firstStream = db.collection(collectionName).watch();
    let firstID;
    firstStream.on('change', function(change) {
        console.log("first event ns", change.ns);
        firstID = change._id;
    });
    await sleep(1000);
    await db.collection(collectionName).insertOne({
        name: "first event "
    });
    await sleep(500);
    await firstStream.close();
    await db.collection(collectionName).insertOne({
        name: "second event"
    });
    // Open a second change stream that resumes after the first event;
    // on a shared-tier cluster the ns.db of this event comes back with
    // the tenant ID prefixed.
    const secondStream = db.collection(collectionName).watch({
        resumeAfter: firstID
    });
    secondStream.on('change', async function(change) {
        console.log("second event ns", change.ns);
        await secondStream.close();
        await client.close();
    });
}

main();

Sample output:

first event ns { db: 'db', coll: 'event' }
second event ns { db: '5c976492cf09a2266a829a76_db', coll: 'event' }

The code above reproduces the issue: it prints the ns of two events from two different change streams.
The first stream doesn't have any options specified; the second one uses the resumeAfter token taken from the first event.

rwynn commented

Thanks that’s perfect. I’ll attach it to the ticket.

Great find on this issue!

Awesome

rwynn commented

@jodevsa got an update today that a fix for this on the Atlas side is planned to land in the next couple weeks. They've been able to replicate with your script and it's on their radar.

@rwynn
Thanks for the update, that's awesome. I really appreciate your time on this; it's good to know it's not a bug in the tool.

@rwynn Atlas has fixed the issue in the latest maintenance update, 4.0.10.

rwynn commented

That’s good news. Thanks for the update.

@jodevsa @rwynn has this issue resurfaced? I am testing a setup now with replay and resume on, and in ES an index is created with an ID prefix. When I do an update in Mongo after all previous documents were synced, another index with the correct name is created for the newly updated docs.

What's strange is that I don't think I had this issue a few weeks ago when I ran a first test; I don't remember seeing an index with a prefixed ID at that time.

I am running monstache in Docker (image version 6.3.1) and I am testing with Mongo Atlas M0 (I noticed they are at version 4.0.13 now) and ES Cloud 7.0.1.

All options are configured via command-line params given to the container.

rwynn commented

@sbeaupre you may want to try to replicate again using just the node code that @jodevsa previously posted to this issue. If you find that the issue has come back, it would be helpful if you created a ticket with MongoDB.

For reference, the previous ticket that I created with them on March 29th was given case #00558022. You might state that it was fixed in 4.0.10 and seems to have come back now in 4.0.13.

I can confirm that it is an issue again. I'll contact MongoDB support with your reference number.