rwynn/monstache

Bug: Resuming on relate uses wrong index name

Closed this issue · 49 comments

Hi @rwynn ,
Thanks for your work, really amazing.

When resuming with the following config, events in the change stream get indexed under the wrong index name:
{"index":{"_index":"5c976492cf09a2266a829a76_db.students","_id":"5c9e4143f37e264e131c84be","

Version: 4.16.0

resume=true
[[relate]]
	namespace="db.students"
	with-namespace="db.studentsView"
	src-field="refId"
	match-field="_id"

[[mapping]]
  namespace="db.studentsView"
  index="studentsView"
  type="studentsView"

This only happens on resuming; it works fine when listening to new events in the change stream.

Thanks!

rwynn commented

Hi @jodevsa

Thanks. Do you have any other configuration specified (excluding the connection and auth info)? If you have direct-read-namespaces, are they targeting the view instead of the collection?

rwynn commented

I'm not sure how the index name could start with what looks like an ID joined by an underscore, as in 5c976492cf09a2266a829a76_db.students. I'm not aware of any code in monstache that would format the index like this. Are you using any plugins?

rwynn commented

Is it possible that there is some erroneous data in the oplog (possibly from a previous bad insert) such that the ns field in the oplog is set to 5c976492cf09a2266a829a76_db.students?

rwynn commented

If so, maybe this would help:

namespace-regex = '^db\.students'

Here is the rest of the config; direct-read-namespaces is commented out:

# additional settings

# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = false
# use the following PEM file for connections to MongoDB

# disable PEM validation
mongo-validate-pem-file = false
#elasticsearch-max-bytes = 400000
# use 2 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 2
# validate connections to Elasticsearch
elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set replay to true you may see version conflict messages in the log
# if you had synced previously. This just means you are replaying old docs which
# are already in Elasticsearch with a newer version; Elasticsearch prevents the
# old docs from overwriting newer ones.
replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
# override the name under which resume state is saved
resume-name = "default"
# exclude documents whose namespace matches the following pattern
# turn on indexing of GridFS file content
index-files = false
# turn on search result highlighting of GridFS content
file-highlighting = false
# print detailed information including request traces
verbose = true

# enable clustering mode
#cluster-name = 'apollo'
# do not exit after full-sync, rather continue tailing the oplog
exit-after-direct-reads = false

direct-read-namespaces is commented out. I also searched the monstache code trying to figure out what happened; nothing adds an ID. I've also checked the related collections, and I don't have any document with that ID.

I tried to reproduce several times; the prefix ID is always "5c976492cf09a2266a829a76".

rwynn commented

It might have been a mistake that got deleted later but was recorded in the oplog. Anyway, it seems to be saying that (maybe at one point) you have/had a db named 5c976492cf09a2266a829a76_db in MongoDB. It would have been recorded in local.oplog.rs as a record with an ns field set to 5c976492cf09a2266a829a76_db.students.
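For illustration, a hypothetical shape of such an entry in local.oplog.rs (values invented, borrowing the IDs from your log):

// hypothetical oplog entry — shape only, not real data
{
  ts: Timestamp(1553879502, 4),
  op: "i",
  ns: "5c976492cf09a2266a829a76_db.students",
  o: { _id: ObjectId("5c9e4143f37e264e131c84be") }
}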

rwynn commented

It should be ignored if you explicitly set the allowed namespaces

namespace-regex = '^db\.students'

I'm interested in more than one collection, so I changed the regex to
namespace-regex= '^db'

Now resuming has stopped working; live change stream updates are still working fine.

rwynn commented

Hmm. Maybe try

namespace-regex= '^db.+'

Same result. Tried both
namespace-regex= '^db.+'
and
namespace-regex = '^db.students'

rwynn commented

Actually, I tested Go's MatchString method and you should only need ^db.
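For illustration (sketched here in JavaScript rather than Go), an unanchored prefix pattern behaves like this:

// ^db matches anything starting with "db"; no trailing .+ is needed
// because the pattern is not anchored at the end.
const re = /^db/;
console.log(re.test('db.students'));                          // true
console.log(re.test('5c976492cf09a2266a829a76_db.students')); // false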

I tried listening to the change stream directly; this is the output:

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } }

rwynn commented

Monstache gets the timestamp to resume from out of the monstache.monstache collection by default. There should be a doc from a previous run in there with an _id matching your resume-name config.

{
    "_id" : "default",
    "ts" : Timestamp(1553879502, 4)
}

?
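To verify, something like this from the mongo shell (assuming the default metadata database and resume-name):

// mongo shell: inspect the resume timestamp saved by monstache
db.getSiblingDB("monstache").monstache.findOne({ _id: "default" })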

Updates and resumes are working fine; the problem is that indexing happens against the real collection, not the view, and the index name in Elasticsearch is not right.

rwynn commented

If you are on MongoDB 4.0+ you can enable using the new change streams API with

change-stream-namespaces = [ "db.students" ]

Otherwise Monstache defaults to the old method of tailing the oplog (from before the change streams API became available).

rwynn commented

If you want to listen to the whole database use

change-stream-namespaces = [ "db" ]

Sorry, I forgot to include that line; I'm using change-stream-namespaces.

rwynn commented

Another thing you can do as a workaround is to add another mapping. I'm not entirely sure why it's needed, though.

[[relate]]
	namespace="db.students"
	with-namespace="db.studentsView"
	src-field="refId"
	match-field="_id"
	keep-src=false

[[mapping]]
  namespace="db.students"
  index="studentsView"
  type="studentsView"

Actual change stream event:

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } } 

Monstache log:
{"index":{"_index":"5c976492cf09a2266a829a76_db.studnet","_id":"5c9e48f9f37e264e131c84f1","_type":"_doc","version":6673861643014766596,"version_type":"external"}}

@rwynn same issue

rwynn commented

I don't know where that prefix could be getting added. Here is the code that sets the namespace attribute.

https://github.com/rwynn/gtm/blob/development/gtm.go#L168

It should just be joining the 2 fields with a dot.

This later flows into Monstache but I don't see how it could be getting the prefix there.
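For illustration, a rough JavaScript equivalent of what that Go line does (not the actual monstache code):

// The namespace is just the event's db and collection joined with a dot,
// so any prefix must already be present in ns.db when the event arrives.
function namespaceOf(change) {
    return `${change.ns.db}.${change.ns.coll}`;
}
console.log(namespaceOf({ ns: { db: 'db', coll: 'students' } })); // "db.students"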

rwynn commented

I just did a quick test on my end in a VM and couldn't get the prefix added with a resume set. You mentioned this was a shared instance of MongoDB? Could this be related to that? Maybe some sort of tenant namespacing?

Really weird. I'm using a shared MongoDB cluster on Atlas.

But listening to the stream using a Node.js script outputs the correct namespace:

ns: { db: 'db', coll: 'student' },

rwynn commented

Yeah, I'm not sure why it's working differently. But it seems to be consistent with that same ID.

rwynn commented

Are you able to do a find on the underlying local.oplog.rs collection? You might not be able to in a shared cluster on Atlas.
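If you can, something like this from the mongo shell would show the raw entries (regex value taken from your logs):

// mongo shell: look for the suspicious namespace in the oplog
// (requires read access to the local database)
db.getSiblingDB("local").getCollection("oplog.rs").find({ ns: /^5c976492cf09a2266a829a76_db/ }).limit(5)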

@rwynn nope

rwynn commented

You could probably work around it by adding another relate with the prefix added to the namespace (i.e. namespace="5c976492cf09a2266a829a76_db.students").

rwynn commented

Looks like you can only access the oplog on an M10 or greater in Atlas.

rwynn commented

@jodevsa I created a ticket for this with MongoDB Atlas support. Will report back if I learn anything.
Please let me know if you can recreate this outside of that environment.

rwynn commented

@jodevsa By the way, since you said this only triggers with resume on: in that case the change stream is started with the startAtOperationTime option set so that it can read previous entries. Maybe this bug only happens when that option is set?

https://docs.mongodb.com/manual/reference/method/db.collection.watch/
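A minimal sketch of that resume path (connection string is a placeholder; savedTs mirrors the Timestamp monstache loads from monstache.monstache):

// Resuming opens the change stream at a saved operation time instead of
// only watching for new events (requires MongoDB 4.0+).
const { MongoClient, Timestamp } = require('mongodb');

async function resumeWatch() {
    const client = await MongoClient.connect('mongodb+srv://user:pass@host/db', {
        useNewUrlParser: true
    });
    const savedTs = new Timestamp(4, 1553879502); // (increment, seconds)
    const stream = client.db('db').collection('students').watch([], {
        startAtOperationTime: savedTs
    });
    stream.on('change', change => console.log(change.ns));
}

resumeWatch();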

@rwynn you are right! When using the resumeAfter option, the namespace changes to "5c976492cf09a2266a829a76_db":

ns:
   { db: '5c976492cf09a2266a829a76_db', coll: 'student' },

Good news: this only occurs on shared clusters!

rwynn commented

Would you mind writing up some instructions for reproducing this with the mongo client? I have a ticket open now and they suspect it might be related to the shared-tier cluster namespacing. They asked for a code sample that reproduces it, so it would be great to have one using the mongo client to give them.

@rwynn yeah sure

// npm install mongodb
const MongoClient = require('mongodb').MongoClient;
const dbName = "db";
const url = `mongodb+srv://username:password@***********-i5mih.mongodb.net/${dbName}?retryWrites=true`;
const collectionName = "event";

const sleep = function(ms) {
    return new Promise((resolve) => {
        setTimeout(() => resolve(ms), ms);
    });
};

async function main() {
    const client = await MongoClient.connect(url, {
        useNewUrlParser: true
    });
    const db = client.db(dbName);
    // Open a first change stream with no options and capture the _id
    // (resume token) of the first event it sees.
    const firstStream = db.collection(collectionName).watch();
    let firstID;
    firstStream.on('change', function(change) {
        console.log("first event ns", change.ns);
        firstID = change._id;
    });
    await sleep(1000);
    await db.collection(collectionName).insertOne({
        name: "first event "
    });
    await sleep(500);
    await firstStream.close();
    await db.collection(collectionName).insertOne({
        name: "second event"
    });
    // Open a second change stream that resumes after the first event;
    // on a shared-tier cluster the ns.db of this event comes back with
    // the tenant ID prefixed.
    const secondStream = db.collection(collectionName).watch({
        resumeAfter: firstID
    });
    secondStream.on('change', async function(change) {
        console.log("second event ns", change.ns);
        await secondStream.close();
        await client.close();
    });
}

main();

Sample output:

first event ns { db: 'db', coll: 'event' }
second event ns { db: '5c976492cf09a2266a829a76_db', coll: 'event' }

The code above reproduces the issue: it prints the ns of two events from two different change streams.
The first stream doesn't have any options specified; the second one uses the resumeAfter token taken from the first event.

rwynn commented

Thanks that’s perfect. I’ll attach it to the ticket.

Great find on this issue!

Awesome

rwynn commented

@jodevsa got an update today that a fix for this on the Atlas side is planned to land in the next couple weeks. They've been able to replicate with your script and it's on their radar.

@rwynn
Thanks for the update, that's awesome. I really appreciate your time on this; it's good to know it's not a bug in the tool.

@rwynn Atlas has fixed the issue in the latest maintenance update, 4.0.10.

rwynn commented

That’s good news. Thanks for the update.

@jodevsa @rwynn has this issue resurfaced? I am testing a setup now with replay and resume on, and in ES an index is created with an ID prefix. When I do an update in Mongo after all previous documents were synced, another index with the correct name is created for the newly updated docs.

What's strange is that I don't think I had this issue a few weeks ago when I ran a first test; I don't remember seeing an index with a prefixed ID at that time.

I am running monstache in Docker (image version 6.3.1) and I am testing with Mongo Atlas M0 (I noticed they are at version 4.0.13 now) and ES Cloud 7.0.1.

All options are configured via command-line params given to the container.

rwynn commented

@sbeaupre you may want to try to replicate again using just the node code that @jodevsa previously posted to this issue. If you find that the issue has come back, it would be helpful if you created a ticket with MongoDB.

For reference, the previous ticket that I created with them on March 29th was given case #00558022. You might state that it was fixed in 4.0.10 and seems to have come back now in 4.0.13.

I can confirm that it is an issue again. I'll contact MongoDB support with your reference number.