Bug: Resuming on relate uses wrong index name
Closed this issue · 49 comments
Hi @rwynn ,
Thanks for your work, really amazing.
When resuming with the following config, events in the change stream get indexed under the wrong index name:
{"index":{"_index":"5c976492cf09a2266a829a76_db.students","_id":"5c9e4143f37e264e131c84be","
version:4.16.0
resume=true
[[relate]]
namespace="db.students"
with-namespace="db.studentsView"
src-field="refId"
match-field="_id"
[[mapping]]
namespace="db.studentsView"
index="studentsView"
type="studentsView"
This only happens on resuming; it works fine when listening to new events in the change stream.
Thanks!
Hi @jodevsa
Thanks. Do you have any other configuration specified (excluding the connection and auth info)? If you have direct-read-namespaces, are they targeting the view instead of the collection?
I'm not even sure how it's possible the index could start with (an id?) appended with an underscore like 5c976492cf09a2266a829a76_db.students. I don't believe there is any code in monstache that I'm aware of that would format the index like this. Are you using any plugins?
Is it possible that there is some erroneous data in the oplog (possibly from a previous bad insert) such that the ns field in the oplog is set to 5c976492cf09a2266a829a76_db.students?
If so, maybe this would help:
namespace-regex = '^db\.students'
Here is the rest of the config; direct-read-namespaces is commented out:
# additional settings
# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = false
# use the following PEM file for connections to MongoDB
# disable PEM validation
mongo-validate-pem-file = false
# use 4 go routines concurrently pushing documents to Elasticsearch
#elasticsearch-max-bytes = 400000
elasticsearch-max-conns = 2
# validate connections to Elasticsearch
elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
# override the name under which resume state is saved
resume-name = "default"
# exclude documents whose namespace matches the following pattern
# turn on indexing of GridFS file content
index-files = false
# turn on search result highlighting of GridFS content
file-highlighting = false
# print detailed information including request traces
verbose = true
# enable clustering mode
#cluster-name = 'apollo'
# do not exit after full-sync, rather continue tailing the oplog
exit-after-direct-reads = false
direct-read-namespaces is commented out. And yeah, I searched the monstache code trying to figure out what happened; nothing adds an ID. I've also checked the related collections, and I don't have any document with that ID.
I tried to reproduce several times; the prefix ID is always "5c976492cf09a2266a829a76".
It might have been a mistake that got deleted later but was recorded in the oplog. Anyway, it seems to be saying that (maybe at one point) you have/had a db named 5c976492cf09a2266a829a76_db in MongoDB. It would have been recorded in local.oplog.rs as a record with an ns field of 5c976492cf09a2266a829a76_db.students.
It should be ignored if you explicitly set the allowed namespaces:
namespace-regex = '^db\.students'
I'm interested in more than one collection, so I changed the regex to
namespace-regex = '^db'
Now resuming stopped working; live change stream updates are still working fine.
Hmm. Maybe try
namespace-regex= '^db.+'
Same result. I tried both
namespace-regex = '^db.+'
and
namespace-regex = '^db.students'
Actually, I tested the MatchString method in Go, and you should only need ^db.
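For reference, here is a quick way to check those patterns. Monstache itself compiles them with Go's regexp package, but for simple anchored patterns like these, JavaScript regexes behave the same way (sketched in Node to match the repro script later in this thread):

```javascript
// Sanity check of the namespace-regex patterns discussed above.
const good = 'db.students';
const bad = '5c976492cf09a2266a829a76_db.students';

// '^db' anchors at the start of the string, so the prefixed
// namespace from the bad oplog/change-stream entries is rejected:
console.log(/^db/.test(good)); // true
console.log(/^db/.test(bad));  // false

// '^db\.students' behaves the same way for the single collection:
console.log(/^db\.students/.test(good)); // true
console.log(/^db\.students/.test(bad));  // false
```

So either pattern should filter out the prefixed namespace; neither explains why resuming stopped working entirely.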
{ _id:
{ _data:
'825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
operationType: 'update',
clusterTime:
Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
ns: { db: 'db', coll: 'student' },
documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
updateDescription:
{ updatedFields:
{ description: '<p>123<br/></p>',
updatedAt: 2019-03-29T17:11:42.892Z },
removedFields: [] } }
I tried listening to the change stream; the event above is the output.
Monstache gets the timestamp to resume from in the monstache.monstache collection by default. There should be a doc from a previous run in there with an _id of your resume-name config, like:
{
"_id" : "default",
"ts" : Timestamp(1553879502, 4)
}
Updates and resumes are working fine; the problem is that indexing happens against the real collection, not the view, and the index name in Elasticsearch is wrong.
If you are on MongoDB 4.0+ you can enable using the new change streams API with
change-stream-namespaces = [ "db.students" ]
Otherwise Monstache defaults to the old method of tailing the oplog (before the API became available).
If you want to listen to the whole database, use
change-stream-namespaces = [ "db" ]
Sorry, I forgot to include that line; I'm using change-stream-namespaces.
Another thing you can do as a workaround is add another mapping. I'm not entirely sure why it's needed, though.
[[relate]]
namespace="db.students"
with-namespace="db.studentsView"
src-field="refId"
match-field="_id"
keep-src=false
[[mapping]]
namespace="db.students"
index="studentsView"
type="studentsView"
Actual change stream event:
{ _id:
{ _data:
'825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
operationType: 'update',
clusterTime:
Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
ns: { db: 'db', coll: 'student' },
documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
updateDescription:
{ updatedFields:
{ description: '<p>123<br/></p>',
updatedAt: 2019-03-29T17:11:42.892Z },
removedFields: [] } }
Monstache log:
{"index":{"_index":"5c976492cf09a2266a829a76_db.studnet","_id":"5c9e48f9f37e264e131c84f1","_type":"_doc","version":6673861643014766596,"version_type":"external"}}
I don't know where that prefix could be getting added. Here is the code that sets the namespace attribute.
https://github.com/rwynn/gtm/blob/development/gtm.go#L168
It should just be joining the 2 fields with a dot.
This later flows into Monstache but I don't see how it could be getting the prefix there.
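The linked Go code amounts to something like the following, sketched here in JavaScript against the change-stream event shape shown earlier in this thread:

```javascript
// Sketch of what the linked gtm code does with a change-stream event:
// the namespace is simply "<db>.<coll>", with no prefix added anywhere.
function namespaceOf(change) {
  return change.ns.db + '.' + change.ns.coll;
}

// With the event shown earlier in this thread:
const change = { ns: { db: 'db', coll: 'student' } };
console.log(namespaceOf(change)); // "db.student"
```

So if the prefix shows up in the index name, it must already be present in the db field of the event Monstache receives.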
I just did a quick test on my end in a VM and couldn't get the prefix added with a resume set. You mentioned this was a shared instance of MongoDB? Could this be related to that? Maybe some sort of tenant namespacing?
Really weird. I'm using a shared MongoDB cluster on Atlas.
But listening to the stream using a Node.js script outputs the correct namespace:
ns: { db: 'db', coll: 'student' },
Yeah, I'm not sure why it's working differently. But seems to be consistent with that same ID.
Are you able to do a find on the underlying local.oplog.rs collection? You might not be able to in a shared cluster on Atlas.
You could probably work around it by adding another relate with the prefix added to the namespace.
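A sketch of what that workaround might look like, assuming the prefix stays stable (copied from the logs above; I'm not certain monstache would route the prefixed events this way, so treat it as a guess):

```toml
# Hypothetical workaround: duplicate the relate for the prefixed
# namespace that only appears when resuming on the shared cluster.
[[relate]]
namespace = "5c976492cf09a2266a829a76_db.students"
with-namespace = "db.studentsView"
src-field = "refId"
match-field = "_id"

[[mapping]]
namespace = "db.studentsView"
index = "studentsView"
type = "studentsView"
```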
Looks like you can only access the oplog on an M10 or greater in Atlas.
@jodevsa I created a ticket for this in the MongoDB Atlas support. Will report back if I learn anything.
Please let me know if you can recreate this outside of that environment.
@jodevsa By the way, since you said this only triggers with resume on: in that case the change stream is started with the startAtOperationTime
option set so that it can read previous entries. Maybe this bug only happens when that option is set?
https://docs.mongodb.com/manual/reference/method/db.collection.watch/
@rwynn you are right! When using the resumeAfter option, the namespace changes to '5c976492cf09a2266a829a76_db':
ns:
{ db: '5c976492cf09a2266a829a76_db', coll: 'student' },
Good news: this only occurs on a shared cluster!
Would you mind writing up some instructions for repeating this with the mongo client? I have a ticket open now and they suspect it might be related to the shared-tier cluster namespace. They asked for a code sample to reproduce it, so it would be great to have one that repeats the issue with the mongo client to give them.
// npm install mongodb
const MongoClient = require('mongodb').MongoClient;
const dbName = "db";
const url = `mongodb+srv://username:password@***********-i5mih.mongodb.net/${dbName}?retryWrites=true`;
const collectionName = "event";
const sleep = function(ms) {
return new Promise((resolve, reject) => {
setTimeout(() => {
resolve(ms);
}, ms)
})
}
async function main() {
let client = await MongoClient.connect(url, {
useNewUrlParser: true
})
const db = client.db(dbName);
const firstStream = await db.collection(collectionName).watch()
let firstID
firstStream.on('change', function(change) {
console.log("first event ns", change.ns);
firstID = change._id;
});
await sleep(1000)
await db.collection(collectionName).insertOne({
name: "first event "
})
await sleep(500);
await firstStream.close();
await db.collection(collectionName).insertOne({
name: "second event"
})
const secondStream = await db.collection('event').watch({
resumeAfter: firstID
});
secondStream.on('change', async function(change) {
console.log("second event ns", change.ns);
await secondStream.close();
await client.close();
});
}
main();
sample output:
first event ns { db: 'db', coll: 'event' }
second event ns { db: '5c976492cf09a2266a829a76_db', coll: 'event' }
The code above reproduces the issue; it prints the ns of two events from two different change streams. The first stream doesn't have any options specified; the second one uses resumeAfter with the token taken from the first event.
Thanks that’s perfect. I’ll attach it to the ticket.
Great find on this issue!
Awesome
@jodevsa got an update today that a fix for this on the Atlas side is planned to land in the next couple weeks. They've been able to replicate with your script and it's on their radar.
@rwynn
Thanks for the update, that's awesome. I really appreciate your time on this; it's good to know that it's not a bug in the tool.
That’s good news. Thanks for the update.
@jodevsa @rwynn has this issue resurfaced? I am testing a setup now with replay and resume on, and in ES an index is created with an ID prefix. When doing an update in Mongo after all previous documents were synced, another index with the correct name is created for the newly updated docs.
What's odd is that I don't think I had this issue a few weeks ago when I ran a first test. I don't remember having seen an index with a prefixed ID at that time.
I am running monstache in Docker (image version 6.3.1) and I am testing with Mongo Atlas M0 (noticed they are at version 4.0.13 now) and ES Cloud 7.0.1.
All options are configured via commandline params given to the container.
@sbeaupre you may want to try to replicate again using just the node code that @jodevsa previously posted to this issue. If you find that the issue has come back it would be helpful if you created a ticket with MongoDB.
For reference, the previous ticket that I created with them on March 29th was given case #00558022. You might state that it was fixed in 4.0.10 and seems to have come back now in 4.0.13.
I can confirm that it is an issue again. I'll contact MongoDB support with your reference number.