richardwilly98/elasticsearch-river-mongodb

Delete index when corresponding collection is dropped

enrique-fernandez-polo opened this issue · 18 comments

We noticed that when we drop a MongoDB collection through the shell, the index is not refreshed. We would expect that removing a MongoDB collection from the shell also deletes the corresponding index.

I had similar concerns. But apparently I can set up rivers from different Mongo collections to the same index.
So removing a Mongo collection shouldn't entail deleting an index.

The exact problem we are having occurs when we drop a collection, delete the index, and then recreate the same collection and the same index. The river reads the oplog collection from MongoDB and indexes documents that no longer exist, namely those inserted before the collection was dropped.

EDIT: This is related to #47.
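
To illustrate the problem described above, here is a minimal Python sketch of what a naive oplog replay does. It is a simplified model, not the river's actual code: the `replay` function, the `foo.bar` namespace, and the sample entries are made up for illustration, and real oplog entries carry additional fields such as a timestamp. The point is only that a reader which filters by namespace never sees the drop, because MongoDB records it as a command under `<db>.$cmd`.

```python
def replay(oplog, namespace):
    """Naively replay insert/delete entries for one namespace into a dict 'index'."""
    index = {}
    for entry in oplog:
        if entry["ns"] != namespace:
            # Command entries are recorded under "<db>.$cmd", so the drop is skipped here.
            continue
        if entry["op"] == "i":
            index[entry["o"]["_id"]] = entry["o"]
        elif entry["op"] == "d":
            index.pop(entry["o"]["_id"], None)
    return index


# Hypothetical oplog: two inserts, a collection drop, then one insert after re-creation.
oplog = [
    {"op": "i", "ns": "foo.bar", "o": {"_id": 1, "name": "old"}},
    {"op": "i", "ns": "foo.bar", "o": {"_id": 2, "name": "old"}},
    {"op": "c", "ns": "foo.$cmd", "o": {"drop": "bar"}},
    {"op": "i", "ns": "foo.bar", "o": {"_id": 3, "name": "new"}},
]

stale_index = replay(oplog, "foo.bar")
print(sorted(stale_index))  # prints [1, 2, 3]
```

Documents 1 and 2 were dropped in MongoDB but still end up in the rebuilt index, which matches the behaviour reported here.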

Hi,

I have posted the question to MongoDB user group [1].

[1] - https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/-zj9ojkfN_g

Thanks,
Richard.

I've made a quick fork with a few small changes so the river gets notified of the drop collection event. I had to get rid of the id field because it's not used in oplog.rs when reporting a drop collection event.

https://github.com/Quaiks/elasticsearch-river-mongodb-drop-collection-event
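
As a rough illustration of the id issue (this is not code from the fork): insert and delete entries in oplog.rs carry the document `_id` in `o`, updates carry it in `o2`, and a collection drop is a command entry whose `o` is just `{"drop": "<collection>"}`. A reader that insists on an `_id` therefore cannot process the drop. The `document_id` helper and the sample entries below are hypothetical.

```python
def document_id(entry):
    """Return the _id an oplog entry refers to, or None if it has none."""
    op = entry.get("op")
    if op in ("i", "d"):
        return entry.get("o", {}).get("_id")
    if op == "u":
        # Updates keep the target document's _id in the "o2" field.
        return entry.get("o2", {}).get("_id")
    # Commands ("c"), such as a collection drop, do not refer to a single document.
    return None


insert_entry = {"op": "i", "ns": "foo.bar", "o": {"_id": 1, "name": "a"}}
update_entry = {"op": "u", "ns": "foo.bar", "o": {"$set": {"name": "b"}}, "o2": {"_id": 1}}
drop_entry = {"op": "c", "ns": "foo.$cmd", "o": {"drop": "bar"}}

for entry in (insert_entry, update_entry, drop_entry):
    print(entry["op"], document_id(entry))
```

Any code path that handles the drop has to accept a missing id instead of treating it as an error.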

Hi,

Thanks I will merge the changes.

Thanks,
Richard.

Works for me. es-river-mongodb v1.6.9 mongo v2.4.4

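  #!/bin/sh
  # Registers a river for one collection of the "product" database.
  # Assumption: the collection name is passed as the first argument.
  type="${1:?usage: $0 <collection>}"
  # "drop_collection": true asks the river to act on collection drops.
  curl -XPUT "http://localhost:9200/_river/product-$type/_meta" -d '
  {
    "type": "mongodb",
    "mongodb": {
      "servers": [
        { "host": "127.0.0.1", "port": 27017 }
      ],
      "options": { "secondary_read_preference": true, "drop_collection": true },
      "db": "product",
      "collection": "'$type'"
    },
    "index": {
      "name": "product",
      "type": "'$type'"
    }
  }'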
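
For anyone who prefers to avoid the shell quoting, the same river definition can be built as a dictionary and serialized with `json`. This is only a sketch: `river_meta` is a hypothetical helper, `books` is a made-up collection name, and the option names are simply the ones used in the script above. It builds the request body and does not send it.

```python
import json


def river_meta(db, collection, index, drop_collection=True,
               host="127.0.0.1", port=27017):
    """Build the body for PUT /_river/<name>/_meta (hypothetical helper)."""
    return {
        "type": "mongodb",
        "mongodb": {
            "servers": [{"host": host, "port": port}],
            "options": {
                "secondary_read_preference": True,
                "drop_collection": drop_collection,
            },
            "db": db,
            "collection": collection,
        },
        "index": {"name": index, "type": collection},
    }


# "books" is a placeholder collection name.
body = json.dumps(river_meta("product", "books", "product"), indent=2)
print(body)
```

Passing `drop_collection=False` gives the variant without the option, which is handy when comparing the two behaviours.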

Once I've dropped the database and re-created the collection, new records no longer seem to sync.
If I remove the "drop_collection": true setting, then all the old deleted records come back.

Is the patch starting sync operations again once the collection is re-created?

The river always uses the oplog.rs collection.
So I believe that when you drop a collection, Mongo does not delete the associated entries in oplog.rs, but if you drop the database, Mongo also drops the associated entries in oplog.rs.

Can you please just clarify your scenario before I take a look?

Thanks,
Richard.

Sure, I've noticed another issue with this feature too. Maybe I'm using it wrong, let me explain.

Scenario 1

  • Create a Mongo database
  • Create a collection
  • Add records to the collection
  • Drop the database
  • Recreate the mapping
  • Recreate the river WITHOUT "drop_collection": true

Expected: Zero documents indexed in elastic
Actual: Many documents indexed in elastic

Scenario 2

  • Create a Mongo database
  • Create a collection
  • Add records to the collection
  • Drop the database
  • Recreate the mapping
  • Recreate the river WITH "drop_collection": true

Expected: Zero documents indexed in elastic
Actual: Zero documents indexed in elastic

  • Add a record to the collection

Expected: A single document indexed in elastic
Actual: Zero documents indexed in elastic

Scenario 3

  • Create a Mongo database
  • Create a collection
  • Add records to the collection
  • Drop the database
  • Recreate the mapping
  • Recreate the river WITH "drop_collection": true

Expected: Mapping remains as set when index was created
Actual: Mapping is replaced with a default mapping

This feature only works when dropping collections, not databases.

Hi,

@Quaiks is correct: the river only monitors the "drop collection" action from oplog.rs.

Thanks,
Richard.
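
To make the distinction concrete, here is a small Python sketch of how the two operations differ in oplog.rs. A collection drop is logged as a command with `o: {"drop": "<collection>"}`, while a database drop is logged with `o: {"dropDatabase": 1}`. The functions `classify_command` and `watcher_reacts` are hypothetical and only model a watcher that honors collection drops; they are not the river's code.

```python
def classify_command(entry):
    """Return ("drop_collection", "<db>.<coll>"), ("drop_database", "<db>"), or None."""
    ns = entry.get("ns", "")
    if entry.get("op") != "c" or not ns.endswith(".$cmd"):
        return None
    db = ns[: -len(".$cmd")]
    command = entry.get("o", {})
    if "drop" in command:
        return ("drop_collection", f"{db}.{command['drop']}")
    if "dropDatabase" in command:
        return ("drop_database", db)
    return None


def watcher_reacts(entry, watched_namespace):
    """Model of a watcher that only acts on a drop of the watched collection."""
    return classify_command(entry) == ("drop_collection", watched_namespace)


collection_drop = {"op": "c", "ns": "foo.$cmd", "o": {"drop": "bar"}}
database_drop = {"op": "c", "ns": "foo.$cmd", "o": {"dropDatabase": 1}}

print(watcher_reacts(collection_drop, "foo.bar"))  # prints True
print(watcher_reacts(database_drop, "foo.bar"))    # prints False
```

Under this model a database drop goes unnoticed, which is consistent with the scenarios above failing when the database, rather than the collection, is dropped.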

Cool, thanks I'll test that out.
What about scenario #3 resetting my mapping?

I am still having issues with scenario #2: when I use "drop_collection": true, no documents are synced via the river. Can anyone reproduce this, or is it just me?

Note: I am not dropping the db, just the collection with:

mongo> use foo
mongo> db.bar.drop()

Hi,

I believe you found a bug. I have committed a fix.
I hope to be able to provide a new release this week.

Thanks,
Richard.

Fix is available in release 1.6.11.

Thanks,
Richard.

Sorry for the delay, I have been enjoying the British heatwave in the Lake District.
Thanks for the bugfix @richardwilly98 I will put it to the test and report back.

Great, upgraded to 1.6.11 and collection.drop() seems to work great now.