richardwilly98/elasticsearch-river-mongodb

Version 1.7.1 doesn't support existing index.

hussachai opened this issue · 15 comments

I'm using ES 0.90.5
Mongo 2.4.6
Mongo-River: 1.7.1
OS: Mac 10.8.5

I've set up a replica set and made sure that there's no existing mapping in ES
by running this command:

curl -XDELETE http://localhost:9200/_river/mongodb

Then I created the river definition with the following command:

curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": { 
    "servers":
    [
      { "host": "localhost", "port": 27001 },
      { "host": "localhost", "port": 27002 },
      { "host": "localhost", "port": 27003 }
    ],
    "options": {
        "secondary_read_preference" : true
    },
    "db": "blog", 
    "collection": "test", 
    "gridfs": false
  }, 
  "index": { 
    "name": "blog", 
    "type": "test"
  }
}'
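
As a sanity check (not in the original report; _riverstatus is the status document the river writes, mentioned later in this thread), you can verify the definition was stored and inspect the river status:

# Verify the river definition was stored
curl -XGET 'localhost:9200/_river/mongodb/_meta?pretty'

# Inspect the river status document (reports e.g. INITIAL_IMPORT_FAILED)
curl -XGET 'localhost:9200/_river/mongodb/_riverstatus?pretty'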

But it didn't work as I expected. I inserted a new document into
the collection, but it wasn't copied to Elasticsearch.
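
For reference, a quick way to check whether documents are syncing (assuming the db/collection and index/type names from the river definition above):

# In the mongo shell, against the primary:
#   use blog
#   db.test.insert({ "title" : "hello river" })

# Then look for the document in Elasticsearch:
curl -XGET 'localhost:9200/blog/test/_search?pretty'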

Here is the elasticsearch.log:

[2013-11-06 12:08:26,206][INFO ][node                     ] [Typhon] version[0.90.5], pid[4213], build[c8714e8/2013-09-17T12:50:20Z]
[2013-11-06 12:08:26,206][INFO ][node                     ] [Typhon] initializing ...
[2013-11-06 12:08:26,254][INFO ][plugins                  ] [Typhon] loaded [mongodb-river, mapper-attachments], sites [river-mongodb]
[2013-11-06 12:08:28,390][INFO ][node                     ] [Typhon] initialized
[2013-11-06 12:08:28,390][INFO ][node                     ] [Typhon] starting ...
[2013-11-06 12:08:28,493][INFO ][transport                ] [Typhon] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.0.6:9300]}
[2013-11-06 12:08:31,522][INFO ][cluster.service          ] [Typhon] new_master [Typhon][MIkEAI9RRIS_PXAP3WlVZg][inet[/10.0.0.6:9300]], reason: zen-disco-join (elected_as_master)
[2013-11-06 12:08:31,545][INFO ][discovery                ] [Typhon] elasticsearch/MIkEAI9RRIS_PXAP3WlVZg
[2013-11-06 12:08:31,561][INFO ][http                     ] [Typhon] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.0.6:9200]}
[2013-11-06 12:08:31,561][INFO ][node                     ] [Typhon] started
[2013-11-06 12:08:32,171][INFO ][gateway                  ] [Typhon] recovered [8] indices into cluster_state
[2013-11-06 12:08:32,644][DEBUG][action.get               ] [Typhon] [_river][0]: failed to execute [[_river][mongodb][_meta]: routing [null]]
org.elasticsearch.indices.IndexMissingException: [_river] missing
    at org.elasticsearch.indices.InternalIndicesService.indexServiceSafe(InternalIndicesService.java:334)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:96)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:42)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:161)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-11-06 12:08:32,754][DEBUG][action.get               ] [Typhon] [_river][0]: failed to execute [[_river][mongodb][_meta]: routing [null]]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [_river][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:704)
    at org.elasticsearch.index.shard.service.InternalIndexShard.get(InternalIndexShard.java:415)
    at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:167)
    at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:103)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:42)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:161)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-11-06 12:08:32,755][WARN ][river.routing            ] [Typhon] failed to get/parse _meta for [mongodb]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [_river][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:704)
    at org.elasticsearch.index.shard.service.InternalIndexShard.get(InternalIndexShard.java:415)
    at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:167)
    at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:103)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:42)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:161)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-11-06 12:08:32,807][DEBUG][action.get               ] [Typhon] [_river][0]: failed to execute [[_river][mongodb][_meta]: routing [null]]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [_river][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:704)
    at org.elasticsearch.index.shard.service.InternalIndexShard.get(InternalIndexShard.java:415)
    at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:167)
    at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:103)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:42)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:161)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-11-06 12:08:32,810][WARN ][river.routing            ] [Typhon] failed to get/parse _meta for [mongodb]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [_river][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:704)
    at org.elasticsearch.index.shard.service.InternalIndexShard.get(InternalIndexShard.java:415)
    at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:167)
    at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:103)
    at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:42)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:161)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-11-06 12:08:32,908][INFO ][river.mongodb            ] mongoServersSettings: [{port=27001, host=localhost}, {port=27002, host=localhost}, {port=27003, host=localhost}]
[2013-11-06 12:08:32,908][INFO ][river.mongodb            ] Server: localhost - 27001
[2013-11-06 12:08:32,910][INFO ][river.mongodb            ] Server: localhost - 27002
[2013-11-06 12:08:32,911][INFO ][river.mongodb            ] Server: localhost - 27003
[2013-11-06 12:08:32,930][TRACE][river.mongodb            ] mongoOptionsSettings: {secondary_read_preference=true}
[2013-11-06 12:08:32,931][DEBUG][org.elasticsearch.river.mongodb.MongoDBRiver] Prefix: [null] - name: [org.elasticsearch.river.mongodb.MongoDBRiver]
[2013-11-06 12:08:32,933][INFO ][river.mongodb            ] mongoServersSettings: [{port=27001, host=localhost}, {port=27002, host=localhost}, {port=27003, host=localhost}]
[2013-11-06 12:08:32,933][INFO ][river.mongodb            ] Server: localhost - 27001
[2013-11-06 12:08:32,933][INFO ][river.mongodb            ] Server: localhost - 27002
[2013-11-06 12:08:32,933][INFO ][river.mongodb            ] Server: localhost - 27003
[2013-11-06 12:08:32,934][TRACE][river.mongodb            ] mongoOptionsSettings: {secondary_read_preference=true}
[2013-11-06 12:08:32,936][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Using mongodb server(s): host [localhost], port [27001]
[2013-11-06 12:08:32,936][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Using mongodb server(s): host [localhost], port [27002]
[2013-11-06 12:08:32,936][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Using mongodb server(s): host [localhost], port [27003]
[2013-11-06 12:08:32,939][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin version: [1.7.1]
[2013-11-06 12:08:32,939][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [true], drop_collection [false], include_collection [], throttlesize [500], gridfs [false], filter [], db [blog], collection [test], script [null], indexing to [blog]/[test]
[2013-11-06 12:08:33,205][TRACE][org.elasticsearch.river.mongodb.MongoDBRiver] serverStatus: { "serverUsed" : "localhost/127.0.0.1:27001" , "host" : "Hussachais-MacBook-Pro.local:27001" , "version" : "2.4.6" , "process" : "mongod" , "pid" : 1889 , "uptime" : 3602.0 , "uptimeMillis" : 3601897 , "uptimeEstimate" : 3311.0 , "localTime" : { "$date" : "2013-11-06T18:08:33.197Z"} , "asserts" : { "regular" : 0 , "warning" : 0 , "msg" : 0 , "user" : 0 , "rollovers" : 0} , "backgroundFlushing" : { "flushes" : 60 , "total_ms" : 537 , "average_ms" : 8.95 , "last_ms" : 9 , "last_finished" : { "$date" : "2013-11-06T18:08:31.283Z"}} , "connections" : { "current" : 10 , "available" : 194 , "totalCreated" : 274} , "cursors" : { "totalOpen" : 2 , "clientCursors_size" : 2 , "timedOut" : 4} , "dur" : { "commits" : 29 , "journaledMB" : 0.0 , "writeToDataFilesMB" : 0.0 , "compression" : 0.0 , "commitsInWriteLock" : 0 , "earlyCommits" : 0 , "timeMs" : { "dt" : 3031 , "prepLogBuffer" : 0 , "writeToJournal" : 0 , "writeToDataFiles" : 0 , "remapPrivateView" : 0}} , "extra_info" : { "note" : "fields vary by platform" , "page_faults" : 71} , "globalLock" : { "totalTime" : 3601897000 , "lockTime" : 1174012 , "currentQueue" : { "total" : 0 , "readers" : 0 , "writers" : 0} , "activeClients" : { "total" : 0 , "readers" : 0 , "writers" : 0}} , "indexCounters" : { "accesses" : 5 , "hits" : 5 , "misses" : 0 , "resets" : 0 , "missRatio" : 0.0} , "locks" : { "." : { "timeLockedMicros" : { "R" : 191111 , "W" : 1174012} , "timeAcquiringMicros" : { "R" : 1042116 , "W" : 312149}} , "admin" : { "timeLockedMicros" : { } , "timeAcquiringMicros" : { }} , "local" : { "timeLockedMicros" : { "r" : 284728 , "w" : 5117} , "timeAcquiringMicros" : { "r" : 307085 , "w" : 1028}} , "blog" : { "timeLockedMicros" : { "r" : 5763 , "w" : 330097} , "timeAcquiringMicros" : { "r" : 279 , "w" : 22}}} , "network" : { "bytesIn" : 553107 , "bytesOut" : 863785 , "numRequests" : 5871} , "opcounters" : { "insert" : 4 , "query" : 153 , "update" : 9 , "delete" : 0 , "getmore" : 1372 , "command" : 4474} , "opcountersRepl" : { "insert" : 0 , "query" : 0 , "update" : 0 , "delete" : 0 , "getmore" : 0 , "command" : 0} , "recordStats" : { "accessesNotInMemory" : 0 , "pageFaultExceptionsThrown" : 0 , "blog" : { "accessesNotInMemory" : 0 , "pageFaultExceptionsThrown" : 0} , "local" : { "accessesNotInMemory" : 0 , "pageFaultExceptionsThrown" : 0}} , "repl" : { "setName" : "abc" , "ismaster" : true , "secondary" : false , "hosts" : [ "localhost:27001" , "localhost:27003" , "localhost:27002"] , "primary" : "localhost:27001" , "me" : "localhost:27001"} , "writeBacksQueued" : false , "mem" : { "bits" : 64 , "resident" : 9 , "virtual" : 2762 , "supported" : true , "mapped" : 128 , "mappedWithJournal" : 256} , "metrics" : { "document" : { "deleted" : 0 , "inserted" : 4 , "returned" : 71 , "updated" : 9} , "getLastError" : { "wtime" : { "num" : 3 , "totalMillis" : 0} , "wtimeouts" : 0} , "operation" : { "fastmod" : 7 , "idhack" : 0 , "scanAndOrder" : 0} , "queryExecutor" : { "scanned" : 174} , "record" : { "moves" : 0} , "repl" : { "apply" : { "batches" : { "num" : 0 , "totalMillis" : 0} , "ops" : 0} , "buffer" : { "count" : 0 , "maxSizeBytes" : 268435456 , "sizeBytes" : 0} , "network" : { "bytes" : 0 , "getmores" : { "num" : 0 , "totalMillis" : 0} , "ops" : 0 , "readersCreated" : 21} , "oplog" : { "insert" : { "num" : 4 , "totalMillis" : 0} , "insertBytes" : 463} , "preload" : { "docs" : { "num" : 0 , "totalMillis" : 0} , "indexes" : { "num" : 0 , "totalMillis" : 0}}} , 
"ttl" : { "deletedDocuments" : 0 , "passes" : 60}} , "ok" : 1.0}
[2013-11-06 12:08:33,205][TRACE][org.elasticsearch.river.mongodb.MongoDBRiver] process: mongod
[2013-11-06 12:08:33,240][DEBUG][river.mongodb.util       ] setRiverStatus called with INITIAL_IMPORT_FAILED

Thank you.

Can you please try the following:

  • Delete the index and delete the river
  • Stop ES
  • Restart ES
  • Add the river again

Please send the log if unsuccessful.
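
In concrete terms that would be roughly the commands below (not verbatim from this thread; adjust the index and river names to your setup, and restart ES however it was installed):

# 1. Delete the target index and the river
curl -XDELETE 'localhost:9200/blog'
curl -XDELETE 'localhost:9200/_river/mongodb'

# 2. Stop and restart the Elasticsearch process

# 3. Add the river again, e.g. with the definition above saved to river.json
curl -XPUT 'localhost:9200/_river/mongodb/_meta' -d @river.json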

It works now. Maybe that's because I used an existing index.
I tried again with a new index and it works.

Thanks a lot.

Good news. The initial import was changed in release 1.7.1 (see #47).
So the index / type must be empty, and synchronization using oplog.rs must not have been executed yet.
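
A simple way to confirm the target index / type is empty before creating the river (names from the example above):

# Expect "count" : 0 before adding the river
curl -XGET 'localhost:9200/blog/test/_count?pretty'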

I'm a new user. I just set it up and found that everything had already been synced.
That's great. It's very convenient. Thank you.

Can this issue be closed?

Yes. However, I have one more question: is it possible to set up a river on an existing index?

I'm having the same problem with an existing index that has a predefined mapping for a parent/child relationship. It simply won't index child documents at all. The _riverstatus document says INITIAL_IMPORT_FAILED. I managed to get this working with 1.7.0 without problems, but 1.7.0 was indexing more documents than the collection has :) I would be glad to get an explanation of how to fix it.

There is also a problem with closing the MongoDB river after deleting its record from the _river index:
[2013-11-12 09:05:39,784][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] closing mongodb stream river
[2013-11-12 09:05:40,405][DEBUG][river.mongodb.util ] setRiverStatus called with RUNNING
[2013-11-12 09:05:40,416][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] About to stop river: vehicles_v1_0_7
[2013-11-12 09:05:40,417][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] closing mongodb stream river
[2013-11-12 09:05:41,417][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] About to stop river: vehicles_v1_0_7
[2013-11-12 09:05:41,418][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] closing mongodb stream river
[2013-11-12 09:05:42,420][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] About to stop river: vehicles_v1_0_7
[2013-11-12 09:05:42,420][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] closing mongodb stream river
[2013-11-12 09:05:43,425][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] About to stop river: vehicles_v1_0_7

It keeps looping like this until I restart Elasticsearch.

I got that error too when I set it up on an existing index.

@vglafirov
Could you please create a new issue for #167 (comment)?

The initial import was changed in 1.7.1 to use the collection data, and it effectively forces you to use an empty index... That's probably not ideal in your scenario.

Should I roll back to 1.7.0 then?

That's probably the best option for now. I will look at this issue for the next release.

Great, thanks

Let me explain what's happening...

As I said, the initial import was changed in 1.7.1 (see #47): it now uses the collection data to speed up the process, then switches to the oplog.rs collection.

  • Initial import means that the oplog timestamp recorded in the river is null, no initial timestamp has been defined in the river definition, and the index is empty.

There are a few issues here:

  • The condition "index is empty" should really be "index / type is empty". That will be fixed in a new commit.
  • What should happen when the index / type is not empty? In the current release the river just stops and does not index anything, which might be too restrictive. The easiest way to handle this case would be to add an additional river parameter to skip the initial import from the collection data (see the sketch below). What do you think? Any other suggestions?
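
Purely as an illustration of that suggestion (skip_initial_import is a hypothetical name; no such option exists in 1.7.1), the river definition might grow an option like:

curl -XPUT 'localhost:9200/_river/mongodb/_meta' -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "blog",
    "collection": "test",
    "options": { "skip_initial_import": true }
  },
  "index": { "name": "blog", "type": "test" }
}'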

With the current release 1.7.1 you might be able to bypass the issue using the initial_timestamp parameter [1].

[1] - 57fc1c7
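
For example, my understanding of the initial_timestamp option introduced in 57fc1c7 is that it takes a script that produces the starting oplog timestamp (please verify the exact syntax against that commit). Setting it to the current time should make the river skip everything written before the river was created:

curl -XPUT 'localhost:9200/_river/mongodb/_meta' -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "blog",
    "collection": "test",
    "options": {
      "initial_timestamp": {
        "script_type": "js",
        "script": "var date = new Date(); new java.lang.Long(date.getTime());"
      }
    }
  },
  "index": { "name": "blog", "type": "test" }
}'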

Thank you for the explanation.
I changed the issue's title to make it more meaningful than "It does not work", so that anyone
who hits the same problem can find this thread useful.

👍
Having a parameter to skip the initial import is good. Actually, I don't know of a better way.
The default value could be initial_import: true (or whatever name is sensible).

Thanks a lot.