/jpotatoe

Simple JRuby script for one-way CouchDB 0.8.1 replication

Primary LanguageRuby

Jpotatoe: A simple JRuby script for one-way CouchDB 0.8.1 replication

This script is a modified version of potatoe.rb, an example Ruby script that
demonstrated basic CouchDB replication through the DbUpdateNotificationProcess
hook (couchdb.ini) setting. The original potatoe.rb script is found in this
blog post:

http://blog.idearise.com/2008/09/08/couchdb-081/

The original potatoe.rb script did not do batch updates or timed updates. The
Jpotatoe script does both and is meant to be used with JRuby.

However, I don’t think this script is a good way for real heavy CouchDB
replication. It hasn’t been tested with major loads, and there is no notion of
how it may behave. Furthermore, it just doesn’t seem right to do major
replication this way instead of doing it through a built-in CouchDB service or a
message queue.

So, why is this here? Hopefully, it will help anyone, who is thinking about doing
replication this way, to see what might be wrong with such an approach.
In any case, it might just be a waste of time. ;-)

Caveats and TODO

  • It is not known how JRuby startup times may affect any CouchDB update messages
    that are already being sent before the script process is ready to accept those
    messages.
  • CouchDB fires another “database update” message after a successful
    replication. This means that during its periodic time check the script may call
    for a subsequent replication even if there are no other database updates.
  • A CouchDB “database update” message doesn’t always mean a document was
    updated…See the previous point about CouchDB sending a database update
    message after a successful replication.
  • The script spawns a thread for each replication HTTP POST.
  • The thread locks are not on a “transaction” that includes the
    replication; they are on the minimum shared resource. This means that
    additional database updates may occur before replication has begun and will be
    end up as part of that replication.
  • The script doesn’t handle batch sizes or times for different databases.
  • CouchDB replication may be completely different in the future!

Example Usage

Batch size is the number of database updates that the script will wait for before
starting the replication process. A database update doesn’t always mean that a
document was updated!

The script will also start the replication process every X seconds to make sure
that the target database is updated every so often.

replicate_url = "http://192.168.0.4:5984/_replicate"
source = "http://192.168.0.4:5984/" # with trailing slash
target = "http://192.168.0.2:5984/" # with trailing slash
databases = { "mytestdb" => 0 } # { "name_in_quotes" => default_update_count }

config = { :replicate_url => replicate_url,
           :source => source,
           :target => target,
           :databases => databases,
           :batch_size => 50,
           :x_seconds => 1800 }
 
Jpotatoe.new(config).watch