bcg/em-mongo

Performance issue with fields > 1 kilobyte

tammofreese opened this issue · 3 comments

When trying to switch from mongo to em-mongo, I think I discovered a performance issue when handling documents that contain field bigger than 1 kilobyte. The script below takes a size in bytes as argument, inserts 1000 documents with a data field that has the given size, and logs the time loading these documents takes with mongo and em-mongo.

Here are the results I get on my MacBook Air (Core 2 Duo 1.6 Ghz):

$ ruby em_mongo_vs_mongo.rb 1000
0.000: Requesting with mongo
0.094: mongo: returned 1000000 bytes
0.000: Requesting with em_mongo
0.191: em_mongo: returned 1000000 bytes
$ ruby em_mongo_vs_mongo.rb 10000
0.000: Requesting with mongo
0.154: mongo: returned 10000000 bytes
0.000: Requesting with em_mongo
2.568: em_mongo: returned 10000000 bytes
$ ruby em_mongo_vs_mongo.rb 100000
0.000: Requesting with mongo
0.887: mongo: returned 100000000 bytes
0.000: Requesting with em_mongo
32.195: em_mongo: returned 100000000 bytes

As you can see, em-mongo needs quite a lot of time when the documents get bigger. Is this a bug, or have I done something wrong in the code?

require 'mongo'
require 'em-mongo'

def log(message)
  now = Time.now.to_f
  @last ||= now
  puts "%2.3f: #{message}" % (now - @last)
  @last = now
end

EM.run do
  mongo = Mongo::Connection.new.db("em_mongo_vs_mongo")
  em_mongo = EM::Mongo::Connection.new.db("em_mongo_vs_mongo")

  coll_mongo = mongo.collection('test_data')
  coll_em_mongo = em_mongo.collection('test_data')

  coll_mongo.remove()

  (1..1000).each do |index|
    coll_mongo.insert({id:index, data:(" " * ARGV[0].to_i)})
  end

  log("Requesting with mongo")
  cursor = coll_mongo.find({}, :fields => ['id', 'data'])
  count = 0
  cursor.each do |doc|
    if doc # unnecessary with mongo, but just to make the comparison fair
      count += doc['data'].to_s.size
    else
      log("mongo: screwed up")
      EM.next_tick do
        EM.stop
      end
    end
  end
  log("mongo: returned #{count} bytes")

  log("Requesting with em_mongo")
  cursor = coll_em_mongo.find({}, :fields => ['id', 'data'])
  count = 0
  cursor.each do |doc|
    if doc
      count += doc['data'].to_s.size
    else
      log("em_mongo: returned #{count} bytes")
      EM.next_tick do
        EM.stop
      end
    end
  end
end
bcg commented

I don't think benchmarking sync and async operations in the same event loop is a great idea, but I don't see why that would be throwing off the results here. My guess at first glance is this is probably a bug.

Found it! The problem is in connection.rb:

    if @buffer.more?
      remaining_bytes= @buffer.size-@buffer.position
      @buffer = BSON::ByteBuffer.new(@buffer.to_s[@buffer.position,remaining_bytes])
      @buffer.rewind
    else
      @buffer.clear
    end

If there is more data available, the buffer is always recreated, even if the position in the buffer is 0 and the recreation is not needed. For bigger data, this problem gets worse by n*2. I just pushed a fix and will send you a pull request next.

Done with #42