Performance issue with fields > 1 kilobyte
tammofreese opened this issue · 3 comments
While trying to switch from mongo to em-mongo, I think I discovered a performance issue when handling documents that contain fields bigger than 1 kilobyte. The script below takes a size in bytes as an argument, inserts 1000 documents with a data field of that size, and logs how long loading those documents takes with mongo and with em-mongo.
Here are the results I get on my MacBook Air (Core 2 Duo, 1.6 GHz):
$ ruby em_mongo_vs_mongo.rb 1000
0.000: Requesting with mongo
0.094: mongo: returned 1000000 bytes
0.000: Requesting with em_mongo
0.191: em_mongo: returned 1000000 bytes
$ ruby em_mongo_vs_mongo.rb 10000
0.000: Requesting with mongo
0.154: mongo: returned 10000000 bytes
0.000: Requesting with em_mongo
2.568: em_mongo: returned 10000000 bytes
$ ruby em_mongo_vs_mongo.rb 100000
0.000: Requesting with mongo
0.887: mongo: returned 100000000 bytes
0.000: Requesting with em_mongo
32.195: em_mongo: returned 100000000 bytes
As you can see, em-mongo needs far more time as the documents get bigger: each step multiplies the payload by 10, but em-mongo's time grows by roughly 13x (0.191s to 2.568s to 32.195s), while mongo's time grows much more slowly. Is this a bug, or have I done something wrong in the code?
require 'mongo'
require 'em-mongo'

# Logs the message together with the seconds elapsed since the last call.
def log(message)
  now = Time.now.to_f
  @last ||= now
  puts "%2.3f: #{message}" % (now - @last)
  @last = now
end

EM.run do
  mongo = Mongo::Connection.new.db("em_mongo_vs_mongo")
  em_mongo = EM::Mongo::Connection.new.db("em_mongo_vs_mongo")
  coll_mongo = mongo.collection('test_data')
  coll_em_mongo = em_mongo.collection('test_data')

  # Reset the collection, then insert 1000 documents whose data field
  # has the size given on the command line.
  coll_mongo.remove()
  (1..1000).each do |index|
    coll_mongo.insert({id: index, data: (" " * ARGV[0].to_i)})
  end

  # Synchronous read with the mongo driver.
  log("Requesting with mongo")
  cursor = coll_mongo.find({}, :fields => ['id', 'data'])
  count = 0
  cursor.each do |doc|
    if doc # unnecessary with mongo, but just to make the comparison fair
      count += doc['data'].to_s.size
    else
      log("mongo: screwed up")
      EM.next_tick { EM.stop }
    end
  end
  log("mongo: returned #{count} bytes")

  # Asynchronous read with em-mongo; the cursor yields nil once exhausted.
  log("Requesting with em_mongo")
  cursor = coll_em_mongo.find({}, :fields => ['id', 'data'])
  count = 0
  cursor.each do |doc|
    if doc
      count += doc['data'].to_s.size
    else
      log("em_mongo: returned #{count} bytes")
      EM.next_tick { EM.stop }
    end
  end
end
I don't think benchmarking sync and async operations in the same event loop is a great idea, but I don't see why that would throw off the results here. My guess at first glance is that this is probably a bug.
Found it! The problem is in connection.rb:
if @buffer.more?
  remaining_bytes = @buffer.size - @buffer.position
  @buffer = BSON::ByteBuffer.new(@buffer.to_s[@buffer.position, remaining_bytes])
  @buffer.rewind
else
  @buffer.clear
end
If there is more data available, the buffer is always recreated, even when the position is 0 and the recreation is not needed. Since the whole remaining buffer is copied again on every incoming chunk, the cost grows roughly quadratically with the amount of buffered data, which is why bigger documents hurt so much. I just pushed a fix and will send you a pull request next.
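The gist of the change is to guard the recreation so it only happens when bytes have actually been consumed. A minimal sketch along these lines (see the pull request for the actual commit):

if @buffer.more?
  # Only recreate the buffer if bytes have actually been consumed;
  # at position 0 the remaining bytes are the whole buffer already.
  if @buffer.position > 0
    remaining_bytes = @buffer.size - @buffer.position
    @buffer = BSON::ByteBuffer.new(@buffer.to_s[@buffer.position, remaining_bytes])
    @buffer.rewind
  end
else
  @buffer.clear
end

With that guard, a large partially received message is no longer re-copied on every network chunk, which removes the quadratic behavior.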