elbywan/moongoon

Feature request: consider only updating touched fields

Closed this issue · 9 comments

I'm not sure how tenable this is, but my documents tend to be larger/more complex and update() currently sends the entire document over the wire to mongo... it would be great if when you set any of the properties the fact that that field was touched got recorded somehow and ONLY updated fields were pushed to mongo?

Please close this issue if you think it's not feasible or a bad idea... I can't think how to do it using propertys, and moving to something like a field macro instead with custom "dirty" logic would be a large breaking change. Something to mull over, thanks!

Hi @atlantis!

There is a way to achieve a similar behaviour already. When a Model that has optional fields is updated, by default moongoon will not send them over the wire and simply ignore the nil values.

This is a bit different from marking properties with a "touched" flag, but I think this is cleaner since we use the standard getter/setters instead of a custom macro and it prevents adding internal methods for each field.

The downside is that it won't work if your data model relies on explicitely setting nil values in the DB (and I'm thinking about the unset_nils config flag). In that regard it is similar with how the JSON module works.

Supporting both emitting nil values and ignoring untouched properties would be nice, but I need to reflect a bit about how to achieve that. One way would be to add the option at the bson.cr shard level and make the consumer choose how to ignore values based on a callback or something like that.

In the meantime, I made a small exemple below to illustrate how to perform a partial update with nil values:

require "moongoon"

# Logging requests.
Log.setup("*", :trace)

Moongoon.connect("mongodb://localhost:27017", database_name: "test_db")

class MyModel < Moongoon::Collection
  collection "models"

  # Optional properties
  property a : Int32?
  property b : Int32?
end

MyModel.clear

# Create the model
model = MyModel.new(a: 1, b: 2).insert
# Use another instance with only the values you want to update.
MyModel.new(a: 2).tap(&.id = model.id!).update

puts MyModel.find.to_a.to_pretty_json

Gives the following logs:

2021-02-12T09:55:54.243900Z   INFO - moongoon: Connecting to MongoDB @ mongodb://localhost:27017
2021-02-12T09:55:54.244027Z   INFO - moongoon: Using database test_db as default.
2021-02-12T09:55:54.248054Z  DEBUG - mongo: >> [4]      Mongo::Commands::Ping
2021-02-12T09:55:54.248058Z  TRACE - mongo: >> [4]      Header: Mongo::Messages::Header(@message_length=87, @request_id=4, @response_to=0, @op_code=Msg)
2021-02-12T09:55:54.248091Z  TRACE - mongo: >> [4]      Body: {"ping":1,"$db":"admin","lsid":{"id":{"$binary":{"base64":"1A97bJ9eQRaj29oEC4jvBA==","subType":"04"}}}}
2021-02-12T09:55:54.248213Z  DEBUG - mongo: << [4]      Header: Mongo::Messages::Header(@message_length=38, @request_id=77, @response_to=4, @op_code=Msg)
2021-02-12T09:55:54.248231Z  TRACE - mongo: << [4]      Body: {"ok":1.0}
2021-02-12T09:55:54.248262Z   INFO - moongoon: Connected to MongoDB.
2021-02-12T09:55:54.248264Z   INFO - moongoon: Processing database scripts…
2021-02-12T09:55:54.248358Z  DEBUG - mongo: >> [5]      Mongo::Commands::Delete
2021-02-12T09:55:54.248360Z  TRACE - mongo: >> [5]      Header: Mongo::Messages::Header(@message_length=135, @request_id=5, @response_to=0, @op_code=Msg)
2021-02-12T09:55:54.248376Z  TRACE - mongo: >> [5]      Body: {"delete":"models","$db":"test_db","lsid":{"id":{"$binary":{"base64":"1A97bJ9eQRaj29oEC4jvBA==","subType":"04"}}}}
2021-02-12T09:55:54.248384Z  TRACE - mongo: >> [5]      Seq(deletes): [{"q":{},"limit":0}]
2021-02-12T09:55:54.248622Z  DEBUG - mongo: << [5]      Header: Mongo::Messages::Header(@message_length=45, @request_id=78, @response_to=5, @op_code=Msg)
2021-02-12T09:55:54.248631Z  TRACE - mongo: << [5]      Body: {"n":1,"ok":1.0}
2021-02-12T09:55:54.248737Z  DEBUG - mongo: >> [6]      Mongo::Commands::Insert
2021-02-12T09:55:54.248741Z  TRACE - mongo: >> [6]      Header: Mongo::Messages::Header(@message_length=159, @request_id=6, @response_to=0, @op_code=Msg)
2021-02-12T09:55:54.248758Z  TRACE - mongo: >> [6]      Body: {"insert":"models","$db":"test_db","ordered":true,"lsid":{"id":{"$binary":{"base64":"1A97bJ9eQRaj29oEC4jvBA==","subType":"04"}}}}
2021-02-12T09:55:54.248769Z  TRACE - mongo: >> [6]      Seq(documents): [{"_id":{"$oid":"602650aa059dbe2f49c43755"},"a":1,"b":2}]
2021-02-12T09:55:54.248956Z  DEBUG - mongo: << [6]      Header: Mongo::Messages::Header(@message_length=45, @request_id=79, @response_to=6, @op_code=Msg)
2021-02-12T09:55:54.248964Z  TRACE - mongo: << [6]      Body: {"n":1,"ok":1.0}
2021-02-12T09:55:54.249080Z  DEBUG - mongo: >> [7]      Mongo::Commands::Update
2021-02-12T09:55:54.249083Z  TRACE - mongo: >> [7]      Header: Mongo::Messages::Header(@message_length=201, @request_id=7, @response_to=0, @op_code=Msg)
2021-02-12T09:55:54.249096Z  TRACE - mongo: >> [7]      Body: {"update":"models","$db":"test_db","lsid":{"id":{"$binary":{"base64":"1A97bJ9eQRaj29oEC4jvBA==","subType":"04"}}}}
2021-02-12T09:55:54.249113Z  TRACE - mongo: >> [7]      Seq(updates): [{"q":{"_id":{"$oid":"602650aa059dbe2f49c43755"}},"u":{"$set":{"_id":{"$oid":"602650aa059dbe2f49c43755"},"a":2}},"multi":true,"upsert":false}]
2021-02-12T09:55:54.249350Z  DEBUG - mongo: << [7]      Header: Mongo::Messages::Header(@message_length=60, @request_id=80, @response_to=7, @op_code=Msg)
2021-02-12T09:55:54.249362Z  TRACE - mongo: << [7]      Body: {"n":1,"nModified":1,"ok":1.0}
2021-02-12T09:55:54.249505Z  DEBUG - mongo: >> [8]      Mongo::Commands::Find
2021-02-12T09:55:54.249509Z  TRACE - mongo: >> [8]      Header: Mongo::Messages::Header(@message_length=150, @request_id=8, @response_to=0, @op_code=Msg)
2021-02-12T09:55:54.249530Z  TRACE - mongo: >> [8]      Body: {"find":"models","filter":{},"$db":"test_db","sort":{"_id":-1},"hint":{},"skip":0,"lsid":{"id":{"$binary":{"base64":"1A97bJ9eQRaj29oEC4jvBA==","subType":"04"}}}}
[
  {
    "_id": {
      "$oid": "602650aa059dbe2f49c43755"
    },
    "a": 2,
    "b": 2
  }
]
2021-02-12T09:55:54.249730Z  DEBUG - mongo: << [8]      Header: Mongo::Messages::Header(@message_length=142, @request_id=81, @response_to=8, @op_code=Msg)
2021-02-12T09:55:54.249751Z  TRACE - mongo: << [8]      Body: {"cursor":{"firstBatch":[{"_id":{"$oid":"602650aa059dbe2f49c43755"},"a":2,"b":2}],"id":0,"ns":"test_db.models"},"ok":1.0}

We can see that only the a property gets sent:

{"$set":{"_id":{"$oid":"602650aa059dbe2f49c43755"},"a":2}}

Thanks for the very detailed response! Yes, I'm dependent on being able to unset fields... but creating a blank model and then changing the id to an existing model in order to perform a partial update seems a bit convoluted anyway, so I might just revert to a raw mongo query where necessary rather than take that route :)

This is a tricky one... I'm not sure that Rails-like behavior (https://stackoverflow.com/questions/55586173/preventing-update-of-record-in-rails-app-where-no-fields-are-changed) where only updated fields are sent is possible by just using property, but I do agree that it's much cleaner to do so.

What about wrapping the custom mongo query I'm probably going to have to perform into the library itself? In my use-case, I need to update the status_at timestamp of a large document once a minute, and I don't want to re-write the full document (several kb) each minute.

So if there were a model.update_field(name, value) method similar to https://apidock.com/rails/ActiveRecord/Persistence/update_column that ONLY pushed that one field to mongo, I could use that?

So if there were a model.update_field(name, value) method similar to https://apidock.com/rails/ActiveRecord/Persistence/update_column that ONLY pushed that one field to mongo, I could use that?

That's a great idea 👍.

More generally, what do you think about adding a fields argument to the .update method so you could restrict which fields are $set / $unset? That would be an easy change to make.

Something like:

model = MyModel.new(a: 1, b: 2).insert
model.a = 2
model.update(fields: {:a})

Yes that sounds like a much better implementation than generating raw mongo queries... and if update() is also capable of enabling/disabling callbacks then we'd basically match ActiveRecord in functionality... it's a choose-your-own adventure of everything from the default behavior of "update all fields with callbacks" to "update just field X with no callbacks".

I do think an update_field convenience function on top of update would be handy too.

if update() is also capable of enabling/disabling callbacks then we'd basically match ActiveRecord in functionality

There is a no_hooks optional boolean argument that prevents the after/before callbacks already.

I do think an update_field convenience function on top of update would be handy too.

Allright, I'll make a PR with the changes 👍.

Thanks much!

@atlantis Just created PR #7.

I ended up adding an update_fields method instead of the update_field one because Crystal is not dynamic like Ruby and it turns out that it is quite complicated to set a class property with its name passed as a runtime argument.

Brilliant, thanks again!