Optionally disable merge box in Meteor for publish functions

Question

Optionally disable merge box in Meteor for publish functions

mitar opened this issue 9 years ago · 22 comments

_2 Upvotes_ This segment is a great example why sometimes it would be beneficial for some publish functions to disable merge box. So currently for publish functions there is a trade-off. Do you want to minimize data send over the wire (then you have to remember what each client has) or do you want to minimize data stored on the server (then you have to send the whole change every time and leave it to the client side to resolve the diff).

I think it would be great if this would be officially supported in Meteor. I think it can be easily done without much changes needed. Simply every added, changed, removed would be send as-is directly to the client or clients. No diffing, no computation, no memory storage.

I think this would better align with a broader Meteor ecosystem than current hack of using meteor-streams (which are sending two DDP messages, added and removed to get merge box to forget the content).

cc @rodrigok

rhyslbw commented 9 years ago

+1

Answer 1 · 2015-11-17T23:01:11.000Z

I think this sounds like a reasonable request but I would expect it to be difficult to unwind from the rest of the DDP stack easily. I think future changes to the data layer in Meteor will address the core issue however.

Answer 2 · 2015-11-24T21:23:28.000Z

I'd very much like to see this feature.

I'm dealing with a case where the published data structure is deeply nested and changing very often. Since merge box seems to operate on top level keys only, and in my case nearly always something is changing under every top level key... it does nothing. Except spike the CPU usage on the servers.

Answer 3 · 2015-12-01T23:48:26.000Z

I'm also very keen to see this. I was quite excited after it was mentioned in the meteor dev shop talk from Rocket.Chat, but there haven't been any concrete details thus far.

In my case, I'm publishing a large qty of documents that are often inserted/removed, but infrequently modified. Keeping a copy of every published document in memory seems like a huge waste.

Answer 4 · 2015-12-01T23:52:06.000Z

Rocket.Chat is using Meteor Streams for now to address this.

Answer 5 · 2015-12-08T21:34:09.000Z

I believe the workaround now is using the Streamy package and loosing some publish goodness. It is very simple and only sends data throught the same current socket connection.

Maybe a complementary Channels like data transfer pattern is better than hassling the current publish mentality, that the main purpose is sync client with the same data structure (Minimongo).

One question: if i publish to a local only collection, the doc is stored and processed by Mergebox too (everything sended with this.added...)??

Answer 6 · 2015-12-19T19:19:32.000Z

I made a package which provides this functionality: https://github.com/peerlibrary/meteor-control-mergebox

Answer 7 · 2015-12-20T06:32:48.000Z

Awesome @mitar! I look forward to testing it out.

Answer 8 · 2015-12-21T16:41:44.000Z

+1 to this feature request. Couldn't really agree more with

Do you want to minimize data send over the wire (then you have to remember what each client has) or do you want to minimize data stored on the server (then you have to send the whole change every time and leave it to the client side to resolve the diff).

Answer 9 · 2015-12-22T08:08:07.000Z

In fact I think there are three levels here:

we diff per top-level fields and send only those changes (current Meteor default)
we do deep diff and really send only changes (especially useful for arrays)
or we do not do any diffs, and always simply change the whole thing (what is this ticket about)

Each of those has its advantages and disadvantages. And not everything suits all use cases, so some configuration is needed.

My proposal would be to switch to JSON patch (#4483) instead of custom DDP, and then we can move between the first and the second as needed. And the third is now available through the package.

Answer 10 · 2015-12-22T08:49:14.000Z

Just a thought - What about having a capped list of recently published/modified documents per collection.

It can then be combined with an expiry system, to reduce memory usage of documents that are unmodified with a certain time frame.

That way we get the benefits of merge-box for frequently changing documents, but with a limit on memory usage.

Answer 11 · 2015-12-22T08:57:33.000Z

Yes! In fact, I observed two things for future optimizations. So one is the thing you are mentioning. So instead of having full cache in memory, you could have memory cache which expires (both time and size based). The issue here is that then for the client it is pretty tricky to reason about this. One solution is to send whole documents (and not only last changes) when merge box is disabled. Currently, when merge box is disabled, only last changes are send. You can use observe instead of observeChanges to send whole documents, but the problem is then observe caches everything. So one solution is to change also how observe caches: it should have time and size limits. And if document expires, then when a new change comes in, collection does findOne against the database and merges things together and then sends it through observe.

The second optimization is batching changes together. I observed similar to the client side optimization that server side is not batching anything together. So one could batch things together and send it just after some delay. This could help a lot with disabled merge box because instead of sending whole things over every time, you would at least combine changes together. It would also help normal operations as well.

This batching could probably be done simply with a package. Which would intercept all added, changed, removed calls and not propagate them to the real publish handle until some delay, merging things in meantime.

Answer 12 · 2016-10-27T20:00:07.000Z

+1 I think this is absolutely essential. There are a lot of cases where having a merge box makes no sense. Great discussion. It's been a while since the last post. Meteor is at 1.4.1.2 right now. For any readers here, are there any new packages regarding this we should know about? I was thinking of a package which had the same API as Meteor.publish (e.g. Meteor.fastPublish) which simply bypasses all of the caching/merge-box functionality and directly sends the DDP messages from calls to this.added/changed/removed

Answer 13 · 2016-10-27T20:04:39.000Z

Yes, you can just use: https://github.com/peerlibrary/meteor-control-mergebox

Answer 14 · 2016-10-27T20:22:39.000Z

Nice! Unless I'm wrong, it seems like merge box has literally no purpose if all of the defined subscriptions are orthogonal. Otherwise, even if overlappingData/totalData << 1, it would be preferable to just not use the merge box. I don't mind sending down a tiny bit of overlappingData if it means not storing multiple copies of totalData in server memory. I think those changes are idempotent on the client, anyways, so there doesn't seem to be any client-side work to be done even if the occasional dupe is sent down the wire.

Answer 15 · 2016-10-27T20:43:03.000Z

The question is only what you try to optimize. Data transferred, or memory on the server. Some prefer one, you seems to prefer the other. I think both are valid optimization points, but it should be configurable. The package above achieves this.

Answer 16 · 2016-10-28T00:30:50.000Z

Another way to bypass the merge box is to call this._session.sendAdded directly:

Meteor.publish('sub1', function(){
  const immediatelyAdd = (name,id,fields) => {
    if (this._isDeactivated()){
      return;
    }
    id = this._idFilter.idStringify(id);
    /*here would have been caching, diffing, etc.
      but we are skipping all of that and directly
      sending the data over the socket instead.*/
    this._session.sendAdded(name,id,fields)
  }
  Posts.find().observeChanges({
    added: (id,fields) => immediatelyAdd('posts',id,fields),
    ...
  })
  ...
})

I can't immediately see a problem with doing this. I don't know if your package does exactly this, or if it takes a slightly different approach.

Regarding the "multiple overlapping subscriptions" problem, this could be solved by using this.added/changed/removed and requiring that pubs : subs => clientCollections is one-to-one:

Meteor.publishWithoutMerge("list", function(){
  //populates "posts" collection.
  return Posts.find();
})
Meteor.publishWithoutMerge("detail", function(_id){
  /*populates "postDetails" collection, but would have
    populated the "posts" collection by default. If so, then
    we would have 2 subs => 1 collection, violating the rule above.*/
  Posts.find({_id}).observeChanges({
    added:(id, fields) => this.added("postDetails", id, fields),
    ...
  })
  ...
})

These subscriptions are not necessarily independent, since a post can be in the list and detail subs at the same time. If we do make sure that each sub has its own client-collection, then calling this.removed("postDetails",id) when the user navigates away from the detail page should only remove that particular post from the "postDetails" collection. If that same post was subscribed to via the list-view subscription, the list-view information for that post will still be in-tact since it lives in the "posts" collection; a separate collection.

Regarding changes... if subscriptions exhibit field-level independence, then it's impossible for a change to be sent down twice. To scale the meteor app, one could selectively disable the merge box using something like @mitar's package or the above function, carefully design subscriptions to minimize overlap, and finally ensure that each subscription owns its own client-side collection (even if they are observing the same data source).

Answer 17 · 2016-10-28T00:40:46.000Z

@Streemo: It feels to me you are duplicating work here. ;-)

Answer 18 · 2016-10-29T08:07:35.000Z

@mitar With your package, can we selectively decide when we want a publication to ignore the merge box, or does this.disableMergebox force the entire publication to ignore the merge box?

If I'm not mistaken, it seems like a publication can either have a merge box or not have a merge box. How does your package allow the developer to selectively choose when a call to this.added/changed/removed goes through the merge box? If it's possible, I think the documentation should make that much more clear. If it isn't possible with that package, maybe another package can be written to handle that, or your package can be updated.

Answer 19 · 2016-10-29T19:28:39.000Z

I think it is already hard to reason about semantics when merge box is disabeld. So I think such low level per-added call would make it even harder to reason.

Answer 20 · 2016-10-29T19:33:53.000Z

I take that as a "no"? That said, increasing control is definitely not duplicate work. Inevitably, a fraction of other people that stumble upon this rabbit hole will also need finer grained control. If you are reading this, then you could try and write specific wrapper functions around the low level DDP messages on a per publication basis, as I show above.

Then, you would just call the respective wrapper function instead of this.added/changed/removed, bypassing the merge box on a per-message basis. If you need Meteor to keep track of state for a small bit of your data (id only, etc.), you can simply call this.added/changed/removed on that particular data. You might want to use a watered down merge box if you have data which is added/removed but never changes, or if you know a priori what is going to change, you can prevent the merge box from seeing the static fields.

Answer 21 · 2017-06-09T19:10:15.000Z

To help provide a more clear separation between feature requests and bugs, and to help clean up the feature request backlog, Meteor feature requests are now being managed under the https://github.com/meteor/meteor-feature-requests repository.

Migrated to meteor/meteor-feature-requests#79.