
Clarification needed on docs

nareshbhatia opened this issue · 4 comments

The intent of the following example under Design Exploration is not clear to me:

{
  ROOT: {
    posts: [
      {}, // Reference to <1>
      {}, // Reference to <2>
    ],
  },
  1: {
    id: 1,
    title: "GraphQL Rocks!",
    author: {} // Reference to <3>
  },
  2: {
    id: 2,
    title: "Caching Is Hard",
    author: {} // Reference to <3>
  },
  3: {
    id: 3,
    name: 'Gouda',
  },
}

This example looks exactly like the one under Flattening (& Normalization) in Motivation. It's not clear to me what distinction is being made. How does the example illustrate the difference between the Hermes Cache and the Apollo Cache?

Also, is there a distinction being made between the terms Flattening and Normalization? My current interpretation is that to flatten an object graph, one would have to replace the embedded entities with references and the process of doing this is called normalization. But I may be missing some subtlety.

Finally, how do you detect entities during the normalization process? Is it by id only, or by id and __typename? Can you please point to the code where this is done?

Thanks in advance.

nevir commented

This example looks exactly like the one under Flattening (& Normalization) in Motivation. It's not clear to me what distinction is being made. How does the example illustrate the difference between the Hermes Cache and the Apollo Cache?

Hm yeah - those docs could use some clarification (and probably different terminology). Lemme try rephrasing:

Apollo Client (and Relay) normalizes nodes in the graph by replacing field references with a bit of metadata that indicates which node that field points to (the { __ref } structure in the flattening example). Both Apollo & Relay then walk that metadata on read and replace it with data from the referenced nodes.
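
Roughly, the flattened shape those libraries store looks like this (a sketch only; the exact key names differ between Apollo and Relay):

{
  ROOT: {
    posts: [
      { __ref: '1' },
      { __ref: '2' },
    ],
  },
  1: {
    id: 1,
    title: "GraphQL Rocks!",
    author: { __ref: '3' },
  },
  2: {
    id: 2,
    title: "Caching Is Hard",
    author: { __ref: '3' },
  },
  3: {
    id: 3,
    name: "Gouda",
  },
}

On every read, the cache has to follow each { __ref } and substitute the node it points to.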

Hermes' main differentiator is that it maintains actual references to the underlying JS objects for those fields. In the example you quote above, 1.author and 2.author both reference the same object (3). It stores cached data in this format, so that reads are largely constant time (no metadata to walk & replace).
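
In plain JavaScript terms, the in-memory structure is closer to this (a sketch of the idea, not the actual internal classes Hermes uses):

const gouda = { id: 3, name: 'Gouda' };

const post1 = { id: 1, title: 'GraphQL Rocks!', author: gouda };
const post2 = { id: 2, title: 'Caching Is Hard', author: gouda };

const root = { posts: [post1, post2] };

// Both posts share the same author object; reading requires no { __ref } lookup.
console.log(post1.author === post2.author); // true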

Also, is there a distinction being made between the terms Flattening and Normalization? My current interpretation is that to flatten an object graph, one would have to replace the embedded entities with references and the process of doing this is called normalization. But I may be missing some subtlety.

Yeah, in those docs I'm taking normalization in its more general sense: there's only one instance of each piece of data (not that it's adhering to db-style normal forms).

Under that interpretation, flattening is one type of normalization. However, Hermes is also normalizing by maintaining those references directly (there's still only ever one copy of a given node in the cache).

Finally, how do you detect entities during the normalization process? Is it by id only, or by id and __typename? Can you please point to the code where this is done?

It's configurable, similar to Apollo's cache, albeit a bit more coarse (intentionally). By default, any node with an id field is considered to be an entity in the graph (and thus available to be normalized). Feel free to swap the function that derives entity ids with an implementation of your choice to tune it further.

Note this also lets you opt out of normalization where you might not need it (which can provide further performance boosts, as there are fewer pointers for the cache to maintain).
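
To make that concrete, here's a rough sketch of what such a function looks like (how it gets wired into the cache configuration is omitted here; check the configuration docs for the actual option name):

// Default-style behavior: any node with an id field is treated as an entity.
function defaultEntityId(node: any): string | undefined {
  return node && node.id != null ? String(node.id) : undefined;
}

// Example override: require both id and __typename, and namespace the key.
function stricterEntityId(node: any): string | undefined {
  if (node && node.id != null && node.__typename) {
    return `${node.__typename}:${node.id}`;
  }
  return undefined; // Not an entity: left embedded, not normalized.
}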

nareshbhatia commented

@nevir, thanks for the detailed explanation. Very clear now! To confirm my understanding, you take the tree representation of an object graph (with possibly duplicate entity instances) and convert it to a real object graph, with only one instance per entity. Is this correct?

Can I ask you for a favor? I use MobX as my state management library and would like to build a similar cache using MobX. Can you please point me to your internal data structures and also the algorithms you use to convert trees to graphs and vice versa? I would like to assess the effort it would take to build a MobX-powered cache.

nevir commented

To confirm my understanding, you take the tree representation of an object graph (with possibly duplicate entity instances) and convert it to a real object graph, with only one instance per entity. Is this correct?

Yup! That's a much clearer way of describing it 👍
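
Conceptually, that tree → graph step can be sketched like this (a minimal illustration, not Hermes' actual implementation):

// Minimal sketch: walk a response tree and keep exactly one instance per
// entity id, rewiring every field to point at that single instance.
function normalize(value: any, entities = new Map<string, any>()): any {
  if (Array.isArray(value)) {
    return value.map(item => normalize(item, entities));
  }
  if (value && typeof value === 'object') {
    const id = value.id != null ? String(value.id) : undefined;
    // Reuse the existing instance for this entity, or start a fresh node.
    const node = id ? (entities.get(id) ?? entities.set(id, {}).get(id)) : {};
    for (const [key, child] of Object.entries(value)) {
      node[key] = normalize(child, entities);
    }
    return node;
  }
  return value; // scalars pass through untouched
}

// Feeding the example tree through it yields a graph where both posts share
// one author object: graph.posts[0].author === graph.posts[1].author.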

Can I ask you for a favor? I use MobX as my state management library and would like to build a similar cache using MobX. Can you please point me to your internal data structures and also the algorithms you use to convert trees to graphs and vice versa? I would like to assess the effort it would take to build a MobX-powered cache.

https://github.com/convoyinc/apollo-cache-hermes/blob/master/src/operations/SnapshotEditor.ts is where the bulk of it is (the bookkeeping & management of immutable updates)

As for adapting the concept to MobX, that was one thing that we explored early on and weren't able to get to work very well (due to the pretty heavy overhead of MobX's synthesized properties, and the challenge of dealing with derived fields). It'd be super awesome to see the idea working with MobX, though!
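
For reference, a bare-bones way to get the "one shared instance per entity" idea in MobX is an observable entity map (this is just a sketch of the concept, not a working cache; it ignores queries, derived fields, and garbage collection):

import { observable, runInAction } from 'mobx';

// Sketch: one observable object per entity, stored once and shared by reference.
const entities = observable.map<string, any>();

function upsert(node: { id: string | number; [field: string]: any }) {
  return runInAction(() => {
    const key = String(node.id);
    const existing = entities.get(key);
    if (existing) {
      Object.assign(existing, node); // in-place update; every consumer reacts
      return existing;
    }
    const next = observable(node);
    entities.set(key, next);
    return next;
  });
}

const gouda = upsert({ id: 3, name: 'Gouda' });
const post = upsert({ id: 1, title: 'GraphQL Rocks!', author: gouda });
// post.author is the same MobX observable as gouda, so updating gouda
// updates every post that references it.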

nareshbhatia commented

Awesome, and thanks for the context on your MobX exploration. Will keep that in mind.