[graph-node] Multiple Graph Nodes

Question

Closed this issue 10 months ago · 1 comments

Hi folks.
We have been running a single graph-node instance without dedicated nodes, i.e. all components (indexer, query, and ingestor) in one node. To achieve HA, we are going to run multiple instances of the current graph-node, following this doc: https://thegraph.com/docs/en/operating-graph-node/#multiple-graph-nodes.
The doc says we should deploy only one block ingestor to avoid multiple nodes polling the chain head. But I can see that this helm chart also allows the replica count to be configured for the ingestor node group.
So my questions are

1. Is it OK to run multiple ingestors?
2. Without differentiating instance roles (ingestor, index node, query node), can we just replicate instances that each include all components?
3. If we deploy instances per node group, which group should be replicated the most?
4. Any other recommendations for graph-node HA?
Thanks.

Answer 1 · 2023-11-01T18:03:03.000Z

Sorry for the delay in answering this one, @josedev-union.

You really only want one ingestor. Even if it were OK to run multiple nodes ingesting into the same database, that would just add overhead for no benefit. So: one ingestor only, though you can set it per chain (different nodes can be the ingestor for different chains). If you want HA for ingestion, look into running parallel stacks with independent databases, each with a single ingestor per chain.
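
To make the "one ingestor per chain" rule concrete, here is a minimal Python sketch that checks a planned node layout before deploying it. It is not part of graph-node or the helm chart: the function name, the node names, and the layout dictionary are all hypothetical, and it assumes you can list which chains each node would ingest.

```python
def find_ingestor_problems(chains, ingests):
    """Return {chain: [nodes]} for every chain without exactly one ingestor.

    chains:  chain names one stack (one database) is expected to index
    ingests: hypothetical layout, node name -> set of chains it ingests
    """
    owners = {chain: [] for chain in chains}
    for node, node_chains in sorted(ingests.items()):
        for chain in sorted(node_chains):
            owners.setdefault(chain, []).append(node)
    # An empty list means nobody ingests the chain; two or more means duplicates.
    return {chain: nodes for chain, nodes in owners.items() if len(nodes) != 1}


# Hypothetical layout: different nodes ingest different chains, and the
# query node ingests nothing.
good_layout = {
    "index-node-0": {"mainnet"},
    "index-node-1": {"gnosis"},
    "query-node-0": set(),
}

# Same layout, but a second node also ingests mainnet into the same database.
bad_layout = dict(good_layout, **{"index-node-1": {"gnosis", "mainnet"}})

good_problems = find_ingestor_problems(["mainnet", "gnosis"], good_layout)
bad_problems = find_ingestor_problems(["mainnet", "gnosis"], bad_layout)
print(good_problems)
print(bad_problems)
```

The check applies to one stack at a time. Parallel HA stacks with independent databases would each be validated separately, since each of them is supposed to have its own single ingestor per chain.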
You can have all your nodes both index and serve queries, and you can replicate those as much as you want. The ingestor, as covered above, needs to be just one. Separating the indexing and query roles is a matter of optimization, scaling, and use case, and there is no single right answer.
Which group to replicate most really depends. Are you concerned with scaling and latency, in a scenario where you index just a bit of data (say, a single subgraph) but serve a very heavy query load against it? Or are you most concerned with HA for indexing and always having fresh data (up to chain head, not stale data that is many blocks behind)? No single answer to this one :) But typically indexing is the most resource-demanding of those tasks, especially if you'll be indexing multiple subgraphs, so you are more likely to be forced to scale indexing because of resources. As for querying, unless your use case involves an awful lot of queries, you'll probably be scaling it more for HA than because you are exhausting a single node's resources. You will also want to look into HA at the database level, and into scaling it for performance (sharding), which the stack supports well.
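
If you do run parallel stacks and care about fresh data, whatever sits in front of them needs some notion of "not too many blocks behind". The sketch below is one rough way to express that in Python. Everything in it is an assumption for illustration: the function name, the stack names, the block numbers, and the `max_lag` threshold are made up, and it assumes you already collect the chain head and each stack's latest indexed block by some other means.

```python
def pick_fresh_stacks(chain_head, stack_heads, max_lag=5):
    """Return stack names within max_lag blocks of chain head, freshest first.

    chain_head:  latest block number seen on the chain
    stack_heads: hypothetical mapping, stack name -> latest indexed block
    max_lag:     how many blocks behind still counts as fresh (assumed value)
    """
    fresh = []
    for name, head in stack_heads.items():
        # A stack can briefly report a head above a stale chain_head reading;
        # treat that as zero lag rather than a negative one.
        lag = max(0, chain_head - head)
        if lag <= max_lag:
            fresh.append((lag, name))
    return [name for lag, name in sorted(fresh)]


# Made-up numbers: stack-b has fallen 50 blocks behind and is skipped.
heads = {"stack-a": 998, "stack-b": 950, "stack-c": 1000}
candidates = pick_fresh_stacks(1000, heads)
print(candidates)
```

An empty result means every stack is stale, in which case you have to decide whether serving old data is better than serving none. That trade-off depends on the use case.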
Take a look at the proxyd chart as well, for an RPC proxy. I assume you'll also be running archive nodes, as indexing off commercial services becomes quite expensive quite quickly, and that would be a valuable tool for HA on your RPCs. Stay in touch, and if you haven't done so, consider joining us on Discord, where there is a whole community that will be glad to answer questions like these and more!

Again, sorry for such a late reply :)