PaddlePaddle/PGL

Distributed GraphSAGE with the distributed graph engine

l-hoang opened this issue · 1 comment

Hello.

Do you have any examples of GraphSAGE that use the distributed graph engine, like
the metapath2vec example, which launches a server and uses a client to access the graph?

There is this example of distributed GraphSAGE that doesn't use the graph engine,
but instead loads the entire graph on each host and assigns each host its own
work (to my understanding).
https://github.com/PaddlePaddle/PGL/blob/main/examples/graphsage/cpu_sample_version/train_distributed_cpu.py

I want to run GraphSAGE with the graph kept on the server, because the graph is
large and loading the entire graph onto every machine may not be feasible.

If no such example exists, would I have to write my own distributed GraphClient
dataloader for the GraphSAGE GNN layer? Are there any examples that would
be simple to build upon? For example, something similar to what DistDGL does here
with its DistNodeDataLoader.
https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/dist/train_dist.py
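
To make the question concrete, here is a rough sketch of the kind of loader I have
in mind. Everything in it is hypothetical: `NeighborClient`, `InMemoryClient`,
`sample_neighbors`, and `sage_batches` are placeholder names I made up, not PGL APIs,
and the in-memory stub only stands in for a real client that would query the graph
server. It just shows multi-hop neighbor sampling done through a client object
instead of a locally loaded graph.

```python
import random
from typing import Dict, Iterator, List, Protocol, Tuple

Edge = Tuple[int, int]  # (src, dst)


class NeighborClient(Protocol):
    """Hypothetical interface: anything that can sample neighbors for a list of nodes."""

    def sample_neighbors(self, nodes: List[int], fanout: int) -> List[List[int]]:
        ...


class InMemoryClient:
    """Stand-in for a real graph-engine client (an assumption, not a PGL class)."""

    def __init__(self, adjacency: Dict[int, List[int]], seed: int = 0) -> None:
        self.adjacency = adjacency
        self.rng = random.Random(seed)

    def sample_neighbors(self, nodes: List[int], fanout: int) -> List[List[int]]:
        result = []
        for node in nodes:
            neighbors = self.adjacency.get(node, [])
            # Keep at most `fanout` neighbors per node.
            if len(neighbors) > fanout:
                neighbors = self.rng.sample(neighbors, fanout)
            result.append(list(neighbors))
        return result


def sage_batches(
    client: NeighborClient,
    train_nodes: List[int],
    batch_size: int,
    fanouts: List[int],
) -> Iterator[Tuple[List[int], List[List[Edge]]]]:
    """Yield (seed_nodes, layers); layers[i] holds the (src, dst) edges sampled at hop i."""
    for start in range(0, len(train_nodes), batch_size):
        seeds = train_nodes[start:start + batch_size]
        frontier = list(seeds)
        layers: List[List[Edge]] = []
        for fanout in fanouts:
            sampled = client.sample_neighbors(frontier, fanout)
            edges = [(src, dst) for dst, nbrs in zip(frontier, sampled) for src in nbrs]
            layers.append(edges)
            # The next hop expands from the current frontier plus the newly reached nodes.
            frontier = sorted(set(frontier) | {src for src, _ in edges})
        yield seeds, layers
```

In this sketch only the sampled edges ever reach the trainer, so swapping
`InMemoryClient` for something backed by the graph server is the part I am unsure
how to do with PGL.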

Thank you.