Model diagrams for the GNN examples
code-rex1 opened this issue · 16 comments
❓ Questions and Help
This repo presents a couple of nice examples for GNNs.
I am particularly interested in the following:
Do you have the model architecture described somewhere as part of the tutorial or documentation?
Alternatively, do you have a canonical architecture described somewhere for these Graph2Seq-based models?
Is the model the same as Graph2Seq: A Generalized Seq2Seq Model for Graph Inputs?
@AlanSwift @hugochan can you please help?
Currently, we don't provide architecture diagrams for the specific applications. But we have visualized specific graph types, such as dependency graphs, in our survey paper.
There are some differences.
We apply RNN or BERT encoding before the GNN to initialize the node embeddings. And we use separate attention: 1. attention on the node embeddings (the GNN outputs), 2. attention on the initial node embeddings. This is just an example. For more details, please refer to our docs.
@AlanSwift thanks for your response. But I can't find that much detail in the documentation.
I see that at first you generate the initial node embeddings using word2vec or BERT.
But your statement about the separate attention (1. attention on node embedding, 2. attention on node initial embedding) is not clear. Can you please elaborate? 🙏
@AlanSwift I am also a bit confused here. You said we use separate attention:
- attention on node embedding
- attention on node initial embedding.
But the example for NMT uses a GCN, and GCN does not use attention. So I am lost here. Please elaborate so that I can understand the model a little better.
Thanks in advance for your help.
Just an example:
encoder pipeline: RNN encoder --> GNN encoder
decoder pipeline: 1. attention on RNN encoder results, 2. attention on GNN encoder results, 3. fuse them
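If it helps, here is a rough PyTorch sketch of a single decoder step under that pipeline. It is illustrative only, not the exact graph4nlp implementation; the module and variable names (`SeparateAttnDecoderStep`, `attn_rnn`, `attn_gnn`, `fuse`) are placeholders I made up for this example.

```python
import torch
import torch.nn as nn

class SeparateAttnDecoderStep(nn.Module):
    """Illustrative single decoder step: attend separately to the RNN-encoder
    outputs and the GNN-encoder outputs, then fuse the two context vectors.
    (Sketch only; names and the fusion choice are placeholders.)"""

    def __init__(self, hidden_size):
        super().__init__()
        self.attn_rnn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)
        self.attn_gnn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, dec_state, rnn_enc_outs, gnn_node_embs):
        # dec_state: (batch, 1, hidden) current decoder state used as the query
        ctx_rnn, _ = self.attn_rnn(dec_state, rnn_enc_outs, rnn_enc_outs)    # 1. attention on RNN encoder results
        ctx_gnn, _ = self.attn_gnn(dec_state, gnn_node_embs, gnn_node_embs)  # 2. attention on GNN encoder results
        return torch.tanh(self.fuse(torch.cat([ctx_rnn, ctx_gnn], dim=-1)))  # 3. fuse them

# quick shape check with dummy tensors
step = SeparateAttnDecoderStep(hidden_size=16)
out = step(torch.randn(2, 1, 16), torch.randn(2, 7, 16), torch.randn(2, 7, 16))
print(out.shape)  # torch.Size([2, 1, 16])
```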
@AlanSwift so this RNN encoder comes after the word2vec or BERT embedding. The documentation states:
For instance, for single-token item, w2v_bilstm strategy means we first use word2vec embeddings to initialize each item, and then apply a BiLSTM encoder to encode the whole graph (assuming the node order reserves the sequential order in raw text).
I do not understand why/how the BiLSTM encoder is used to encode the whole graph. Can you please explain?
@AlanSwift this part is quite confusing. Why/how do you encode the whole graph with the BiLSTM?
Also for the decoder pipeline, you mentioned:
- attention on RNN encoder results
- attention on GNN encoder results.
- fuse them
Has any other paper used this approach? Can you please point me to a reference paper?
Also, I would appreciate pointers to where this is done in the code.
word2vec, BiLSTM, BERT, etc. are used to initialize the node embeddings, which enriches the contextual information. This trick is widely used in NLP & GNN research, e.g. https://arxiv.org/pdf/1908.04942.pdf (only an example).
For technical details, please refer to the implementations.
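Roughly, the w2v_bilstm strategy looks like the sketch below: the BiLSTM simply runs over the node sequence (whose order follows the raw text), so every node gets a context-aware initial embedding before the GNN sees the graph. Again, this is only an illustrative sketch with made-up names (`W2VBiLSTMInit`), not the actual graph4nlp API.

```python
import torch
import torch.nn as nn

class W2VBiLSTMInit(nn.Module):
    """Sketch of the 'w2v_bilstm' initialization: word2vec lookup, then a BiLSTM
    over the node sequence (node order assumed to follow the raw-text order).
    Names are illustrative, not the actual graph4nlp modules."""

    def __init__(self, w2v_weights, hidden_size):
        super().__init__()
        # w2v_weights: (vocab_size, emb_dim) pretrained word2vec matrix
        self.embedding = nn.Embedding.from_pretrained(w2v_weights, freeze=False)
        self.bilstm = nn.LSTM(w2v_weights.size(1), hidden_size // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, node_token_ids):
        # node_token_ids: (batch, num_nodes), one token per node (single-token items)
        w2v = self.embedding(node_token_ids)   # static word2vec vectors per node
        init_node_embs, _ = self.bilstm(w2v)   # contextualized initial node embeddings
        return init_node_embs                  # (batch, num_nodes, hidden_size)

# quick shape check with a stand-in word2vec matrix (hidden_size must be even here)
w2v = torch.randn(100, 32)
init = W2VBiLSTMInit(w2v, hidden_size=64)
print(init(torch.randint(0, 100, (2, 5))).shape)  # torch.Size([2, 5, 64])
```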
@AlanSwift I am not asking about word2vec or BERT being used to initialize the node embeddings. I am asking why the BiLSTM is applied after the word2vec or BERT embeddings are obtained.
As you can see, the document states:
For instance, for single-token item, w2v_bilstm strategy means we first use word2vec embeddings to initialize each item, and then apply a BiLSTM encoder to encode the whole graph (assuming the node order reserves the sequential order in raw text).
As per the document:
- learn word2vec embeddings to initialize each item
- then apply a BiLSTM encoder to encode the whole graph

I am asking about step 2.
Considering bidirectional sequential information is beneficial for most NLP tasks.
@AlanSwift got it. But why say the BiLSTM encoder is used to encode the whole graph? I would think it is used to update the node embeddings, isn't it?
Is the description incorrect?
@AlanSwift so the BiLSTM encoder is used to update the node embeddings. It appears to me the steps are:
- initialize the node embeddings with word2vec or BERT
- update them using the BiLSTM
- feed the result to the GCN encoder
Is this understanding correct?
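Concretely, I picture something like the sketch below. This is just my own placeholder code to check my understanding; `TinyGCNLayer` and the dummy adjacency are stand-ins, not graph4nlp's actual modules.

```python
import torch
import torch.nn as nn

class TinyGCNLayer(nn.Module):
    """One dense GCN layer, H' = ReLU(A_hat @ H @ W); a stand-in for the real GNN encoder."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_embs, adj_norm):
        # node_embs: (batch, num_nodes, in_dim); adj_norm: (batch, num_nodes, num_nodes) normalized adjacency
        return torch.relu(adj_norm @ self.linear(node_embs))

batch, num_nodes, dim = 2, 5, 16
init_embs = torch.randn(batch, num_nodes, dim)   # steps 1-2: word2vec/BERT init, then BiLSTM update (dummy here)
adj_norm = torch.softmax(torch.randn(batch, num_nodes, num_nodes), dim=-1)  # stand-in normalized adjacency
gnn_node_embs = TinyGCNLayer(dim, dim)(init_embs, adj_norm)                 # step 3: GCN encoder over the graph
print(gnn_node_embs.shape)  # torch.Size([2, 5, 16])
```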
@AlanSwift I understand that bidirectional sequential information is beneficial for NLP tasks. But the BiLSTM encoder updates the initial word2vec/BERT word embeddings before they are fed to the GCN encoder. So I am confused when you state that the BiLSTM encoder is used to encode the whole graph.
Would you please assist me with this question?
Yes. It is correct.
This issue will be closed. Feel free to reopen it if needed.