Model diagrams for the GNN examples
code-rex1 opened this issue · 16 comments
❓ Questions and Help
This repo presents a couple of nice examples for GNNs.
I am particularly interested in the following:
Do you have the model architecture described somewhere as part of the tutorial or documentation?
Alternatively, do you have a canonical architecture described somewhere for these Graph2Seq-based models?
Is the model the same as Graph2Seq: A Generalized Seq2Seq Model for Graph Inputs?
@AlanSwift @hugochan can you please help?
Currently, we don't provide architecture diagrams for the specific applications. But we have visualized specific graph types, such as dependency graphs, in our survey paper.
There are some differences.
We apply RNN or BERT encoding before the GNN to initialize the node embeddings. And we use separate attention: 1. attention on the node embeddings (the GNN outputs), 2. attention on the initial node embeddings. This is just an example. For more details, please refer to our docs.
@AlanSwift thanks for your response. But I can't find that much detail in the documentation.
I see that at first you generate the initial node embeddings using word2vec or BERT.
But your statement about the separate attention (1. attention on node embedding, 2. attention on node initial embedding) is not clear. Can you please elaborate? 🙏
@AlanSwift I am also a bit confused here. You said we use separate attention:
- attention on node embedding
- attention on node initial embedding.
But the example for NMT uses a GCN, and GCN does not use attention. So I am lost here. Please elaborate so that I can understand the model a little better.
Thanks in advance for your help.
Just an example:
encoder pipeline: RNN encoder --> GNN encoder
decoder pipeline: 1. attention on RNN encoder results, 2. attention on GNN encoder results, 3. fuse them
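If it helps, here is a rough PyTorch sketch of a single decoder step under that pipeline. It is illustrative only, not the exact graph4nlp implementation; the module and variable names (`SeparateAttnDecoderStep`, `attn_rnn`, `attn_gnn`, `fuse`) are placeholders I made up for this example.

```python
import torch
import torch.nn as nn

class SeparateAttnDecoderStep(nn.Module):
    """Illustrative single decoder step: attend separately to the RNN-encoder
    outputs and the GNN-encoder outputs, then fuse the two context vectors.
    (Sketch only; names and the fusion choice are placeholders.)"""

    def __init__(self, hidden_size):
        super().__init__()
        self.attn_rnn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)
        self.attn_gnn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, dec_state, rnn_enc_outs, gnn_node_embs):
        # dec_state: (batch, 1, hidden) current decoder state used as the query
        ctx_rnn, _ = self.attn_rnn(dec_state, rnn_enc_outs, rnn_enc_outs)    # 1. attention on RNN encoder results
        ctx_gnn, _ = self.attn_gnn(dec_state, gnn_node_embs, gnn_node_embs)  # 2. attention on GNN encoder results
        return torch.tanh(self.fuse(torch.cat([ctx_rnn, ctx_gnn], dim=-1)))  # 3. fuse them

# quick shape check with dummy tensors
step = SeparateAttnDecoderStep(hidden_size=16)
out = step(torch.randn(2, 1, 16), torch.randn(2, 7, 16), torch.randn(2, 7, 16))
print(out.shape)  # torch.Size([2, 1, 16])
```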
@AlanSwift so this RNN encoder comes after the word2vec or BERT embedding. The documentation states:
For instance, for single-token item, w2v_bilstm strategy means we first use word2vec embeddings to initialize each item, and then apply a BiLSTM encoder to encode the whole graph (assuming the node order reserves the sequential order in raw text).
I do not understand why/how the BiLSTM encoder is used to encode the whole graph. Can you please explain?
@AlanSwift this part is quite confusing. Why/how do you encode the whole graph with the BiLSTM?
Also for the decoder pipeline, you mentioned:
- attention on RNN encoder results
- attention on GNN encoder results.
- fuse them
Has any other paper used this approach? Can you please point me to a reference paper?
Also, I would appreciate pointers to where this is done in the code.
word2vec, BiLSTM, BERT, etc. are used to initialize the node embeddings, which enriches the contextual information. This trick is widely used in NLP & GNN research, e.g. https://arxiv.org/pdf/1908.04942.pdf (only an example).
For technical details, please refer to the implementations.
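Roughly, the w2v_bilstm strategy looks like the sketch below: the BiLSTM simply runs over the node sequence (whose order follows the raw text), so every node gets a context-aware initial embedding before the GNN sees the graph. Again, this is only an illustrative sketch with made-up names (`W2VBiLSTMInit`), not the actual graph4nlp API.

```python
import torch
import torch.nn as nn

class W2VBiLSTMInit(nn.Module):
    """Sketch of the 'w2v_bilstm' initialization: word2vec lookup, then a BiLSTM
    over the node sequence (node order assumed to follow the raw-text order).
    Names are illustrative, not the actual graph4nlp modules."""

    def __init__(self, w2v_weights, hidden_size):
        super().__init__()
        # w2v_weights: (vocab_size, emb_dim) pretrained word2vec matrix
        self.embedding = nn.Embedding.from_pretrained(w2v_weights, freeze=False)
        self.bilstm = nn.LSTM(w2v_weights.size(1), hidden_size // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, node_token_ids):
        # node_token_ids: (batch, num_nodes), one token per node (single-token items)
        w2v = self.embedding(node_token_ids)   # static word2vec vectors per node
        init_node_embs, _ = self.bilstm(w2v)   # contextualized initial node embeddings
        return init_node_embs                  # (batch, num_nodes, hidden_size)

# quick shape check with a stand-in word2vec matrix (hidden_size must be even here)
w2v = torch.randn(100, 32)
init = W2VBiLSTMInit(w2v, hidden_size=64)
print(init(torch.randint(0, 100, (2, 5))).shape)  # torch.Size([2, 5, 64])
```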
@AlanSwift I am not asking about word2vec or BERT being used to initialize the node embeddings. I am asking why the BiLSTM is applied after the word2vec or BERT embeddings are obtained.
As you can see, the document states:
For instance, for single-token item, w2v_bilstm strategy means we first use word2vec embeddings to initialize each item, and then apply a BiLSTM encoder to encode the whole graph (assuming the node order reserves the sequential order in raw text).
As per the document:
- learn word2vec embeddings to initialize each item
- then apply a BiLSTM encoder to encode the whole graph

I am asking about step 2.
Considering bidirectional sequential information is beneficial for most NLP tasks.
@AlanSwift got it. But why say the BiLSTM encoder is used to encode the whole graph? I would think it is used to update the node embeddings, isn't it?
Is the description incorrect?
@AlanSwift so the BiLSTM encoder is used to update the node embeddings. It appears to me the steps are:
- initialize the node embeddings with word2vec or BERT
- update them using the BiLSTM
- feed the result to the GCN encoder
Is this understanding correct?
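Concretely, I picture something like the sketch below. This is just my own placeholder code to check my understanding; `TinyGCNLayer` and the dummy adjacency are stand-ins, not graph4nlp's actual modules.

```python
import torch
import torch.nn as nn

class TinyGCNLayer(nn.Module):
    """One dense GCN layer, H' = ReLU(A_hat @ H @ W); a stand-in for the real GNN encoder."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_embs, adj_norm):
        # node_embs: (batch, num_nodes, in_dim); adj_norm: (batch, num_nodes, num_nodes) normalized adjacency
        return torch.relu(adj_norm @ self.linear(node_embs))

batch, num_nodes, dim = 2, 5, 16
init_embs = torch.randn(batch, num_nodes, dim)   # steps 1-2: word2vec/BERT init, then BiLSTM update (dummy here)
adj_norm = torch.softmax(torch.randn(batch, num_nodes, num_nodes), dim=-1)  # stand-in normalized adjacency
gnn_node_embs = TinyGCNLayer(dim, dim)(init_embs, adj_norm)                 # step 3: GCN encoder over the graph
print(gnn_node_embs.shape)  # torch.Size([2, 5, 16])
```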
@AlanSwift I understand that bidirectional sequential information is beneficial for NLP tasks. But the BiLSTM encoder updates the initial word2vec/BERT word embeddings before they are fed to the GCN encoder. So I am confused when you state that the BiLSTM encoder is used to encode the whole graph.
Would you please assist me with this question?
Yes. It is correct.
This issue will be closed. Feel free to reopen it if needed.