malllabiisc/WordGCN

About using own text data for SynGCN and SemGCN

40347015S opened this issue · 1 comment

Your WordGCN paper is very exciting and very well written, so I want to try your code in my current work, and I would like to ask you a question.
I would like to train SynGCN and SemGCN on other text data, such as transcripts from the AMI speech recognition benchmark corpus, instead of the Wikipedia corpus, and obtain AMI-based SynGCN and SemGCN word embeddings. What is the first step I need to take, and how should I process my own text data?
Thanks!

Shih-Hsuan

Hi Shih-Hsuan
The corpus needs to be arranged in the data.txt format described in the readme. You'll have to run a dependency parser on your corpus to get a dependency parse tree for each sentence.
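
As a rough illustration of that preprocessing step, here is a minimal sketch that turns already-parsed sentences into one line per sentence. It assumes the layout is `<num_words> <num_dep_rels> tok_ids... source|destination|dep_rel_id...` with 0-based token positions in each edge; please check this against the readme before relying on it. The helper names `build_vocab` and `encode_sentence` and the toy sentence are hypothetical, not part of the WordGCN code, and the parse itself is hard-coded here where you would use the output of whichever dependency parser you run.

```python
from collections import Counter


def build_vocab(items):
    """Map each item to an integer id, most frequent first (ties alphabetical)."""
    counts = Counter(items)
    ordered = sorted(counts, key=lambda x: (-counts[x], x))
    return {item: idx for idx, item in enumerate(ordered)}


def encode_sentence(tokens, edges, voc2id, de2id):
    """Encode one parsed sentence as a single line.

    ASSUMED layout (verify against the readme):
      <num_words> <num_dep_rels> tok_id ... head_pos|dep_pos|rel_id ...
    `edges` holds (head_position, dependent_position, relation_label) tuples,
    with 0-based positions into `tokens`.
    """
    tok_ids = [str(voc2id[t]) for t in tokens]
    dep_strs = [f"{head}|{dep}|{de2id[label]}" for head, dep, label in edges]
    return " ".join([str(len(tok_ids)), str(len(dep_strs))] + tok_ids + dep_strs)


# Toy stand-in for dependency parser output on one transcript sentence.
parsed = [
    (["the", "meeting", "starts"], [(1, 0, "det"), (2, 1, "nsubj")]),
]

# Build token and relation vocabularies over the whole (toy) corpus.
voc2id = build_vocab(tok for toks, _ in parsed for tok in toks)
de2id = build_vocab(label for _, edges in parsed for _, _, label in edges)

lines = [encode_sentence(toks, edges, voc2id, de2id) for toks, edges in parsed]
print("\n".join(lines))
```

For a real corpus you would replace `parsed` with the parser's tokens, heads, and relation labels for each sentence, and write `lines` along with the two vocabularies to the files the readme expects.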