Short Text GCN

The source code and partial data for our paper:

Zhihao Ye, Gongyao Jiang, Ye Liu, Zhiyong Li, Jin Yuan. Document and Word Representations Generated by Graph Convolutional Network and BERT for Short Text Classification. (ECAI 2020, accepted, id 1567)

Required

Python 3.6

TensorFlow >= 1.4.0

Quick Start

Run the experiment on the MR corpus:

python train.py

Prepare your own corpus:

  1. Pre-process the data into data.clean.txt and doc_names.txt in /data/mr/.

  2. Run build_graph_mod.py to process the data and build the graph.

  3. Run gcn_pretrain.py to pre-train the GCN and obtain the pre-trained vectors.

  4. Download BERT, optionally fine-tune it on your corpus, and start the BERT service (pass -tuned_model_dir only if you fine-tuned):

    bert-serving-start \
        -pooling_strategy NONE \
        -max_seq_len <the corpus' sentence length> \
        -model_dir <pre-trained BERT dir> \
        -tuned_model_dir <your fine-tuned BERT dir>
    
  5. Encode the corpus into vectors with BERT.

  6. Prepare the corpus and pre-trained vectors, then run mybilstm_bert_seq.py.
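
Step 5 is not tied to a particular script here, so the following is only a minimal sketch of how the encoding could be done with the bert-as-service client. It assumes the server from step 4 is running on the local machine, that data.clean.txt holds one document per line, and that the vectors are saved with NumPy. The helper names `encode_corpus` and `encode_file`, the batch size, and the output path `bert_vectors.npy` are hypothetical and are not part of this repository. With `-pooling_strategy NONE` the server returns token-level vectors, so the result should be a 3-D array of shape (documents, max_seq_len, hidden size).

```python
import numpy as np


def encode_corpus(lines, client, batch_size=64):
    """Encode documents in batches and stack the results along axis 0.

    `client` is any object with an `encode(list_of_str)` method that returns
    an array, for example bert_serving.client.BertClient.
    """
    if not lines:
        raise ValueError("no documents to encode")
    # Empty documents are rejected up front instead of being dropped,
    # so that row i of the output still matches line i of doc_names.txt.
    if any(not line.strip() for line in lines):
        raise ValueError("empty document found; fix the corpus first")

    batches = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        batches.append(np.asarray(client.encode(batch)))
    return np.concatenate(batches, axis=0)


def encode_file(corpus_path="data/mr/data.clean.txt",
                out_path="data/mr/bert_vectors.npy"):
    """Read the cleaned corpus, encode it, and save the vectors.

    Both paths are assumptions for illustration; adjust them to your setup.
    """
    # Imported here so encode_corpus can be used without the client package.
    from bert_serving.client import BertClient

    with open(corpus_path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]

    client = BertClient()  # connects to a server on localhost by default
    vectors = encode_corpus(lines, client)
    np.save(out_path, vectors)
    return vectors.shape
```

Calling `encode_file()` after starting the service writes the vectors to disk. How mybilstm_bert_seq.py expects them to be stored is not specified above, so check that script before relying on this layout.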

Code Reference

Text GCN: https://github.com/yao8839836/text_gcn

bert-as-service: https://github.com/hanxiao/bert-as-service