- Multi-class short-text classification.
- The model is built in PyTorch with a word-embedding layer, an LSTM (or GRU), and a fully-connected layer.
- Mini-batches are created by zero-padding and processed with `torch.nn.utils.rnn.PackedSequence`.
- Cross-entropy loss + Adam optimizer.
- Supports pretrained word embeddings (GloVe).
- Paper: Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections
- Code: https://github.com/keishinkickback/Pytorch-RNN-text-classification
- Embedding --> Dropout --> LSTM(GRU) --> Dropout --> FC.
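The Embedding --> Dropout --> LSTM --> Dropout --> FC pipeline over a zero-padded mini-batch can be sketched as below. This is a minimal illustration, not the repo's actual code: the class name, layer sizes, and dropout rate are made up for the example; only the layer order and the use of `pack_padded_sequence` follow the description above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class RNNClassifier(nn.Module):
    """Embedding -> Dropout -> LSTM -> Dropout -> FC (illustrative sizes)."""
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=64,
                 num_classes=4, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.dropout = nn.Dropout(0.5)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, max_len), zero-padded; lengths: true lengths
        embedded = self.dropout(self.embedding(token_ids))
        # pack so the LSTM skips the padding positions
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)        # h_n: (1, batch, hidden_dim)
        return self.fc(self.dropout(h_n[-1]))  # logits: (batch, num_classes)

# zero-padded mini-batch of two sentences with true lengths 3 and 2
model = RNNClassifier()
batch = torch.tensor([[4, 8, 15, 0], [16, 23, 0, 0]])
lengths = torch.tensor([3, 2])
logits = model(batch, lengths)
print(logits.shape)  # torch.Size([2, 4])
```

Packing matters here because without it the LSTM's final hidden state would be computed over the padding tokens of the shorter sentences.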
- Download the GloVe embeddings, then run `python preprocess.py`.
- Training data is at `./data/aminer_train.tsv`; each line is a tab-separated `<label> <sentence>` pair.
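Reading that tab-separated format takes only a few lines of plain Python. A small sketch, assuming the `<label>\t<sentence>` layout described above (the `load_tsv` helper name is ours, not the repo's):

```python
import os
import tempfile

def load_tsv(path):
    """Read <label>\t<sentence> lines into parallel lists."""
    labels, sentences = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # split only on the first tab so the sentence may contain spaces
            label, sentence = line.split("\t", 1)
            labels.append(label)
            sentences.append(sentence)
    return labels, sentences

# demo with a throwaway file standing in for ./data/aminer_train.tsv
tmp = tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                  encoding="utf-8")
tmp.write("0\tJohn Smith machine learning\n1\tattention is all you need\n")
tmp.close()
labels, sentences = load_tsv(tmp.name)
print(labels)  # ['0', '1']
os.unlink(tmp.name)
```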
- Start training with `python main.py`; run it with `-h` for optional arguments.
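One optimization step in the cross-entropy + Adam setup mentioned above looks like the sketch below. The model and batch here are stand-ins (a plain linear classifier over random features), used only to show the loss/optimizer wiring, not the repo's training loop.

```python
import torch
import torch.nn as nn

# illustrative stand-ins for the real RNN model and a preprocessed batch
model = nn.Linear(8, 4)                    # placeholder 4-class classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()          # expects raw logits + class indices

inputs = torch.randn(16, 8)                # fake mini-batch of 16 examples
targets = torch.randint(0, 4, (16,))       # 4 classes, as in the tag set

model.train()
optimizer.zero_grad()                      # clear gradients from the last step
loss = criterion(model(inputs), targets)   # cross-entropy on the logits
loss.backward()                            # backpropagate
optimizer.step()                           # Adam parameter update
print(loss.item() > 0)  # True
```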
- Run the server with `python server.py`.
- GET method: `http://166.111.5.228:5012/query/<query>`
- Returns JSON. Example: `http://166.111.5.228:5012/query/John` returns `{"tag": "0"}`.
- Tag meanings: 0 = scholar, 1 = paper, 2 = conference, 3 = chitchat.
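A client-side sketch of the API above: building the query URL and mapping the returned `{"tag": "<n>"}` body to a human-readable label. The helper names and the `TAG_NAMES` dict are ours (derived from the tag list above), and no request is actually sent, since the server's availability is environment-specific.

```python
import json
from urllib.parse import quote

# tag -> label mapping, taken from the README's tag list
TAG_NAMES = {"0": "scholar", "1": "paper", "2": "conference", "3": "chitchat"}

def build_query_url(query, host="166.111.5.228", port=5012):
    """Build the GET URL, percent-encoding the query path segment."""
    return f"http://{host}:{port}/query/{quote(query)}"

def parse_response(body):
    """Turn the server's {"tag": "<n>"} JSON body into a readable label."""
    tag = json.loads(body)["tag"]
    return TAG_NAMES.get(tag, "unknown")

print(build_query_url("John"))         # http://166.111.5.228:5012/query/John
print(parse_response('{"tag": "0"}'))  # scholar
```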
|- main.py
|- classify.py
|- [dir] glove (GloVe word vectors)
|- [dir] data (dataset)
|- [dir] gen (trained models)