This project was built for my bachelor's graduation project, and it also serves as a study of TensorFlow and deep learning (CNN, RNN, LSTM, etc.).
The main objective of the project is to determine whether two given sentences are similar in meaning (a binary classification problem), using neural networks (FastText, CNN, LSTM, etc.).
- Python 3.6
- TensorFlow 1.7+
- Numpy
- Gensim
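The research data may be protected by copyright under Chinese law, so only the code is provided.
The experimental data belongs to a joint project between the lab and a company and involves trade secrets, so it is not provided here. Thank you for your understanding.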
- Make the data support both Chinese and English (using `jieba` makes this easy).
- Can use your own pre-trained word vectors (using `gensim` makes this easy).
- Add embedding visualization based on TensorBoard.
- Design two subnetworks to solve the task: text pair similarity classification.
- Add a new Highway Layer (which proved useful, judging by the performance).
- Add several performance measures (especially AUC), since the data is imbalanced.
- Can choose to train the model directly or restore it from a checkpoint in `train_cnn.py`.
- Add `test_cnn.py`, the model test code.
- Add other useful data preprocessing functions in `data_helpers.py`.
- Use `logging` to record all run information (including parameter display, model training info, etc.).
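To illustrate why AUC is worth reporting on imbalanced data, here is a minimal sketch of a rank-based AUC computation in plain Python. It is not the code used in this repository; the function name `roc_auc` and the toy labels and scores are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one
    (ties count as one half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy imbalanced set: 8 dissimilar pairs (0) and 2 similar pairs (1).
labels = [0] * 8 + [1] * 2
constant_scores = [0.0] * 10  # a model that always predicts "dissimilar"
print(roc_auc(labels, constant_scores))
```

On this toy set, a model that always answers "dissimilar" reaches 80% accuracy, yet its AUC is 0.5, which is no better than chance. That gap is the reason accuracy alone can be misleading when one class dominates.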
It depends on your data and task.
You can use the `jieba` package if you are going to deal with Chinese text data.
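As a rough sketch of how tokenization could switch on language, the snippet below uses `jieba.lcut` for Chinese and a plain whitespace split for English. The helper names `tokenize` and `tokenize_pair` and the `lang` parameter are hypothetical and are not taken from `data_helpers.py`; the lowercase-and-split rule for English is a simplifying assumption.

```python
def tokenize(text, lang="en"):
    """Split one sentence into a list of tokens.

    lang="zh": segment with jieba (third-party; install with `pip install jieba`).
    lang="en": lowercase and split on whitespace (a deliberately simple rule).
    """
    if lang == "zh":
        # Imported lazily so English-only use does not require jieba.
        import jieba
        return [tok for tok in jieba.lcut(text) if tok.strip()]
    return text.lower().split()


def tokenize_pair(front, behind, lang="en"):
    """Tokenize both sentences of a pair with the same rule."""
    return tokenize(front, lang), tokenize(behind, lang)


print(tokenize_pair("How do I learn Python", "What is the best way to learn Python"))
```

Keeping the language switch in one place means the rest of the pipeline only ever sees token lists, regardless of whether the input was Chinese or English.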
- Use the `gensim` package to pre-train word vectors on your data.
- Use the `glove` tools to pre-train word vectors on your data.
- You can even use a FastText network to pre-train word vectors on your data.
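Whichever tool produces the vectors, they are commonly exported as plain text with one `word v1 v2 ... vd` entry per line. The word2vec text format adds a `vocab_size dim` header line, while GloVe's text output has none. The following is a minimal sketch, in plain Python, of reading such a file into a vocabulary and an embedding matrix. The function names, the `<PAD>` token, and the sample data are assumptions for illustration and do not come from this repository.

```python
import io


def load_text_vectors(handle):
    """Read `word v1 v2 ... vd` lines from an open text file.

    A first line made of exactly two integers is treated as the
    word2vec-style `vocab_size dim` header and skipped.
    """
    vectors = {}
    for line_no, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line, not a word vector
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vectors, pad_token="<PAD>"):
    """Return (vocab, matrix); row 0 is an all-zero padding vector."""
    if not vectors:
        raise ValueError("no word vectors were loaded")
    dim = len(next(iter(vectors.values())))
    vocab = {pad_token: 0}
    matrix = [[0.0] * dim]
    for word, vec in vectors.items():
        vocab[word] = len(vocab)
        matrix.append(vec)
    return vocab, matrix


# Tiny in-memory sample standing in for a real vector file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vocab, matrix = build_embedding_matrix(load_text_vectors(sample))
print(vocab, len(matrix))
```

The resulting `matrix` can be used to initialize an embedding layer, with `vocab` mapping each token to its row index.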
Warning: Not finished yet 🤪!
References:
- Convolutional Neural Networks for Sentence Classification
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
黄威,Randolph
SCU SE Bachelor; USTC CS Master
Email: chinawolfman@hotmail.com
My Blog: randolph.pro
LinkedIn: randolph's linkedin