Deep Learning for Text Pairs Classification

This project was developed for my bachelor's graduation project, and it also serves as a study of TensorFlow and deep learning (CNN, RNN, LSTM, etc.).

The main objective of the project is to determine whether two given sentences are similar in meaning (a binary classification problem) using neural networks (FastText, CNN, LSTM, etc.).

Requirements

Python 3.6
TensorFlow 1.7+
Numpy
Gensim

Data

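Research data may be protected by copyright under Chinese law, so only the code is provided.

The experimental data comes from a collaboration between our lab and a company and involves trade secrets, so it cannot be provided here. Thank you for your understanding.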

Innovation

Data part

Support both Chinese and English data (Chinese text can be segmented with jieba).
Support loading your own pre-trained word vectors (for example, vectors trained with gensim).
Add embedding visualization with TensorBoard.

Model part

Design two subnetworks to solve the task: text pairs similarity classification.
Add a new highway layer (found to be useful based on model performance).
Add several performance measures (especially AUC), since the data is imbalanced.
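The highway layer and the AUC measure mentioned above can be illustrated with a small NumPy sketch. This is not the code from this repository: the function names `highway` and `roc_auc`, the ReLU activation, and the weight shapes are assumptions made for illustration. The highway layer mixes a transformed input with the untouched input through a sigmoid gate, and AUC is computed as the fraction of (positive, negative) pairs that the scores rank correctly, which is why it stays informative when one class is rare.

```python
import numpy as np


def highway(x, w_h, b_h, w_t, b_t):
    """Highway layer: y = t * h + (1 - t) * x, with gate t in (0, 1).

    x has shape (batch, dim); w_h and w_t have shape (dim, dim).
    The ReLU transform is an assumption, other activations also work.
    """
    h = np.maximum(0.0, x @ w_h + b_h)           # candidate transform
    t = 1.0 / (1.0 + np.exp(-(x @ w_t + b_t)))   # transform gate (sigmoid)
    return t * h + (1.0 - t) * x                 # carry the rest of x through


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare every positive with every negative; ties count as half a win.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The pairwise comparison in `roc_auc` uses memory proportional to positives times negatives, so it suits small evaluation sets; for large ones a rank-based implementation is the usual choice.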

Code part

train_cnn.py can either train the model from scratch or restore it from a checkpoint.
Add test_cnn.py, the model testing code.
Add other useful data preprocessing functions in data_helpers.py.
Use logging to record the whole run (parameter display, model training info, etc.).
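As a rough illustration of the logging point above, the sketch below sets up a logger with the standard library and prints a parameter table before training. The helper names `get_logger` and `log_params` and the message format are hypothetical and are not taken from data_helpers.py.

```python
import logging
import sys


def get_logger(name, log_file=None, level=logging.INFO):
    """Return a logger that writes to stdout and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        stream_handler = logging.StreamHandler(sys.stdout)
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
        if log_file is not None:
            file_handler = logging.FileHandler(log_file, encoding="utf-8")
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)
    return logger


def log_params(logger, params):
    """Log every hyperparameter on its own line, sorted by name."""
    for key in sorted(params):
        logger.info("%s = %s", key.upper(), params[key])
```

A training script could call `log_params(get_logger("train"), {"batch_size": 64, "learning_rate": 0.001})` once at start-up; the parameter values here are made up for the example.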

Data Preprocessing

The preprocessing steps depend on your data and task.

Text Segment

You can use the jieba package if you are working with Chinese text data.
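A minimal sketch of language-dependent segmentation is shown below. The function name `tokenize`, the `lang` argument, and the English regular expression are assumptions for illustration; only `jieba.lcut` is the real jieba call, and it is imported lazily so that English-only use does not require jieba to be installed.

```python
import re


def tokenize(text, lang="en"):
    """Split a sentence into tokens; lang is "zh" for Chinese, anything else for English."""
    if lang == "zh":
        import jieba  # third-party package, only needed for Chinese text
        return [tok for tok in jieba.lcut(text) if tok.strip()]
    # English: lowercase, keep alphanumeric runs and simple contractions.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
```

Both sentences of a pair should go through the same tokenizer so that their token ids come from one shared vocabulary.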

Pre-trained Word Vectors

Use the gensim package to pre-train word vectors on your data.
Use the GloVe tools to pre-train word vectors on your data.
You can even use a fastText network to pre-train word vectors.
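Whichever tool produces the vectors, they usually end up in an embedding matrix indexed by token id. The sketch below shows one way to build that matrix. The function name `build_embedding_matrix`, the reserved padding row, and the random initialization range are assumptions, not this project's code. The `vectors` argument can be any mapping that supports `word in vectors` and `vectors[word]`; a plain dict works, and so does a gensim `KeyedVectors` object.

```python
import numpy as np


def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Return (matrix, word_to_id) for the words in vocab.

    Row 0 is reserved for padding and stays zero. Words missing from
    `vectors` keep a small random initialization.
    """
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(vocab) + 1, dim)).astype(np.float32)
    matrix[0] = 0.0
    word_to_id = {}
    for index, word in enumerate(vocab, start=1):
        word_to_id[word] = index
        if word in vectors:
            matrix[index] = vectors[word]  # copy the pre-trained vector
    return matrix, word_to_id
```

The resulting matrix can be used to initialize the embedding variable of the network, and `word_to_id` converts tokenized sentences into id sequences.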

Network Structure

FastText

Warning: Not finished yet 🤪!

References:

Bag of Tricks for Efficient Text Classification

TextANN

Warning: Not finished yet 🤪!

TextCNN

References:

TextRNN

Warning: Not finished yet 🤪!

References:

Recurrent Neural Network for Text Classification with Multi-Task Learning

About Me

黄威，Randolph

SCU SE Bachelor; USTC CS Master

Email: chinawolfman@hotmail.com

My Blog: randolph.pro

LinkedIn: randolph's linkedin

Maxpridy/Text-Pairs-Relation-Classification
