Deep Learning for Text Pairs Relation Classification

This repository is my bachelor graduation project, and also a study of TensorFlow and deep learning (CNN, RNN, etc.).

The main objective of the project is to determine whether two given sentences are similar in meaning (a binary classification problem) using neural networks (FastText, CNN, LSTM, etc.).

Requirements

Python 3.6
TensorFlow 1.1+
Numpy
Gensim

Innovation

Data part

Support both Chinese and English data (Chinese word segmentation is easy with jieba).
Support your own pre-trained word vectors (loading them is easy with gensim).
Add embedding visualization based on TensorBoard.
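
As a rough illustration of how pre-trained word vectors can be plugged into an embedding layer, here is a minimal sketch. It is not the code from this repository: the function name build_embedding_matrix, the "<PAD>" token, and the uniform range used for out-of-vocabulary words are all assumptions, and the pre-trained vectors are assumed to be already loaded into a plain dict (for example from a gensim model).

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (word_to_index, matrix). Row 0 is an all-zero padding row.

    vocab:      iterable of words (Chinese or English tokens alike)
    pretrained: dict mapping word -> list of floats (assumed already loaded)
    dim:        embedding dimension
    """
    rng = random.Random(seed)
    word_to_index = {"<PAD>": 0}  # hypothetical padding token
    matrix = [[0.0] * dim]
    for word in vocab:
        if word in word_to_index:
            continue  # skip duplicates
        if word in pretrained:
            vector = list(pretrained[word])
            if len(vector) != dim:
                raise ValueError("vector for %r has wrong dimension" % word)
        else:
            # Out-of-vocabulary word: small random init (an assumption).
            vector = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
        word_to_index[word] = len(matrix)
        matrix.append(vector)
    return word_to_index, matrix


# Toy vectors standing in for a real pre-trained model.
pretrained = {"cat": [0.1, 0.2, 0.3], "猫": [0.4, 0.5, 0.6]}
word_to_index, matrix = build_embedding_matrix(
    ["cat", "猫", "unseen"], pretrained, dim=3
)
```

The resulting matrix can then be used to initialize the embedding variable of a model, with word_to_index used to turn tokenized sentences into index sequences.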

Model part

Design two subnetworks to solve the text pairs similarity classification task.
Add a correct L2 loss calculation.
Add gradient clipping to prevent gradient explosion.
Add exponential learning rate decay.
Add a Highway layer (which is useful, judging by model performance).
Add a Batch Normalization layer.
Add several performance measures (especially AUC), since the data is imbalanced.
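
Two of the items above, exponential learning rate decay and gradient clipping, can be sketched in plain Python. This is only an illustration of the arithmetic, not the repository's TensorFlow code. The function names are hypothetical, and clipping by global norm is an assumption, since the list does not say which clipping variant is used.

```python
import math


def exponential_decay(initial_lr, global_step, decay_steps, decay_rate,
                      staircase=False):
    """lr = initial_lr * decay_rate ** (global_step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops
    in discrete steps every decay_steps updates.
    """
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


def clip_by_global_norm(gradients, clip_norm):
    """Rescale all gradients so that their joint L2 norm is at most clip_norm.

    gradients: list of flat lists of floats, one per variable.
    Returns (clipped_gradients, global_norm_before_clipping).
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # The scale is 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in gradients]
    return clipped, global_norm


lr = exponential_decay(0.001, global_step=1000, decay_steps=500, decay_rate=0.5)
clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], clip_norm=1.0)
```

Clipping by global norm keeps the direction of the overall update unchanged and only shrinks its length, which is why it is a common choice for recurrent models.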

Code part

train.py can either train the model from scratch or restore it from a checkpoint.
Add test.py, the model test code, which shows the predicted values and predicted labels for the test set when creating the final prediction file.
Add other useful data preprocessing functions in data_helpers.py.
Use logging to record the whole run (including parameter display, model training info, etc.).
Provide the ability to save the best n checkpoints in checkmate.py, whereas tf.train.Saver can only keep the last n checkpoints.
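
The idea behind keeping the best n checkpoints, rather than the last n, can be sketched with a small heap. This is not the actual interface of checkmate.py: the class BestCheckpointTracker, its methods, and the checkpoint names below are hypothetical, and the sketch only tracks which checkpoint paths to keep, leaving the file saving and deletion to the caller.

```python
import heapq


class BestCheckpointTracker:
    """Track the n best-scoring checkpoints (hypothetical helper)."""

    def __init__(self, num_to_keep, maximize=True):
        self.num_to_keep = num_to_keep
        self.maximize = maximize
        self._heap = []  # min-heap; the root is the worst checkpoint kept

    def update(self, score, path):
        """Register a checkpoint.

        Returns the path that should be deleted, or None. The returned
        path may be the new checkpoint itself if it is worse than all
        checkpoints currently kept.
        """
        key = score if self.maximize else -score
        heapq.heappush(self._heap, (key, path))
        if len(self._heap) > self.num_to_keep:
            _, evicted = heapq.heappop(self._heap)
            return evicted
        return None

    def best_paths(self):
        """Paths of the kept checkpoints, best first."""
        return [path for _, path in sorted(self._heap, reverse=True)]


tracker = BestCheckpointTracker(num_to_keep=2)
history = [(0.71, "model-100"), (0.80, "model-200"),
           (0.75, "model-300"), (0.60, "model-400")]
evicted = [tracker.update(score, path) for score, path in history]
```

In a training loop, update would be called after each evaluation with the validation metric (for example AUC), and any returned path would be removed from disk.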