Phrase Extractor

This project was built for a Natural Language Processing course. A training text file of sentences with labeled phrases is used to train and evaluate a model, and the trained model is then used to predict the phrases in a test file.

The Files present in the project

Required modules

Basic Methodology

The given eval_data.txt is fed to regex_generator.py, which generates a regex pattern file. One pattern captures the word immediately before and the word immediately after the label; the other captures the two words before the label. The input data is then used to train two classifiers, an SVM and an MLP with 100 hidden units (scikit-learn's default single hidden layer, since the training snippet does not override it). Each classifier predicts one of two classes:

  • Found
  • Not Found
The test data is then fed to the trained models to predict whether a phrase is present. If a phrase is found, the regex patterns are used to extract it.
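
The two-stage flow described above can be sketched as follows. This is only an illustration of the control flow: `predict_phrase`, `toy_classify`, and `toy_extract` are hypothetical names, the toy functions stand in for the trained model and the generated regex file, and falling back to "Not Found" when no pattern matches is an assumption rather than documented behavior.

```python
import re
from typing import Callable, Optional


def predict_phrase(
    sentence: str,
    classify: Callable[[str], str],
    extract: Callable[[str], Optional[str]],
) -> str:
    """Stage 1 predicts Found / Not Found; stage 2 runs only for Found."""
    if classify(sentence) == "Not Found":
        return "Not Found"
    phrase = extract(sentence)
    # Assumption: report "Not Found" if no regex pattern matches.
    return phrase if phrase is not None else "Not Found"


# Hypothetical stand-in for the trained SVM / MLP classifier.
def toy_classify(sentence: str) -> str:
    return "Found" if " to " in sentence else "Not Found"


# Hypothetical stand-in for the generated regex patterns.
def toy_extract(sentence: str) -> Optional[str]:
    m = re.search(r" to (.+?) (?:at|on) ", sentence)
    return m.group(1) if m else None


result = predict_phrase(
    "Can you please remind me to fill a file at 9 pm today",
    toy_classify,
    toy_extract,
)
```

Passing the classifier and extractor as callables keeps the two stages independent, so either one can be swapped without touching the other.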

Dataset

Index | Sent | Label
1 | Can u pls remind me at 7pm on 8 Jan | on 8th jan
2 | Remind me to buy eggs on next Monday and Tuesday at 9pm | buy eggs
3 | Can you please remind me to fill a file at 9 pm today | fill a file
4 | I need a reminder. Every day. At 2.30 pm and 5.30 pm to message my wife. | message my wife
5 | Remind me at 11 | Not Found
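
Since the classifiers only predict Found or Not Found, the phrase labels have to be collapsed into those two classes before training. A minimal sketch of that mapping, using three rows from the sample above; the function name `to_class` and the in-memory row format are assumptions, not taken from the project code.

```python
# (sentence, label) pairs copied from the sample rows above.
rows = [
    ("Remind me to buy eggs on next Monday and Tuesday at 9pm", "buy eggs"),
    ("Can you please remind me to fill a file at 9 pm today", "fill a file"),
    ("Remind me at 11", "Not Found"),
]


def to_class(label: str) -> str:
    """Collapse a phrase label into one of the two target classes."""
    return "Not Found" if label.strip().lower() == "not found" else "Found"


classes = [to_class(label) for _, label in rows]
```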

Code Snippets

The Pattern Checking

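# Pattern 1: one word before and one word after the label
sub = r'(\w*)\W*(' + label + r')\W*(\w*)'
# Pattern 2: two words before the label
sub = r'(\w*)\W*(\w*)\W*(' + label + ')'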
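
To make the pattern check concrete, the sketch below runs both patterns on one dataset sentence and then turns the captured neighbors into an extraction regex. The helper names are hypothetical, `re.escape` is an addition that the original snippet does not use, and the step that builds an extraction pattern is a guess at how regex_generator.py might work, not a description of it.

```python
import re


def words_around(label: str, sentence: str):
    """Return (word before, word after) the label, or None if absent."""
    pattern = r"(\w*)\W*(" + re.escape(label) + r")\W*(\w*)"
    m = re.search(pattern, sentence)
    return (m.group(1), m.group(3)) if m else None


def two_words_before(label: str, sentence: str):
    """Return the two words preceding the label, or None if absent."""
    pattern = r"(\w*)\W*(\w*)\W*(" + re.escape(label) + r")"
    m = re.search(pattern, sentence)
    return (m.group(1), m.group(2)) if m else None


def make_extraction_pattern(before: str, after: str) -> str:
    """Assumed step: build a lazy capture between the two context words."""
    return f" {re.escape(before)} (.+?) {re.escape(after)} "


sentence = "Remind me to buy eggs on next Monday and Tuesday at 9pm"
around = words_around("buy eggs", sentence)        # ('to', 'on')
before = two_words_before("buy eggs", sentence)    # ('me', 'to')
extraction = make_extraction_pattern(*around)      # ' to (.+?) on '
recovered = re.search(extraction, sentence).group(1)
```

Escaping the label guards against labels that contain regex metacharacters such as `.` or `+`, which would otherwise change the meaning of the pattern.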

Regex Pattern Generated

...
...
m = re.search(' to (.+?) at ', text)
if m:
    found = m.group(1)
    small_master_list.append(found)
    
m = re.search(' to (.+?) on ', text)
if m:
    found = m.group(1)
    small_master_list.append(found)
m = re.search(' to (.+?) tomorrow ', text)
...
...
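
The repeated search-and-append blocks can also be expressed as a loop over a pattern list. The sketch below uses only the three patterns visible in the fragment above, since the full generated file is not shown. Choosing the shortest candidate is an assumption made for illustration; the source does not say how the final phrase is picked from the collected matches.

```python
import re

# Only the patterns visible in the fragment; the generated file has more.
PATTERNS = [
    r" to (.+?) at ",
    r" to (.+?) on ",
    r" to (.+?) tomorrow ",
]


def extract_candidates(text: str) -> list:
    """Collect the first capture of every pattern that matches."""
    candidates = []
    for pattern in PATTERNS:
        m = re.search(pattern, text)
        if m:
            candidates.append(m.group(1))
    return candidates


def pick_shortest(candidates: list):
    """Assumed heuristic: prefer the tightest match, None if no match."""
    return min(candidates, key=len) if candidates else None


text = "Remind me to buy eggs on next Monday and Tuesday at 9pm"
candidates = extract_candidates(text)
best = pick_shortest(candidates)
```

For this sentence the " at " pattern captures the phrase together with the date, while the " on " pattern captures only the task, which is why some selection step is needed when several patterns match.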

Training the algorithm

from sklearn import svm
from sklearn.neural_network import MLPClassifier
clf = svm.LinearSVC(loss='hinge').fit(X_train_tfidf, y_train)
mlp = MLPClassifier(activation='relu', solver='lbfgs').fit(X_train_tfidf, y_train)
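
For context, here is a self-contained sketch of how the inputs to these two calls might be prepared. It assumes scikit-learn is installed and that `X_train_tfidf` comes from a `TfidfVectorizer`; the variable name suggests TF-IDF features, but the source does not show the vectorization step. The four training sentences are taken from the dataset sample and are far too few for a meaningful model; `random_state=0` is added only to make the run repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

sentences = [
    "Remind me to buy eggs on next Monday and Tuesday at 9pm",
    "Can you please remind me to fill a file at 9 pm today",
    "I need a reminder. Every day. At 2.30 pm and 5.30 pm to message my wife.",
    "Remind me at 11",
]
y_train = ["Found", "Found", "Found", "Not Found"]

# Assumed feature step: fit TF-IDF weights on the training sentences.
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(sentences)

clf = LinearSVC(loss="hinge").fit(X_train_tfidf, y_train)
# hidden_layer_sizes is left at its default of (100,): one hidden layer.
mlp = MLPClassifier(activation="relu", solver="lbfgs", random_state=0)
mlp.fit(X_train_tfidf, y_train)

# New text must go through the same fitted vectorizer before prediction.
X_new = vectorizer.transform(["Remind me to call mom at 6 pm"])
svm_pred = clf.predict(X_new)[0]
mlp_pred = mlp.predict(X_new)[0]
```

The vectorizer is fitted once on the training text and reused with `transform` at prediction time, so the test features share the training vocabulary.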

Accuracy obtained using the model

Accuracy with SVM is :0.868020304568528
Accuracy score with Multi Layer Perceptron: 0.8477157360406091
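
Accuracy here is the fraction of test sentences whose predicted class matches the true class. The sketch below is a plain-Python version of that calculation. The two reported scores are numerically consistent with 171 and 167 correct predictions out of 197 test sentences, but the size of the test set is not stated in the source, so treat those counts as an inference.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of positions where the prediction equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative data only: 197 items, of which 171 are predicted correctly.
y_true = ["Found"] * 197
y_pred = ["Found"] * 171 + ["Not Found"] * 26
score = accuracy(y_true, y_pred)
```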

Output obtained