
Lab exercises + project: Stance_Classification

Primary language: Jupyter Notebook

Introduction_NLP

Stance Classification in Tweets

Pre-Processing

Step 1: Run python3 preprocessor.py --in_path <input file> --out_path <output directory> [--remove_numbers] [--remove_special_characters] [--remove_stopwords] [--stem]

in_path is the data file to be preprocessed. The default file is data/semeval2016-task6-trainingdata.txt.
out_path is the location where the output data is written. The default location is output/.
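The README does not show how preprocessor.py reads these options. A minimal sketch, assuming it uses Python's standard argparse module with the defaults listed above and treats the four cleaning options as on/off switches, could look like this (build_parser is a hypothetical name, not taken from the project):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the preprocessor.py command-line interface.
    parser = argparse.ArgumentParser(
        description="Preprocess the SemEval-2016 Task 6 stance data."
    )
    parser.add_argument(
        "--in_path",
        default="data/semeval2016-task6-trainingdata.txt",
        help="data file to preprocess",
    )
    parser.add_argument(
        "--out_path",
        default="output/",
        help="location where the output data is written",
    )
    # Assumption: each cleaning step is a boolean switch that is off unless given.
    for flag in (
        "--remove_numbers",
        "--remove_special_characters",
        "--remove_stopwords",
        "--stem",
    ):
        parser.add_argument(flag, action="store_true")
    return parser
```

For example, build_parser().parse_args(["--stem"]) keeps both default paths and enables only stemming.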

Step 2: Pass in any of the following optional flags to tune the preprocessing:

--remove_numbers removes all digits from 0 to 9
--remove_special_characters removes special characters from the dataset
--remove_stopwords removes English stopwords
--stem applies stemming to the dataset
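The four options can be sketched as independent steps applied in order. This is an illustrative reconstruction, not the code in preprocessor.py: the function names are hypothetical, STOPWORDS is a small hand-picked subset standing in for a full English stopword list, and naive_stem is a toy suffix stripper standing in for a real stemmer such as NLTK's PorterStemmer (the README does not say which stemmer the project uses).

```python
import re

# Illustrative subset only; a real run would use a complete English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
             "in", "on", "for", "this", "that", "it"}


def remove_numbers(text: str) -> str:
    # Drop every digit from 0 to 9.
    return re.sub(r"[0-9]", "", text)


def remove_special_characters(text: str) -> str:
    # Replace anything that is not a letter, digit or whitespace with a space.
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def remove_stopwords(tokens: list[str]) -> list[str]:
    # Case-insensitive lookup against the stopword set.
    return [t for t in tokens if t.lower() not in STOPWORDS]


def naive_stem(token: str) -> str:
    # Toy suffix stripper, NOT the Porter algorithm: strips the first matching
    # suffix as long as at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, numbers: bool = False, special: bool = False,
               stopwords: bool = False, stem: bool = False) -> str:
    # Character-level steps first, then token-level steps.
    if numbers:
        text = remove_numbers(text)
    if special:
        text = remove_special_characters(text)
    tokens = text.split()
    if stopwords:
        tokens = remove_stopwords(tokens)
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return " ".join(tokens)
```

With every option enabled, "The 2 cats are running to #SemST!" becomes "cat runn SemST"; the truncated "runn" shows why a proper stemming algorithm is preferable to naive suffix stripping.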

Feature Engineering

Term Frequency-Inverse Document Frequency (TF-IDF) was calculated using the following procedure:

fit_transform() fits the vectorizer on the tweets and returns the sparse TF-IDF matrix
get_feature_names() supplies the terms, which are used as the index of the DataFrame
todense() converts the sparse matrix into a dense one so it can be stored in the DataFrame
transpose() swaps rows and columns so that the Bag of Words (BOW) terms appear as columns instead of rows
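The procedure above can be sketched as follows, assuming the vectorizer is scikit-learn's TfidfVectorizer and the DataFrame comes from pandas (the README names the methods but not the libraries). get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2 in favour of get_feature_names_out(), so the sketch uses whichever the installed version provides. tfidf_frame is a hypothetical helper name.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_frame(documents: list[str]) -> pd.DataFrame:
    vectorizer = TfidfVectorizer()
    # fit_transform() learns the vocabulary and IDF weights and returns a sparse
    # (n_documents x n_terms) matrix.
    matrix = vectorizer.fit_transform(documents)
    # Use the newer accessor when available, falling back for old releases.
    if hasattr(vectorizer, "get_feature_names_out"):
        terms = vectorizer.get_feature_names_out()
    else:
        terms = vectorizer.get_feature_names()
    # Transpose to (n_terms x n_documents), densify, and index the rows by term.
    by_term = pd.DataFrame(matrix.T.todense(), index=terms)
    # transpose() flips it back: one row per document, one column per BOW term.
    return by_term.transpose()
```

For the two toy documents "I support the movement" and "I do not support this", the result has one row per document and six term columns (do, movement, not, support, the, this); the default tokenizer ignores the single-character token "I".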

RESTful API

To read the API documentation, including the expected format of the POST requests, run restapi.py and open the /docs and/or /redoc paths of the running server in a browser.
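Once restapi.py is running, a POST request can also be sent from Python. The sketch below uses only the standard library; the base URL (127.0.0.1:8000), the /predict path and the {"text": ...} payload are hypothetical placeholders, so check /docs or /redoc for the actual endpoint and request schema.

```python
import json
import urllib.request

# Hypothetical values: confirm the real host, port, path and schema at /docs.
BASE_URL = "http://127.0.0.1:8000"
ENDPOINT = "/predict"


def build_post_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    # Serialize the payload as JSON and mark the request as a POST.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    # Requires the API server to be running; raises URLError otherwise.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_post_request(BASE_URL, ENDPOINT, {"text": "..."})) posts the payload and decodes the reply, assuming the server answers with JSON.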