After cloning/downloading the project, create a secrets.py file inside the parent directory (TREC-IS) and store the twitter-API access keys and babelnet-key in it. Check the section below to know how to get the access keys.
Given below is a sample of secrets.py file:
consumer_key='xxxx'
consumer_secret='xxxx'
access_token='xxxx'
access_token_secret='xxxx'
babelnet_key='xxxx'
Create a virtual environment for the project and install all the python packages using requirements.txt.
cd TREC-IS/
virtualenv -p python3 envname
source envname/bin/activate
pip install -r requirements.txt
- spacy :
python -m spacy download en
- nltk
Enter python shell and then download all the nltk packages.
>> import nltk
>> nltk.download( )
python -m textblob.download_corpora
download the glove pre-trained model into data/embeddings folder.
Check out the ' Creating a Twitter app ' section in twitter's documentation for developers to get the consumer keys and access tokens.
For extracting Bag-of-Concepts features, you would require an access key from BabelNet. First create an account on it and after logging in, fill the form as mentioned here to increase the daily limit. Add the unique API key as 'babelnet_key' in secrets.py and then you're ready to go.!
After generating the training and test data from the given json files in the data directory, run Preprocessing/Feature_Extractor.py
to generate all features and to run evaluation on the classical machine learning models.
By default, features will be generated for the training data. Change the function parameters/variables (self.norm_df -> self.norm_test_df
) accordingly to generate features for the test data and change the path for saving the generated features from saved_objects/features/train/
to saved_objects/features/test/
in both Preprocessing/Feature_Extractor.py
and Preprocessing/FeaturePyramids.py
.