Recommendations simulator for qualitative testing, using mock personas that fit the model requirements.
Requirements:
- Python 3.8
- PySpark 3.3.1
- PyTorch 1.13.0
- tweepy 4.12.1
- huggingface-hub 0.11.0
Create a fresh virtual environment with the same Python version as the models you want to test, for example:
conda create -n twitter_hashtag38 python=3.8
conda activate twitter_hashtag38
Then run:
pip install -r requirements.txt
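After installing, a quick way to confirm that the environment matches the versions listed under Requirements is a small check script. This is a minimal sketch, not part of the repository: the file name and the function `check_versions` are hypothetical, and it assumes the pip distribution names `pyspark`, `torch`, `tweepy` and `huggingface-hub`. It uses only the standard library (`importlib.metadata`, available from Python 3.8).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed under Requirements (keys are pip distribution names).
EXPECTED = {
    "pyspark": "3.3.1",
    "torch": "1.13.0",
    "tweepy": "4.12.1",
    "huggingface-hub": "0.11.0",
}


def check_versions(expected):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
            # Drop local build tags such as "+cu117" that PyTorch wheels may carry.
            installed = installed.split("+")[0]
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Warning: running Python {sys.version.split()[0]}, expected 3.8")
    for pkg, found in check_versions(EXPECTED).items():
        print(f"{pkg}: expected {EXPECTED[pkg]}, found {found}")
```

An empty result from `check_versions(EXPECTED)` means every listed package is installed at the expected version.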
To set up Spark correctly, you may need to set the following environment variables:
PYTHONPATH="PATH_TO_SPARK_PYTHON"
SPARK_HOME="PATH_TO_SPARK"
PYSPARK_PYTHON="PATH_TO_ENV_PYTHON"
PYSPARK_DRIVER_PYTHON="PATH_TO_ENV_PYTHON"
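If you prefer not to export these variables in the shell, they can be set from Python before the first SparkSession is created. The sketch below assumes a local Spark install whose Python sources live under `SPARK_HOME/python` (with the bundled py4j zip in `python/lib`); the helper name `configure_spark_env` and the example path are hypothetical. Because `PYTHONPATH` is only read when the interpreter starts, the sketch also extends `sys.path` directly for the current process.

```python
import glob
import os
import sys


def configure_spark_env(spark_home):
    """Point PySpark at a Spark install and at this environment's Python."""
    os.environ["SPARK_HOME"] = spark_home
    # Use the current virtual environment's interpreter for driver and workers.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    # Spark's Python sources plus the bundled py4j zip (if present).
    spark_python = os.path.join(spark_home, "python")
    extra_paths = [spark_python] + glob.glob(
        os.path.join(spark_python, "lib", "py4j-*-src.zip")
    )

    # PYTHONPATH only affects interpreters started later, so update sys.path now.
    for path in extra_paths:
        if path not in sys.path:
            sys.path.insert(0, path)

    # Also export PYTHONPATH so that child processes started from here inherit it.
    existing = os.environ.get("PYTHONPATH")
    os.environ["PYTHONPATH"] = os.pathsep.join(
        extra_paths + ([existing] if existing else [])
    )


# Example (hypothetical path); call this before building a SparkSession:
# configure_spark_env("/opt/spark-3.3.1-bin-hadoop3")
```

Since PySpark 3.3.1 is also installed through `requirements.txt`, setting `SPARK_HOME` may not be necessary at all when you rely on the pip-installed package.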
Data: run data_utils.py
- Data Collection:
  - You MUST have your own Twitter API BEARER_TOKEN and save it to src/main/data/tweepy_token/BEARER_TOKEN.json
- Data Preparation:
  - Simply run data_utils.py to get cleaned data with 200 hashtags, cleaned data with 50 hashtags, and cleaned data with 50 hashtags for non-deep-learning (non-DL) models.
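The Data Collection step expects a bearer token stored in src/main/data/tweepy_token/BEARER_TOKEN.json. The README does not specify the file's layout, so the loader below is only a sketch under an assumption: the file contains either a bare JSON string or an object with a hypothetical `"BEARER_TOKEN"` key. Adapt it to whatever format `data_utils.py` actually reads; the function name `load_bearer_token` is also hypothetical.

```python
import json
from pathlib import Path

TOKEN_PATH = Path("src/main/data/tweepy_token/BEARER_TOKEN.json")


def load_bearer_token(path=TOKEN_PATH):
    """Read a Twitter API bearer token from a JSON file.

    ASSUMPTION: the file holds a bare JSON string, or an object with a
    hypothetical "BEARER_TOKEN" key.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, str):
        token = data
    elif isinstance(data, dict):
        token = data.get("BEARER_TOKEN")
    else:
        token = None
    if not token:
        raise ValueError(f"No bearer token found in {path}")
    return token
```

With tweepy 4.x the token can then be passed to the v2 client, e.g. `tweepy.Client(bearer_token=load_bearer_token())`.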
Models must be added in the src/main folder. Currently available: lstm.py, resnet.py, bert.py, fasttext.py, and tfidf_logistic.py.
Simply run main.py