tweets_tagger

Tweets tagging tool

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

TLDR

We needed a corpus of tagged tweets, so we built this project to obtain one quickly.

Installation

Repository clone

The first step is to clone this repository.

This project requires MongoDB to be installed and running on port 27017.
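Before going further, it can help to confirm that something is actually listening on that port. The following is a minimal sketch using only the Python standard library; the function name `is_port_open` is hypothetical and not part of this project, and a successful connection only shows that a process accepts TCP connections on the port, not that it is MongoDB.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:27017")
    else:
        print("Nothing is listening on 127.0.0.1:27017; is mongod running?")
```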

Python requirements

You will need to create a virtualenv with Python 3.7:

virtualenv --python=`which python3.7` venv
source venv/bin/activate

Then install the requirements:

pip install -r requirements.txt

This project installs itself through its own setup.py; you just need to execute:

pip install -e .

Frontend requirements

We need to install Node.js in order to build the frontend code. You can use the nvm utility to manage the Node installation. Currently we are using 12.16:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
NODE_VERSION=12.16.2
nvm install $NODE_VERSION
nvm use $NODE_VERSION

Now it is time to install the dependencies and compile the frontend:

cd web
npm install
npm run build

You have now generated the distributable static code. The last command will print some warnings; don't worry about them. Now we need to run our application to serve these static files.

Run server

We are using uvicorn to serve our application. You just need to execute:

uvicorn tweet_tagger.main:api --debug

Now you can tag your tweets at http://127.0.0.1:8000
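If the page does not load, a quick script can tell whether the server is answering at all. This is a minimal sketch using only the standard library; `server_is_up` is a hypothetical helper, not something this project ships, and it treats any HTTP response (including an error status) as proof that the server is running.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if url answers with any HTTP response, even an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar network failure.
        return False


if __name__ == "__main__":
    print("server reachable" if server_is_up() else "server not reachable")
```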

In order to test or analyze the API you can load:

There you can read the endpoint documentation, but so far no data has been imported... it's time to do it!

Tweet data importation

The app uses MongoDB, so you need to install it. The installation process is documented on its own webpage: mongodb.com

Once you have done that, it's time to download the data and import it into the database.

Download tweets to a CSV

We used the GetOldTweets3 module to download tweets. With this command you will download up to 100 tweets in Spanish, posted within 10 km of Seville and related to Coronavirus and Holy Week:

GetOldTweets3 --querysearch "coronaviru+semana+santa" --near "Sevilla" --within 10km --maxtweets 100 --lang es

The default output CSV file name is output_got.csv. Suppose you downloaded the tweets to your ~/data/ folder.

Tweet CSV importation

Now you must use this script to import them into MongoDB:

python bin/import_tweets.py --csv-path ~/data/output_got.csv

This simple script uses the MongoDB and CSV path settings defined in the tweet_tagger.settings module.
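To illustrate what an import step of this kind does, here is a minimal sketch. It is not the project's actual bin/import_tweets.py: the function names `read_tweets` and `import_tweets` are hypothetical, the rows are stored exactly as they appear in the CSV, and the database and collection names in the usage comment are assumptions. The collection argument only needs an `insert_many` method, which a pymongo collection provides.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_tweets(csv_path: Path) -> List[Dict[str, str]]:
    """Read a CSV file and return one dict per row, keyed by the header names."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


def import_tweets(csv_path: Path, collection) -> int:
    """Insert every CSV row into collection and return the number of rows read."""
    tweets = read_tweets(csv_path)
    if tweets:  # pymongo's insert_many rejects an empty list
        collection.insert_many(tweets)
    return len(tweets)


# Possible usage with pymongo (database and collection names are assumptions):
#
#   from pymongo import MongoClient
#   collection = MongoClient("localhost", 27017)["tweets_db"]["tweets"]
#   import_tweets(Path("~/data/output_got.csv").expanduser(), collection)
```

Keeping the CSV reading separate from the insertion makes it possible to check the parsed rows without a running database.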

The built frontend code is served by our FastAPI server, which must be running.