
Here is where the magic happens. This is the scraper repository for the AlwaysUpdate project.


AlwaysUpdate ~ Web Crawler and Scraper 📰


AlwaysUpdate is an e-newspaper with news from Argentina, Colombia, Venezuela and Mexico, updated every day.

Getting started 🚀

Things you need installed on your system: 🛠️

  • Python 3.7
  • pip
  • virtualenv
  • AlwaysUpdate ~ DataScience API

Configuration 🔧

Virtual environment

virtualenv venv --python=python3.7
source venv/bin/activate

Dependencies installation

pip install -r requirements.txt

Environment variables

export API_URL="$DATASCIENCE_API_HOST/api/v1/"
export GOOGLE_APPLICATION_CREDENTIALS="credentials.json"
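A minimal sketch of how crawler code could read these two variables at startup and fail early when one is missing, assuming they are consumed through `os.environ`. The `Settings` type and the `load_settings` helper are hypothetical, not taken from the repository.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    api_url: str           # base URL of the AlwaysUpdate ~ DataScience API
    credentials_path: str  # path to the Google service-account JSON file


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required variables, raising if any of them is unset or empty.

    Hypothetical helper: the real project may read these values elsewhere.
    """
    required = ("API_URL", "GOOGLE_APPLICATION_CREDENTIALS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    api_url = env["API_URL"]
    # Ensure the base URL ends with a slash so relative endpoint paths join cleanly.
    if not api_url.endswith("/"):
        api_url += "/"
    return Settings(api_url=api_url, credentials_path=env["GOOGLE_APPLICATION_CREDENTIALS"])
```

Google client libraries already pick up `GOOGLE_APPLICATION_CREDENTIALS` on their own through Application Default Credentials, so reading it here only serves to fail fast when it has not been exported.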

Execution

You can trigger the crawler with a POST request; in that case you must first start the uvicorn server:

cd news_crawler_scraper
uvicorn app.main:app --reload 
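Once the server is up (uvicorn listens on `http://127.0.0.1:8000` by default), the crawler could be triggered from Python using only the standard library. This is a sketch: the `/crawl/{journal}` route and the JSON body are hypothetical placeholders, so check `app/main.py` for the actual path and payload before using it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_crawl_request(journal: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that asks the server to run one spider.

    The "/crawl/{journal}" route and the {"journal": ...} body are hypothetical.
    """
    body = json.dumps({"journal": journal}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/crawl/{journal}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_crawl(journal: str) -> int:
    """Send the request to the running server and return the HTTP status code."""
    with urllib.request.urlopen(build_crawl_request(journal), timeout=30) as response:
        return response.status
```

With the server running, `trigger_crawl("lanacion")` would return the status code of the (hypothetical) endpoint. Building the request separately from sending it keeps the URL and payload easy to inspect without a live server.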

If you don't want to run the server, you can launch a spider directly:

python go_spyder_$JOURNAL_NAME.py

Available values for $JOURNAL_NAME:

  • eltiempo
  • lanacion
  • eluniversal
  • xataka
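To run every spider in sequence without the server, a small wrapper could build the `go_spyder_$JOURNAL_NAME.py` filename for each journal listed above and call it with `subprocess`. This sketch assumes the scripts sit in the current working directory; the wrapper and its function names are hypothetical and not part of the repository.

```python
import subprocess
import sys
from typing import List

# Journals supported by the project, one go_spyder_<name>.py script each.
JOURNALS = ("eltiempo", "lanacion", "eluniversal", "xataka")


def spider_command(journal: str) -> List[str]:
    """Return the command that runs go_spyder_<journal>.py with the current interpreter."""
    if journal not in JOURNALS:
        raise ValueError(f"Unknown journal {journal!r}; expected one of {JOURNALS}")
    return [sys.executable, f"go_spyder_{journal}.py"]


def run_all() -> None:
    """Run each spider one after another, stopping at the first non-zero exit code."""
    for journal in JOURNALS:
        subprocess.run(spider_command(journal), check=True)
```

Calling `run_all()` from inside the activated virtual environment runs all four spiders. Using `sys.executable` rather than a bare `python` ensures the spiders run with the same interpreter (and therefore the same installed dependencies) as the wrapper.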

Contributing ✒️

Pull requests are welcome! If you have an idea for a feature but don't have time to implement it, feel free to open an issue!


License 📄

MIT