/WebscrappingRss

This program scrapes 4 URLs. For each URL it first fetches the HTML, then finds the link to the site's RSS feed, and finally reads the first 10 posts from that feed, writing the <title>, <published> and <link> of each post to a JSON file.

Primary language: Python

Web scraping

A small web scraping exercise.


The exercise

The purpose is to show one way of scraping RSS feeds, including how to find the feed link inside a page's HTML when the feed is not visible at first glance.
The data is shown in JSON format and is also written to a JSON file, so the information can be reused as needed.
Keep in mind that the entry fields may need to be updated over time, because the attributes exposed by the feeds can change.
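
The steps described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library and works on strings, so downloading the pages is left out. It is not necessarily how ScrapingMain.py does it: the names find_feed_links, parse_posts and save_posts, the sample HTML and the sample feed are all assumptions made for this example. The candidate tag lists in parse_posts are where the entries would be adjusted if a feed changes its attributes.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that pages commonly use to advertise a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace prefix


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> tag that points to a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if tag == "link" and "alternate" in rel and mime in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs such as "/feed.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feed_links(html, base_url):
    """Hypothetical helper: return the feed URLs advertised in an HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


def first_text(node, names):
    """Return the text of the first child that exists among several candidate tags."""
    for name in names:
        child = node.find(name)
        if child is not None and child.text:
            return child.text.strip()
    return None


def parse_posts(feed_xml, limit=10):
    """Hypothetical helper: extract title, published and link of the first posts."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0 uses <item>; Atom uses namespaced <entry>.
    items = root.findall(".//item") or root.findall(f".//{ATOM}entry")
    posts = []
    for item in items[:limit]:
        link = first_text(item, ["link"])
        if link is None:
            # Atom keeps the URL in the href attribute of <link>.
            atom_link = item.find(f"{ATOM}link")
            link = atom_link.get("href") if atom_link is not None else None
        posts.append({
            "title": first_text(item, ["title", f"{ATOM}title"]),
            "published": first_text(
                item, ["pubDate", "published", f"{ATOM}published", f"{ATOM}updated"]
            ),
            "link": link,
        })
    return posts


def save_posts(posts, path):
    """Write the posts to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(posts, fh, indent=2, ensure_ascii=False)


# Made-up sample data, used only to demonstrate the helpers.
SAMPLE_HTML = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'href="/feed.xml"></head><body></body></html>'
)
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>First post</title><pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
<link>https://example.com/1</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(find_feed_links(SAMPLE_HTML, "https://example.com/"))
    print(json.dumps(parse_posts(SAMPLE_FEED), indent=2))
```

In real use, the HTML and the feed text would come from an HTTP request to each URL, and save_posts(posts, "posts.json") would write the result to disk; the file name here is only a placeholder.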

Specifications

  • Python 3.7 or higher
  • It can be run on Windows, Linux or Mac
  • Run the following command to install the required Python libraries: pip install -r requirements.txt

Execution of the exercise

Once the requirements above are met, run the program by typing the following command in the terminal:

  • If you're on Windows:
python ScrapingMain.py
  • If you're on Linux:
python3 ScrapingMain.py
  • If you're on Mac:
python ScrapingMain.py