py_web_scraping

Primary language: Jupyter Notebook

Learning web scraping with Python

"Scraping a web page means requesting specific data from a target webpage. When you scrape a page, the code you write sends your request to the server hosting the destination page. The code then downloads the page, only extracting the elements of the page defined initially in the crawling job." [source]

[tutorial]
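The process in the quote (send a request to the server, download the page, keep only the elements you asked for) can be sketched with the standard library alone. This is a minimal sketch, assuming the target is a plain HTML page and that the elements of interest are link targets; `download_page`, `LinkExtractor` and `extract_links` are hypothetical names, and the User-Agent string is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def download_page(url: str, timeout: float = 10.0) -> str:
    """Send a GET request to the server hosting `url` and return the page as text."""
    # Hypothetical User-Agent; some servers reject urllib's default one.
    request = Request(url, headers={"User-Agent": "py_web_scraping-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Decode with the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Keep only the href of each <a> tag and ignore the rest of the page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Offline demo on a small made-up snippet (no network access needed).
sample = '<p>Intro</p><a href="/docs">Docs</a><a name="top">Top</a>'
print(extract_links(sample))  # ['/docs']
```

To scrape a live page, the downloaded text is passed in instead, e.g. `extract_links(download_page("https://example.com"))`.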

BeautifulSoup:

Beautiful Soup provides simple methods for navigating, searching, and modifying the parse tree of HTML and XML files. It transforms a complex HTML document into a tree of Python objects. It also automatically converts the document to Unicode, so you don’t have to think about encodings. This tool helps you not only scrape the data but also clean it.
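A minimal sketch of navigating, searching, and cleaning with Beautiful Soup, run on a small made-up HTML snippet rather than a real site. It assumes the beautifulsoup4 package is installed (imported as `bs4`) and uses Python's built-in `"html.parser"`, so no extra parser is required; `parse_books` and the markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
HTML = """
<html>
  <head><title>Book list</title></head>
  <body>
    <ul id="books">
      <li class="book"><a href="/b/1">Book one</a> <span class="price">$10.00</span></li>
      <li class="book"><a href="/b/2">  Book two  </a> <span class="price">$12.50</span></li>
    </ul>
  </body>
</html>
"""


def parse_books(html: str) -> list[dict]:
    # Build the parse tree with Python's built-in parser.
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # Search: find the list by id, then every <li class="book"> inside it.
    for item in soup.find("ul", id="books").find_all("li", class_="book"):
        link = item.find("a")  # navigate to the child <a> tag
        price = item.find("span", class_="price")
        books.append({
            # Clean: get_text(strip=True) drops surrounding whitespace.
            "title": link.get_text(strip=True),
            "url": link["href"],  # tag attributes are read like dict keys
            "price": price.get_text(strip=True),
        })
    return books


soup = BeautifulSoup(HTML, "html.parser")
print(soup.title.string)  # Book list
print(parse_books(HTML))
```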

Pandas:

A Python library for data manipulation and analysis, built around the DataFrame, a table with labeled rows and columns.
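A short sketch of where pandas usually fits after the scraping step. The list of dictionaries stands in for scraped records (titles and prices are invented); the code loads it into a DataFrame, converts the price strings to numbers, and sorts by price.

```python
import pandas as pd

# Invented records, shaped like the output of a scraping step.
records = [
    {"title": "Book one", "price": "$10.00"},
    {"title": "Book two", "price": "$12.50"},
    {"title": "Book three", "price": "$8.75"},
]

df = pd.DataFrame(records)

# Manipulate: strip the currency sign and convert the column to floats.
df["price"] = df["price"].str.replace("$", "", regex=False).astype(float)

# Analyse: sort by price and compute a summary statistic.
cheapest_first = df.sort_values("price").reset_index(drop=True)
print(cheapest_first)
print("Mean price:", round(df["price"].mean(), 2))
```

To keep the result, `df.to_csv("books.csv", index=False)` writes the table to a CSV file; the file name is only an example.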

re:

Regex: Python's built-in regular expressions module. Regular expressions are a kind of mini-language used to perform pattern matching within strings; re.compile() turns a pattern into a reusable Pattern object for searching, matching, and substitution.
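As an illustration, the sketch below pulls dollar prices out of a sentence. The pattern and the sample text are assumptions made for this example; real pages usually need a pattern tuned to their own price format.

```python
import re

# Compile once; re.compile returns a re.Pattern object that can be reused.
# Group 1 captures the number: digits, optionally followed by two decimals.
price_pattern = re.compile(r"\$(\d+(?:\.\d{2})?)")

text = "Book one costs $10.00, book two costs $12.50 and the bookmark is $3."

# findall returns the captured group for every non-overlapping match.
prices = [float(p) for p in price_pattern.findall(text)]
print(prices)  # [10.0, 12.5, 3.0]

# search returns the first match anywhere in the string.
first = price_pattern.search(text)
print(first.group(1))  # 10.00

# sub replaces every match, here masking the prices.
print(price_pattern.sub("$X", text))
```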

Virtual environment:

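  • To create (requires the virtualenv package, installed with pip install virtualenv; the built-in alternative is python -m venv <name_env>):
python -m virtualenv <name_env>
  • To activate (Linux/macOS; on Windows run <name_env>\Scripts\activate):
source <name_env>/bin/activate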
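Once the environment is activated, `python` resolves to the environment's interpreter. A small sketch to confirm this from inside Python, based on the documented behaviour of venv: inside a virtual environment `sys.prefix` points to the environment, while `sys.base_prefix` points to the base installation. Older virtualenv releases (before 20) set `sys.real_prefix` instead, so the helper checks that too. The function name is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) marks its environments with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and recent virtualenv: the prefix differs from the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtual_environment())
print("Interpreter prefix:", sys.prefix)
```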

Requirements:

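  • To create requirements.txt (records the packages installed in the active environment, pinned to their exact versions):
pip freeze > requirements.txt
  • To install the listed packages, e.g. in a fresh environment:
pip install -r requirements.txt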
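pip freeze writes one `name==version` line per installed package. To see roughly what ends up in requirements.txt without leaving Python, the standard library's importlib.metadata (Python 3.8+) can list the installed distributions. This is an approximation, not a reimplementation of pip: pip freeze also handles editable and VCS installs and, depending on the pip and Python versions, skips packaging tools such as pip itself. `freeze_lines` is a hypothetical helper.

```python
from importlib.metadata import distributions


def freeze_lines() -> list[str]:
    """Return 'name==version' lines for installed distributions, sorted by name."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive sort, similar to the order pip freeze prints.
    return sorted(pins, key=str.lower)


for line in freeze_lines():
    print(line)
```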