/web-scraping-projects

This repository provides various web scraping projects in Jupyter notebooks, for both learning and data-related workshops.

Web Scraping Projects With Python

This repository contains a collection of notebooks that scrape, analyse and visualise data from a range of websites, including football data.

Table of Contents
  1. About the Project
  2. Prerequisites
  3. Folder Structure
  4. Projects
    • Scraping salary data from Salary.com
    • Scraping car data and crawling to specific URLs
    • Scraping transfers data
    • Scraping different types of football data from Understat.com
    • Scraping movie data from Cineb.com
    • Scraping real estate data and crawling to apartment pages
    • Scraping Amazon data by keyword search

About the Project

This repository is a collection of web scraping projects. I scraped a variety of websites in order to deal with different page structures and to obtain different kinds of data (cars, salaries, sports...). Some of these projects also feature crawling techniques and exploratory data visualization. Note that websites change over time, so the approach I use to scrape a specific website now may no longer work in the future.

I recommend starting with the notebook that scrapes movie data from Cineb.com since it provides an understanding of how the scraping is done.
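
To give a rough idea of what such a notebook does, here is a minimal sketch of parsing a listing page with bs4. The markup, the class names (`film`, `title`, `year`) and the `parse_movies` helper are hypothetical and are not taken from Cineb.com or from the notebooks. The HTML is inlined so the example runs offline; in a notebook it would typically come from `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded listing page.
# In a real notebook the HTML would come from requests.get(url).text.
SAMPLE_HTML = """
<div class="film"><h2 class="title">Movie A</h2><span class="year">2021</span></div>
<div class="film"><h2 class="title">Movie B</h2></div>
"""


def parse_movies(html):
    """Return one dict per film card; fields missing from the page become None."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.find_all("div", class_="film"):
        title = card.find("h2", class_="title")
        year = card.find("span", class_="year")
        movies.append({
            # Guard every lookup: a site redesign can remove an element.
            "title": title.get_text(strip=True) if title else None,
            "year": int(year.get_text(strip=True)) if year else None,
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
print(movies)
```

Checking each lookup for `None` before reading its text is one simple way to cope with pages whose structure changes: a missing element yields an empty field instead of an exception.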

Prerequisites

Made with Python and Jupyter.

The following open source packages are used in this project:

  • Pandas

  • Matplotlib

  • bs4 (Beautiful Soup)

  • requests

  • csv (Python standard library)

  • json (Python standard library)
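
As an illustration of how the `csv` and `json` modules can be used once records have been scraped, here is a minimal sketch that serialises a list of dicts to both formats. The sample rows and the `rows_to_csv` helper are assumptions made for this example and are not code from the notebooks. An in-memory buffer is used so nothing is written to disk.

```python
import csv
import io
import json

# Hypothetical scraped records; a real notebook would build these from parsed pages.
rows = [
    {"title": "Movie A", "year": 2021},
    {"title": "Movie B", "year": None},
]


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["title", "year"])
json_text = json.dumps(rows, indent=2)
print(csv_text)
```

To save to a file instead, the same `csv.DictWriter` can be given a file opened with `open(path, "w", newline="")`. The resulting CSV can then be loaded with `pandas.read_csv` for analysis.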

Folder Structure

|-- web-scraping-projects
    |-- README.md
    |-- data-directory
    |   |-- books_data.csv
    |   |-- cars.csv
    |   |-- movies.csv
    |   |-- real_estate.csv
    |   |-- salary_data.csv
    |   |-- transfers_data.csv
    |-- notebooks
        |-- Amazon.ipynb
        |-- Carvago.ipynb
        |-- Cineb_movies.ipynb
        |-- Real estate.ipynb
        |-- Salaries.ipynb
        |-- Transfermarkt.ipynb
        |-- Understat.ipynb
        |-- .ipynb_checkpoints