/Web-Scrapping

A simple web scraping example in Python.

Primary language: Python

Web_Scrapping

This is a small program that introduces web scraping.

Goal

The main purpose is to scrape product names from an online marketplace.

Prerequisites

You need to install the following libraries:

  • BeautifulSoup: sudo pip install bs4
  • Selenium: sudo pip install selenium
  • Firefox Webdriver: sudo apt-get install firefox-geckodriver
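To confirm that the two Python libraries are importable before running the scraper, a small check like the one below may help. It is only a sketch: the helper name missing_packages is hypothetical, and it checks the import names bs4 and selenium taken from the install commands. It does not check that geckodriver is installed.

```python
from importlib.util import find_spec

# Import names taken from the pip commands in this README.
REQUIRED = ["bs4", "selenium"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```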

First Steps

First, we need to choose an online marketplace. I prefer Trendyol.com. Then we need to copy the URL of the page we are interested in. This code uses the following URL:

https://www.trendyol.com/sr?q=paten&qt=paten&st=paten&os=1&pi=1
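The URL above can also be assembled from its query parameters, which makes it easier to change the search term or the page. The sketch below assumes that q, qt and st all carry the search term and that pi is the page number, which is inferred from the URL itself and not from any Trendyol documentation. The helper name page_url is hypothetical, and the meaning of os is not known, so it is copied unchanged.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.trendyol.com/sr"


def page_url(query, page):
    """Build a search URL for the given query and 1-based page number."""
    # Assumption: q, qt and st repeat the search term; pi selects the page.
    params = {"q": query, "qt": query, "st": query, "os": 1, "pi": page}
    return BASE_URL + "?" + urlencode(params)


print(page_url("paten", 1))
# prints https://www.trendyol.com/sr?q=paten&qt=paten&st=paten&os=1&pi=1
```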

Right-click the page and choose Inspect. A new panel opens that looks like this: ![](images/Screenshot from 2022-05-17 23-26-56.png)

As you move the cursor over the rows in this panel, you will see that each row corresponds to a box on the page and highlights it. By expanding the boxes you are interested in, you can reach the data you want. In this image you can see that each product has its own row (class). By expanding these rows, you can reach product details such as name, price, etc. Remember that our goal is to find product names. A product name looks like this:

<span class="prdct-desc-cntnr-name hasRatings" title="Işıklı Ayarlanabilir Slikon Teker Metal Gövde Paten">Işıklı Ayarlanabilir Slikon Teker Metal Gövde Paten</span>
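To see where the product name sits in this element, the snippet can be parsed on its own, without opening a browser. The sketch below uses Python's built-in html.parser instead of BeautifulSoup so that it runs with no extra installs. The class ProductNameParser is a hypothetical helper, not part of this project. It collects the title attribute of every span whose class list contains prdct-desc-cntnr-name.

```python
from html.parser import HTMLParser


class ProductNameParser(HTMLParser):
    """Collect the title attribute of every product-name span."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Keep only spans that carry the product-name class and a title.
        if tag == "span" and "prdct-desc-cntnr-name" in classes and attributes.get("title"):
            self.names.append(attributes["title"])


snippet = (
    '<span class="prdct-desc-cntnr-name hasRatings" '
    'title="Işıklı Ayarlanabilir Slikon Teker Metal Gövde Paten">'
    "Işıklı Ayarlanabilir Slikon Teker Metal Gövde Paten</span>"
)

parser = ProductNameParser()
parser.feed(snippet)
print(parser.names)
```

The name appears twice in the element, once as the title attribute and once as the text inside the span, so either one can be used as the source of the product name.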

We have now found all the data we need.

Code

from selenium import webdriver      # Browser automation
from bs4 import BeautifulSoup       # HTML parsing

driver = webdriver.Firefox()        # Start Firefox through geckodriver

for i in range(3):                  # Loop over result pages 1 to 3
    # Append the page number to the "pi" parameter to turn pages
    link = "https://www.trendyol.com/sr?q=paten&qt=paten&st=paten&os=1&pi=" + str(i + 1)
    driver.get(link)                # Load the page in the browser

    content = driver.page_source
    soup = BeautifulSoup(content, "html.parser")    # Parse the rendered HTML
    # Each product card is a div with exactly this class attribute
    products = soup.find_all("div", class_="p-card-wrppr add-to-bs-card")
    for product in products:
        # Look for the name span inside the card; find() returns None if it is missing
        name = product.find("span", class_="prdct-desc-cntnr-name hasRatings")
        if name is not None:
            print(name["title"])    # The title attribute holds the product name

driver.quit()                       # Close the browser when finished