Web scraping is an automated method for obtaining large amounts of data from websites.
Most of this data is unstructured HTML, which is then converted into structured data in a spreadsheet or database so that it can be used in other applications.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the page source, which can then be used to extract data in a hierarchical, more readable way.
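To make the idea of a parse tree concrete, here is a minimal sketch that assumes the beautifulsoup4 package is already installed. The HTML snippet and its tag names are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<html>
  <head><title>Laptop deals</title></head>
  <body>
    <div id="listing">
      <p class="name">Laptop A</p>
      <p class="name">Laptop B</p>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string            # text inside the <title> tag
listing = soup.find("div", id="listing")  # first <div> with id="listing"
names = [p.get_text() for p in listing.find_all("p")]  # text of each <p> child
parent_tag = listing.parent.name          # name of the tag that contains the <div>

print(title_text)
print(names)
print(parent_tag)
```

Each tag is a node in the tree, so the same object can be searched downward (find_all) or walked upward (parent).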
Scraping a page with Beautiful Soup involves the following steps:
- Install the required third-party libraries
- Request the content (source code) of a specific URL from the server
- Parse the HTML content that is returned
- Identify the elements of the page that we want
- Extract and (if necessary) reformat those elements into a dataset we can analyze or use in whatever way we require
pip install requests
pip install beautifulsoup4
pip install pandas
import requests
URL = "https://www.xyz/lmn/"
response = requests.get(URL)
print(response.content)
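The request above assumes the server answers quickly and with a successful status. A slightly more defensive variant might look like the sketch below. The helper name fetch_html and the 10-second default timeout are assumptions chosen for illustration, not part of the original example.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or raise if the request fails."""
    # A timeout keeps the script from hanging on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text


# Usage, with the placeholder URL from the example above:
# html = fetch_html("https://www.xyz/lmn/")
```

Failing early on an error status avoids parsing an error page as if it were the product listing.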
from bs4 import BeautifulSoup
htmlcontent = response.content
soup = BeautifulSoup(htmlcontent, 'html.parser')
print(soup.prettify())
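Once the page is parsed, the elements of interest are usually located by tag name and class, which you find by inspecting the page in the browser. The sketch below shows three common lookups on a small made-up snippet. The class names card, item and name are hypothetical, and real sites will use their own.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; real class names come from inspecting the target page
html = """
<div class="card"><a class="item" href="/p/1"><span class="name">Laptop A</span></a></div>
<div class="card"><a class="item" href="/p/2"><span class="name">Laptop B</span></a></div>
"""

soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", class_="item")     # first match, or None if absent
all_links = soup.find_all("a", class_="item")  # list of every match
hrefs = [a["href"] for a in all_links]         # attribute access works like a dict
names = [tag.get_text() for tag in soup.select("a.item span.name")]  # CSS selector

print(first_link["href"])
print(hrefs)
print(names)
```

Note that find returns None when nothing matches, so its result should be checked before reading .text from it.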
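products = []
prices = []
ratings = []
# Each product card is an <a> tag with this class; the class names are site-specific
for i in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    product = i.find('div', attrs={'class': '_4rR01T'})
    price = i.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    rating = i.find('div', attrs={'class': '_3LWZlK'})
    # find() returns None when a field is missing; skip such cards
    # so the three lists stay the same length
    if product is None or price is None or rating is None:
        continue
    products.append(product.text)
    prices.append(price.text)
    ratings.append(rating.text)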
import pandas as pd
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Rating': ratings})
# index=False leaves the row numbers out of the CSV file
df.to_csv('laptops.csv', index=False)
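The scraped prices and ratings are still strings, so they usually need reformatting before any analysis. The helpers below are a minimal sketch of that step. The function names are hypothetical, and the code assumes whole-number prices written with a currency symbol and thousands separators (for example '₹54,990'), which may not match the actual page.

```python
import re


def parse_price(text):
    """Convert a price string such as '₹54,990' to an int, or None if it has no digits."""
    # Assumes whole-number prices: everything except digits is stripped
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_rating(text):
    """Convert a rating string such as '4.3' to a float, or None if it is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


print(parse_price("₹54,990"))
print(parse_rating(" 4.3 "))
```

With a DataFrame like the one above, these could be applied column by column, for example df['Price'].map(parse_price), before writing the CSV file.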