This code is divided into two parts, each handling a different stage of scraping Amazon product data with Selenium and Beautiful Soup.
- In the first part, the code sets up Selenium to drive a headless Chrome browser and navigate to Amazon's website.
- It searches for products in a specific category (e.g., "tshirt for mens") and retrieves search results.
- The code scrapes product details such as title, category, sub-category, price, ratings, total ratings, and product URL from the search results.
- It checks whether each record already exists in a CSV file and appends only the rows that are not already present, avoiding duplicates.
- The scraper then advances to the next page of search results and repeats the process until no further pages remain.
- Finally, it closes the browser.
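The parsing and dedup-and-append steps above can be sketched as follows. This is a minimal illustration, not the author's actual script: the HTML sample and the CSS selectors (`div.result`, `span.price`, etc.) are placeholders rather than Amazon's real markup, and the Selenium navigation and pagination are omitted so the snippet runs standalone.

```python
import csv
import os

from bs4 import BeautifulSoup

# Placeholder HTML standing in for one page of search results
# (Amazon's real markup uses different class names).
SAMPLE_HTML = """
<div class="result">
  <h2>Cotton T-Shirt</h2>
  <span class="price">$9.99</span>
  <span class="rating">4.5</span>
  <a href="https://example.com/p/1">view</a>
</div>
<div class="result">
  <h2>Graphic Tee</h2>
  <span class="price">$12.50</span>
  <span class="rating">4.1</span>
  <a href="https://example.com/p/2">view</a>
</div>
"""


def parse_products(html):
    """Extract one dict per product card from a results page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.result"):
        rows.append({
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one("span.price").get_text(strip=True),
            "rating": card.select_one("span.rating").get_text(strip=True),
            "url": card.select_one("a")["href"],
        })
    return rows


def append_if_new(path, rows, key="url"):
    """Append only rows whose key is not already in the CSV; return count added."""
    seen = set()
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            seen = {r[key] for r in csv.DictReader(f)}
    new_rows = [r for r in rows if r[key] not in seen]
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

In the full scraper, `parse_products` would receive `driver.page_source` from the headless browser after each page load, and the loop would stop once no "next page" link is found.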
- The second part uses the Pandas library to read the CSV file generated by the scraper in Part 1.
- It sets up Selenium once again to visit the product pages for each item listed in the CSV file.
- For each product, it extracts additional information such as product description and date first available.
- The code removes quotation marks from the title and appends all this data to a new CSV file, creating a more comprehensive dataset.
- After processing all products, it prints "All data saved done" and quits the Selenium WebDriver.
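The enrichment step above can be sketched like this. The `details_lookup` dict is a hypothetical stand-in for the per-product Selenium page visits, so the snippet runs without a browser; the column names (`title`, `url`, `description`, `date_first_available`) are assumptions about the CSV schema, not taken from the original code.

```python
import pandas as pd


def enrich(df, details_lookup):
    """Strip quotation marks from titles and add description / date columns.

    details_lookup maps a product URL to (description, date_first_available),
    standing in for data that the real script scrapes from each product page.
    """
    df = df.copy()
    # Remove literal quotation marks from the title column.
    df["title"] = df["title"].str.replace('"', "", regex=False)
    df["description"] = df["url"].map(
        lambda u: details_lookup.get(u, ("", ""))[0]
    )
    df["date_first_available"] = df["url"].map(
        lambda u: details_lookup.get(u, ("", ""))[1]
    )
    return df
```

The enriched frame would then be written out with `df.to_csv(path, index=False)` to produce the more comprehensive dataset described above.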
Overall, these two parts work together to scrape product data from Amazon, refine it, and save it to a new CSV file for further analysis or use.