- Create a virtual environment and activate it
- pip install -r requirement.txt
In order to scrape, follow the steps below sequentially:
- Collect all the product listings:
  python products.py
- Collect all the product details:
  python productDetails.py
- Collect all the seller details (with respect to the product details):
  python seller.py
OUTPUT: after running the above steps in order, the files to expect are: unique_pro.csv, unique_pro_details.csv, unique_seller.csv
All the functions used in these steps are described below; to run only one function, comment out the others.
products.py:
1- def get_product_listing()
   PAGE: 10
   URL: https://www.flipkart.com/books/literature-books/pr?sid=bks,w4n&wid=3.productCard.PMU_V2_3&page=
   Do not forget to keep &page= at the end of the URL.
   [NOTE: I have not included logic to scrape page numbers]
   Output: one file per page, e.g. pro1.csv, pro2.csv, pro3.csv, pro4.csv ... pro10.csv
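A minimal sketch of what get_product_listing() might roughly do, assuming a hypothetical card class name ("product-card") and output columns; the real selectors sit behind the @HARDCODE tags in products.py:

```python
# Sketch only: the class name and CSV columns are placeholders, not the
# repo's actual @HARDCODE values.
import csv
import requests
from bs4 import BeautifulSoup

BASE_URL = ("https://www.flipkart.com/books/literature-books/pr"
            "?sid=bks,w4n&wid=3.productCard.PMU_V2_3&page=")  # keep the trailing &page=

def get_product_listing(pages=10):
    for page in range(1, pages + 1):
        resp = requests.get(BASE_URL + str(page), timeout=30)
        soup = BeautifulSoup(resp.text, "html.parser")
        rows = []
        # @HARDCODE-style placeholder: replace "product-card" with the class
        # Flipkart currently uses for its listing cards.
        for card in soup.find_all("div", class_="product-card"):
            rows.append([page, card.text.strip()])  # extraction via .text
        with open(f"pro{page}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["page", "title"])
            writer.writerows(rows)
```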
2- def get_unique_pid_mapping()
   When prompted for input, enter the number of pages scraped (in our case, 10).
   Output: unique_pro.csv
   This file contains the unique products found across all the listing pages (pro1.csv, pro2.csv, ...).
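The deduplication could look roughly like the sketch below; the "pid" column name is an assumption about the pro*.csv layout, not necessarily the repo's actual header:

```python
import csv

def get_unique_pid_mapping(pages):
    """Merge pro1.csv ... pro<pages>.csv into unique_pro.csv, one row per pid."""
    seen, unique_rows = set(), []
    for page in range(1, pages + 1):
        with open(f"pro{page}.csv", newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row["pid"] not in seen:   # "pid" is an assumed column name
                    seen.add(row["pid"])
                    unique_rows.append(row)
    if unique_rows:
        with open("unique_pro.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=unique_rows[0].keys())
            writer.writeheader()
            writer.writerows(unique_rows)

# In this project the prompt answer would be 10 (the number of listing pages).
get_unique_pid_mapping(int(input("Number of listing pages scraped: ")))
```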
productDetails.py:
1- def extract_product_details()
   Internal input: unique_pro.csv (generated by the steps above)
   Output: proDetail1.csv, proDetail2.csv, proDetail3.csv ... proDetailX.csv
   Each proDetailX.csv file contains 50 unique product details
   [X = total unique products / 50]
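A rough sketch of the 50-per-file batching described above; the "pid"/"url" columns and the detail class name are assumptions, not the repo's actual values:

```python
import csv
import requests
from bs4 import BeautifulSoup

def extract_product_details(batch_size=50):
    # Read the unique products produced by products.py; "pid" and "url"
    # are assumed column names.
    with open("unique_pro.csv", newline="", encoding="utf-8") as f:
        products = list(csv.DictReader(f))

    # Write one proDetail<N>.csv per batch of 50 products.
    for i in range(0, len(products), batch_size):
        rows = []
        for product in products[i:i + batch_size]:
            resp = requests.get(product["url"], timeout=30)
            soup = BeautifulSoup(resp.text, "html.parser")
            # @HARDCODE-style placeholder class; the value is pulled via .text
            detail = soup.find("div", class_="product-detail")
            rows.append([product["pid"], detail.text.strip() if detail else ""])
        with open(f"proDetail{i // batch_size + 1}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["pid", "detail"])
            writer.writerows(rows)
```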
2- def get_unique_pid_mapping()
   Internal input: unique_pro.csv and the files generated by the step above
   Output: unique_pro_details.csv
seller.py:
1- def extract_seller()
   Internal input: unique_pro_details.csv (generated by the steps above)
   Output: seller1.csv, seller2.csv, ... sellerX.csv
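A hedged sketch of the seller step, again with assumed column names ("pid", "url") and a placeholder class name rather than the repo's real @HARDCODE attributes:

```python
import csv
import requests
from bs4 import BeautifulSoup

def extract_seller(batch_size=50):
    # Read the merged product details; "pid" and "url" columns are assumed.
    with open("unique_pro_details.csv", newline="", encoding="utf-8") as f:
        details = list(csv.DictReader(f))

    for i in range(0, len(details), batch_size):
        rows = [["pid", "seller"]]
        for item in details[i:i + batch_size]:
            resp = requests.get(item["url"], timeout=30)
            soup = BeautifulSoup(resp.text, "html.parser")
            # @HARDCODE placeholder: the real seller class name lives in seller.py
            seller = soup.find("div", class_="seller-name")
            rows.append([item["pid"], seller.text.strip() if seller else ""])
        with open(f"seller{i // batch_size + 1}.csv", "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
```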
2- def get_unique()
   Internal input: the seller files generated by the step above
   Output: unique_seller.csv
Changing the hardcoded class names can bring the desired results; check for the @HARDCODE tag to find the hardcoded attributes. Extraction has mostly been done via .text on the BeautifulSoup objects.
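To illustrate the @HARDCODE pattern, here is a toy example: the class name passed to find() is the only piece tied to Flipkart's markup, and the value is read with .text. The "_4rR01T" string below is just an example, not necessarily one used in this repo.

```python
from bs4 import BeautifulSoup

html = '<div class="_4rR01T">Sample Book Title</div>'
soup = BeautifulSoup(html, "html.parser")

# @HARDCODE: "_4rR01T" stands in for whatever class the live page uses today;
# if Flipkart's markup changes, only this string needs updating.
title_tag = soup.find("div", class_="_4rR01T")
print(title_tag.text if title_tag else "class name no longer matches")
```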