MongoDB Python API
Dependencies:
- Install MongoDB version 3.6.4 or later
- Run the MongoDB process and note the port number it is running on
- Install pymongo: pip install pymongo
- Set MONGO_DB_PORT to the port noted in step 2 above. MongoDB runs on port 27017 by default, so the value needs changing only if your server uses a different port
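Once the dependencies are in place, the port setting can be checked with a short script. The following is only a sketch: MONGO_DB_PORT comes from the step above, while MONGO_DB_HOST, build_mongo_uri and check_connection are hypothetical names that are not part of this project.

```python
# MONGO_DB_PORT mirrors the setting named above; MONGO_DB_HOST is an assumed name.
MONGO_DB_HOST = "localhost"
MONGO_DB_PORT = 27017  # MongoDB's default port


def build_mongo_uri(host=MONGO_DB_HOST, port=MONGO_DB_PORT):
    """Return a mongodb:// connection URI for the given host and port."""
    return f"mongodb://{host}:{port}/"


def check_connection(timeout_ms=2000):
    """Return True if a MongoDB server answers a ping at the configured URI."""
    # Imported here so the URI helper works even before pymongo is installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(build_mongo_uri(), serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

Calling check_connection() with the server running should return True. A False result usually means the port value does not match the one the MongoDB process is listening on.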
API Usage:
- Insert many car details into the collection: see the `insert_many_records_sample` method in the `MongoDbClient` class
- Insert a single car detail into the collection: see the `insert_single_record_sample` method in the `MongoDbClient` class
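For orientation, the two operations above map onto pymongo's insert_one and insert_many collection methods. The sketch below calls pymongo directly and is not the MongoDbClient implementation, whose signatures are not shown here. The helper names and the document fields in the usage note are hypothetical.

```python
def insert_single_record(collection, car):
    """Insert one car document and return its generated _id."""
    result = collection.insert_one(car)
    return result.inserted_id


def insert_many_records(collection, cars):
    """Insert a list of car documents and return their generated _ids."""
    if not cars:
        return []  # pymongo's insert_many rejects an empty list
    result = collection.insert_many(cars)
    return list(result.inserted_ids)
```

With a running server, collection would be a pymongo collection such as MongoClient("mongodb://localhost:27017/")["cars_db"]["cars"] (database and collection names assumed), and a call might look like insert_single_record(collection, {"model": "Civic", "price": 50000}).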
Scraping from Carousell
- Navigate to the robots.txt here: https://sg.carousell.com/robots.txt
- Get the sitemap URL from the above location. Here it is: https://sg.carousell.com/sitemap.xml
- Open sitemap.xml and search for all product listing entries containing the text "cars"
- Navigate to each of these XML pages and get the list of car products in the file
- Scrape the details of the cars from each of the car product pages
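The first four steps above can be sketched with the standard library alone. This is an illustration rather than the project's script: the function names are hypothetical, the sitemap is assumed to follow the standard sitemaps.org XML schema, and the last step (parsing a product page) is left out because the page structure is not described here.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by sitemap files that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def locs_containing(sitemap_xml, keyword):
    """Return every <loc> URL in a sitemap document that contains keyword."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text and keyword in loc.text
    ]
```

A run would chain the helpers: fetch robots.txt with fetch_text, pass the result to sitemap_urls_from_robots, fetch the sitemap, call locs_containing(sitemap_xml, "cars") to get the car XML pages, then read the product URLs from each of those pages with locs_containing(page_xml, "").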
Configurations
- Set the value of `USE_TEST_URLs` to `False` to execute steps 3 and 4 in the script
- Set `LIMIT` to `None` to scrape all the items from the page, or to a number to scrape only that many car products
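As an illustration of how the LIMIT setting could be applied, the sketch below caps a list of product URLs. The two setting names come from the list above. apply_limit is a hypothetical helper and may differ from what the script actually does.

```python
from itertools import islice

USE_TEST_URLs = False  # False: run steps 3 and 4, per the setting described above
LIMIT = None           # None: scrape everything; an int: scrape at most that many


def apply_limit(product_urls, limit=LIMIT):
    """Return the product URLs to scrape, capped at limit when it is a number."""
    # islice treats a stop of None as "no cap", so both settings share one path.
    return list(islice(product_urls, limit))
```

Using islice avoids a separate branch for the None case and also works when the URLs come from a generator rather than a list.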