Carlist.my is a website that lists used cars for sale in Malaysia.
This project uses Scrapy to extract car listing data from carlist.my.
pip install scrapy
pip install lxml
The Chrome Extension
XPath Helper
- Analyze the URL rules and format
- Develop a data extraction strategy
- Determine how data is stored
- Remove columns that are not relevant, such as 'type', 'position', 'item_type', 'item_additionalType', 'item_url', 'item_image', 'item_offers_type', 'item_offers_priceCurrency', 'item_offers_itemCondition', 'item_offers_seller_url', etc.
- Extract the car model year and engine capacity (cc) from the 'item_name' column using regular expressions (regex).
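The two cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes each scraped row is a plain dict and that 'item_name' looks something like "2014 Toyota Vios 1.5 E Sedan", i.e. it contains a four-digit year and an engine size in litres. The helper name clean_row, the output keys model_year and engine_cc, the litres-to-cc conversion, and the 'item_offers_price' column in the usage note are all assumptions made for the example.

```python
import re

# Columns named in the list above as not relevant.
IRRELEVANT_COLUMNS = {
    "type", "position", "item_type", "item_additionalType", "item_url",
    "item_image", "item_offers_type", "item_offers_priceCurrency",
    "item_offers_itemCondition", "item_offers_seller_url",
}

# Assumption: the model year is a four-digit number starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
# Assumption: engine size appears in litres with one decimal, e.g. "1.5".
LITRES_RE = re.compile(r"\b(\d\.\d)\b")


def clean_row(row):
    """Drop irrelevant columns and derive model_year and engine_cc (hypothetical names)."""
    cleaned = {k: v for k, v in row.items() if k not in IRRELEVANT_COLUMNS}
    name = cleaned.get("item_name", "")

    year = YEAR_RE.search(name)
    litres = LITRES_RE.search(name)

    cleaned["model_year"] = int(year.group(0)) if year else None
    # Convert litres to cc (1.5 L -> 1500 cc); None when no engine size is found.
    cleaned["engine_cc"] = round(float(litres.group(1)) * 1000) if litres else None
    return cleaned
```

For example, clean_row({"type": "ListItem", "item_name": "2014 Toyota Vios 1.5 E Sedan", "item_offers_price": 45000}) keeps 'item_name' and 'item_offers_price', drops 'type', and adds the two derived fields. If the real 'item_name' format differs, the two patterns would need adjusting.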
The link: https://public.tableau.com/app/profile/weng.seng/viz/carlist2/Story1?publish=yes
5.1 Toyota
The top listing model: Vios
The top listing model year: 2014
The top listing body type: Sedan, followed by MPV
5.2 Perodua
The top listing model: Myvi
The top listing model year: 2015
The top listing body type: Hatchback