Web Scraping and Car Price Prediction

This project builds a machine learning pipeline that predicts car prices from data scraped from online car marketplaces. It combines web scraping, data preprocessing, and regression modeling to produce price predictions and to show which features, such as mileage, year of manufacture, and brand and model, drive them.

Overview
Features
Technologies Used
Results and Insights
Future Work
License
Contact

Overview

The project involves:

Scraping car price data from online sources using Python libraries.
Cleaning and preprocessing the collected data for analysis.
Training and evaluating machine learning models to predict car prices.
Providing insights into features influencing car pricing.
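
As a rough illustration of the cleaning step listed above, the sketch below turns raw scraped strings into numeric fields. The field names (brand, year, mileage, price) and the string formats are assumptions made for this example, not the project's actual schema.

```python
import re
from typing import Optional


def parse_number(raw: Optional[str]) -> Optional[float]:
    """Return the first number found in a scraped string, or None if there is none."""
    if not raw:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "12,500" -> 12500.0
    return float(match.group().replace(",", ""))


def clean_listing(raw: dict) -> dict:
    """Normalize one hypothetical raw listing into typed fields."""
    year = parse_number(raw.get("year"))
    brand = (raw.get("brand") or "").strip().title()
    return {
        "brand": brand or None,
        "year": int(year) if year is not None else None,
        "mileage_km": parse_number(raw.get("mileage")),
        "price": parse_number(raw.get("price")),
    }


# Made-up records standing in for scraper output
raw_listings = [
    {"brand": " toyota ", "year": "2018", "mileage": "45,000 km", "price": "$12,500"},
    {"brand": "FORD", "year": "2015", "mileage": None, "price": "Call for price"},
]
cleaned = [clean_listing(r) for r in raw_listings]
print(cleaned)
```

Values that cannot be parsed are kept as None rather than dropped, so a later preprocessing step can decide whether to impute or discard them.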

Features

Web Scraping: Automated data collection from online car marketplaces.
Data Cleaning and Preprocessing: Handling missing values, encoding categorical features, and feature scaling.
Exploratory Data Analysis (EDA): Insights into key factors affecting car prices.
Machine Learning Models: Regression models such as linear regression, decision trees, and gradient boosting for price prediction.
Evaluation Metrics: Model performance measured with RMSE and R^2.
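
The preprocessing, modeling, and evaluation features above can be sketched with scikit-learn as follows. This is a minimal example on synthetic data, not the project's code: the column names, the price formula used to generate the data, and the default model settings are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the scraped dataset (all values are made up)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "brand": rng.choice(["Toyota", "Ford", "BMW"], size=n),
    "year": rng.integers(2005, 2023, size=n),
    "mileage_km": rng.uniform(5_000, 250_000, size=n),
})
brand_premium = df["brand"].map({"Toyota": 0.0, "Ford": -1_000.0, "BMW": 6_000.0})
df["price"] = (
    15_000
    + 900 * (df["year"] - 2005)
    - 0.04 * df["mileage_km"]
    + brand_premium
    + rng.normal(0, 1_500, size=n)
)
# Simulate missing values in a scraped field
df.loc[rng.choice(n, size=20, replace=False), "mileage_km"] = np.nan

numeric = ["year", "mileage_km"]
categorical = ["brand"]

# Impute and scale numeric columns, one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", GradientBoostingRegressor(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["price"], test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = float(r2_score(y_test, pred))
print(f"RMSE: {rmse:.0f}  R^2: {r2:.3f}")
```

Wrapping the preprocessing and the regressor in one Pipeline means the imputer, scaler, and encoder are fitted on the training split only, which avoids leaking test data into the model. Swapping GradientBoostingRegressor for LinearRegression or DecisionTreeRegressor would allow the other model families to be compared on the same split.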

Technologies Used

Programming Language: Python
Web Scraping: BeautifulSoup, requests, Selenium
Data Analysis and Visualization: pandas, numpy, matplotlib, seaborn
Machine Learning: scikit-learn, xgboost
Other Tools: Jupyter Notebook, GitHub
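
To show roughly how the scraping libraries above fit together, here is a small BeautifulSoup sketch that parses listing cards from an inline HTML snippet. The HTML structure and the CSS class names (listing, title, price, mileage) are hypothetical; real marketplaces use their own markup, and each site's terms of use should be checked before scraping.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a marketplace results page.
# In a real run the HTML would come from requests.get(url, timeout=10).text,
# or from Selenium's driver.page_source for JavaScript-rendered pages.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="title">Toyota Corolla 2018</h2>
  <span class="price">$12,500</span>
  <span class="mileage">45,000 km</span>
</div>
<div class="listing">
  <h2 class="title">Ford Focus 2015</h2>
  <span class="price">$7,900</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if the element is absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_listings(html):
    """Extract one dict of raw strings per listing card."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "title": text_or_none(card, ".title"),
            "price": text_or_none(card, ".price"),
            "mileage": text_or_none(card, ".mileage"),
        }
        for card in soup.select("div.listing")
    ]


rows = parse_listings(SAMPLE_HTML)
print(rows)
```

Missing elements come back as None instead of raising an error, which keeps incomplete listings in the dataset so they can be handled during cleaning.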

Results and Insights

Model Performance:
- Best-performing model: Gradient Boosting
- Evaluation metrics:
  - R^2: 0.85
  - RMSE: $2,000
Feature Importance:
- Key factors influencing car prices:
  - Mileage
  - Year of manufacture
  - Brand and model
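
For readers unfamiliar with the two metrics reported above, the sketch below computes them from their standard definitions: RMSE is the square root of the mean squared error, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The prices in the example are made up and do not reproduce the project's results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (here, dollars)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only: every prediction is off by exactly $1,000
actual = [10_000, 15_000, 20_000, 25_000]
predicted = [11_000, 14_000, 21_000, 24_000]

print(rmse(actual, predicted))  # 1000.0
print(r2(actual, predicted))    # approximately 0.968
```

Read this way, an RMSE of $2,000 means that a typical prediction error is on the order of $2,000, and an R^2 of 0.85 means the model accounts for about 85% of the variance in price on the evaluation data.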

Future Work

Expand scraping to include multiple sources for a more diverse dataset.
Integrate deep learning models for enhanced prediction accuracy.
Deploy the model via a web application (e.g., Flask or Streamlit).
Automate data updates using scheduled web scraping scripts.

License

This project is licensed under the MIT License.

Contact

For questions or collaboration opportunities, please contact:

Viviane Le

Email: anhlv.fpt@gmail.com
GitHub: github.com/VivianeLe

VivianeLe/Webscraping-car-price-prediction