Web Scraping and Car Price Prediction

This project builds a machine learning pipeline that predicts car prices from data scraped from online car marketplaces. It combines web scraping, data preprocessing, and regression modeling to estimate prices and to show which features influence them most.


Table of Contents

  1. Overview
  2. Features
  3. Technologies Used
  4. Results and Insights
  5. Future Work
  6. License
  7. Contact

Overview

The project involves:

  • Scraping car price data from online sources using Python libraries.
  • Cleaning and preprocessing the collected data for analysis.
  • Training and evaluating machine learning models to predict car prices.
  • Providing insights into features influencing car pricing.
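
The scraping step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the HTML snippet, the CSS class names (listing, title, year, mileage, price), and the helper names parse_listings and _to_int are all hypothetical, since each marketplace uses its own markup. It parses an inline string so that it runs without network access; on a live site the HTML would typically be fetched first, for example with requests.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; real marketplaces use different tags and classes.
SAMPLE_HTML = """
<div class="listing">
  <span class="title">Toyota Corolla</span>
  <span class="year">2018</span>
  <span class="mileage">42,000 mi</span>
  <span class="price">$14,500</span>
</div>
<div class="listing">
  <span class="title">Honda Civic</span>
  <span class="year">2016</span>
  <span class="mileage">68,300 mi</span>
  <span class="price">$11,900</span>
</div>
"""


def _to_int(text):
    """Keep only the digits of a string such as '$14,500' or '42,000 mi'."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def parse_listings(html):
    """Turn each listing card into a dict of raw fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        rows.append({
            "title": card.select_one(".title").get_text(strip=True),
            "year": _to_int(card.select_one(".year").get_text()),
            "mileage": _to_int(card.select_one(".mileage").get_text()),
            "price": _to_int(card.select_one(".price").get_text()),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
```

The resulting list of dicts can be passed directly to pandas.DataFrame for the cleaning step.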

Features

  • Web Scraping: Automated data collection from online car marketplaces.
  • Data Cleaning and Preprocessing: Handling missing values, encoding categorical features, and feature scaling.
  • Exploratory Data Analysis (EDA): Insights into key factors affecting car prices.
  • Machine Learning Models: Regression models such as linear regression, decision trees, and gradient boosting for price prediction.
  • Evaluation Metrics: Model performance evaluation using metrics such as RMSE and R².
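
As a rough sketch of how the preprocessing, modeling, and evaluation features fit together, the example below chains imputation, scaling, one-hot encoding, and a gradient boosting regressor in a single scikit-learn pipeline. The data is synthetic and the column names (brand, year, mileage, price) and price formula are assumptions made for illustration only; they are not the project's dataset or its results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for scraped listings (assumed columns, made-up values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "brand": rng.choice(["Toyota", "Honda", "BMW"], size=n),
    "year": rng.integers(2005, 2023, size=n),
    "mileage": rng.uniform(5_000, 200_000, size=n),
})
premium = df["brand"].map({"Toyota": 0, "Honda": 500, "BMW": 8000})
df["price"] = (8000 + 900 * (df["year"] - 2005) - 0.03 * df["mileage"]
               + premium + rng.normal(0, 1000, size=n))

# Simulate missing values, as scraped data often has gaps.
df.loc[df.sample(frac=0.05, random_state=0).index, "mileage"] = np.nan

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encode, ignoring brands unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric, ["year", "mileage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("reg", GradientBoostingRegressor(random_state=0)),
])

X = df[["brand", "year", "mileage"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = float(r2_score(y_test, pred))
print(f"RMSE: {rmse:.0f}  R^2: {r2:.3f}")
```

Scaling has little effect on tree-based models such as gradient boosting; it is kept here because it matters for linear regression, and swapping the "reg" step lets the same preprocessing be reused across models.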

Technologies Used

  • Programming Language: Python
  • Web Scraping: BeautifulSoup, requests, Selenium
  • Data Analysis and Visualization: pandas, numpy, matplotlib, seaborn
  • Machine Learning: scikit-learn, xgboost
  • Other Tools: Jupyter Notebook, GitHub

Results and Insights

  • Model Performance:
    • Best-performing model: Gradient Boosting
    • Evaluation metrics:
      • R²: 0.85
      • RMSE: $2,000
  • Feature Importance:
    • Key factors influencing car prices:
      • Mileage
      • Year of manufacture
      • Brand and model
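
To help read the two metrics reported under Model Performance, the sketch below computes RMSE and R² by hand on four made-up prices. The numbers are purely illustrative and are not taken from the project's data; the function names rmse and r_squared are chosen here for the example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical error size, in the target's units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions account for."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices in dollars; every prediction is off by exactly $1,000.
actual = [10000, 12000, 15000, 20000]
predicted = [11000, 11000, 16000, 19000]

error = rmse(actual, predicted)
score = r_squared(actual, predicted)
print(f"RMSE: {error:.0f}  R^2: {score:.2f}")
```

Here the RMSE is 1000 because each prediction misses by $1,000, and R² is about 0.93. Read the same way, an RMSE of $2,000 means predictions are typically off by about that amount, and an R² of 0.85 means the model accounts for roughly 85% of the variance in price.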

Future Work

  • Expand scraping to include multiple sources for a more diverse dataset.
  • Evaluate deep learning models to see whether they improve prediction accuracy.
  • Deploy the model via a web application (e.g., Flask or Streamlit).
  • Automate data updates using scheduled web scraping scripts.

License

This project is licensed under the MIT License.


Contact

For questions or collaboration opportunities, please contact:

Viviane Le