This project aims to build a machine learning pipeline to predict car prices using data scraped from online car marketplaces. By combining web scraping, data preprocessing, and machine learning techniques, the project delivers insights and predictions about car prices based on various features.
The project involves:
- Scraping car price data from online sources using Python libraries.
- Cleaning and preprocessing the collected data for analysis.
- Training and evaluating machine learning models to predict car prices.
- Providing insights into features influencing car pricing.
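
The scraping step can be sketched as follows. This is a minimal illustration, not the project's actual scraper: the HTML snippet, the `listing`, `title`, `year`, `mileage`, and `price` class names, and the helper functions `to_int` and `parse_listings` are all hypothetical. Real marketplace pages use their own markup, and fetching them would additionally involve `requests` or Selenium.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; real marketplace pages use different tags and classes.
SAMPLE_HTML = """
<div class="listing">
  <span class="title">Toyota Corolla</span>
  <span class="year">2018</span>
  <span class="mileage">45,000 km</span>
  <span class="price">$14,500</span>
</div>
<div class="listing">
  <span class="title">Ford Focus</span>
  <span class="year">2015</span>
  <span class="mileage">98,200 km</span>
  <span class="price">$8,900</span>
</div>
"""


def to_int(text):
    """Keep only the digits of a string such as '$14,500' or '45,000 km'."""
    return int("".join(ch for ch in text if ch.isdigit()))


def parse_listings(html):
    """Turn each listing card into a dict of cleaned fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        rows.append({
            "title": card.find(class_="title").get_text(strip=True),
            "year": to_int(card.find(class_="year").get_text()),
            "mileage_km": to_int(card.find(class_="mileage").get_text()),
            "price_usd": to_int(card.find(class_="price").get_text()),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
print(listings)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame` for the cleaning step.
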
- Web Scraping: Automated data collection from online car marketplaces.
- Data Cleaning and Preprocessing: Handling missing values, encoding categorical features, and feature scaling.
- Exploratory Data Analysis (EDA): Insights into key factors affecting car prices.
- Machine Learning Models: Implementation of regression models like linear regression, decision trees, and gradient boosting for price prediction.
- Evaluation Metrics: Model performance evaluation using metrics such as RMSE and R².
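
The preprocessing, modelling, and evaluation steps listed above could be wired together roughly as shown here. This is a sketch under stated assumptions rather than the project's notebook code: the data is synthetic, the column names (`brand`, `year`, `mileage`, `price`) are invented for illustration, and scikit-learn's `GradientBoostingRegressor` stands in for whichever gradient boosting implementation the project uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the scraped data (assumption: not the project's dataset).
rng = np.random.default_rng(0)
n = 400
brands = rng.choice(["A", "B", "C"], size=n)
year = rng.integers(2005, 2023, size=n)
mileage = rng.uniform(5_000, 200_000, size=n)
brand_premium = pd.Series(brands).map({"A": 0, "B": 3000, "C": 6000}).to_numpy()
price = (10_000 + 800 * (year - 2005) - 0.04 * mileage
         + brand_premium + rng.normal(0, 500, size=n))

df = pd.DataFrame({"brand": brands, "year": year, "mileage": mileage, "price": price})
# Blank out some mileage values to mimic missing fields in scraped listings.
df.loc[rng.choice(n, size=20, replace=False), "mileage"] = np.nan

numeric = ["year", "mileage"]
categorical = ["brand"]

# Impute and scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", GradientBoostingRegressor(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["price"], test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
print(f"RMSE: {rmse:.0f}  R^2: {r2:.3f}")
```

Keeping the preprocessing inside the `Pipeline` means the imputer, scaler, and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.
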
- Programming Language: Python
- Web Scraping: BeautifulSoup, requests, Selenium
- Data Analysis and Visualization: pandas, numpy, matplotlib, seaborn
- Machine Learning: scikit-learn, xgboost
- Other Tools: Jupyter Notebook, GitHub
- Model Performance:
  - Best-performing model: Gradient Boosting
  - Evaluation metrics:
    - R²: 0.85
    - RMSE: $2,000
- Feature Importance:
  - Key factors influencing car prices:
    - Mileage
    - Year of manufacture
    - Brand and model
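
For readers unfamiliar with the two metrics reported above, the following sketch computes RMSE and R² from their standard definitions in plain Python. The sample prices are made up for illustration and are not the project's data; the function names `rmse` and `r_squared` are likewise illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error, in the target's units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (in dollars).
actual = [10000, 15000, 20000, 25000]
predicted = [11000, 14000, 22000, 24000]

rmse_value = rmse(actual, predicted)
r2_value = r_squared(actual, predicted)
print(f"RMSE: {rmse_value:.2f}  R^2: {r2_value:.3f}")
```

Read this way, an RMSE of $2,000 means predictions are typically off by about that amount, and an R² of 0.85 means the model explains roughly 85% of the variance in price.
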
- Expand scraping to include multiple sources for a more diverse dataset.
- Integrate deep learning models for enhanced prediction accuracy.
- Deploy the model via a web application (e.g., Flask or Streamlit).
- Automate data updates using scheduled web scraping scripts.
This project is licensed under the MIT License.
For questions or collaboration opportunities, please contact:
Viviane Le
- Email: anhlv.fpt@gmail.com
- GitHub: github.com/VivianeLe