This project predicts California housing prices with a RandomForestRegressor trained on the California Housing dataset. It covers data preprocessing, model training, hyperparameter tuning, and evaluation.
The goal is a regression model that predicts housing prices from the dataset's features. The dataset also gives insight into the factors that influence housing prices across different regions of California.
To get started with the project, follow these steps:
- Clone the repository
- Create a virtual environment
- Install the dependencies
The project uses the California Housing dataset. Its features, such as median income, house age, and population, are used to predict the median house value.
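As a minimal sketch of loading the data, assuming the project uses the copy of the dataset that ships with scikit-learn (the source does not say which copy it uses), the features and target can be fetched as pandas objects. The helper name `load_housing` is hypothetical and not part of the project.

```python
from sklearn.datasets import fetch_california_housing


def load_housing():
    """Return (X, y) as a pandas DataFrame and Series.

    Assumption: the project uses scikit-learn's copy of the dataset.
    The first call downloads the data and caches it locally.
    """
    housing = fetch_california_housing(as_frame=True)
    return housing.data, housing.target


# Example usage (needs network access on the first call):
# X, y = load_housing()
# print(X.columns.tolist())
```

In scikit-learn's copy, the target is the median house value expressed in units of $100,000.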
The model is a RandomForestRegressor, an ensemble method for regression. It trains many decision trees, each on a bootstrap sample of the training data, and averages their predictions, which is typically more accurate and stable than the prediction of a single tree.
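To illustrate how the forest combines its trees, the sketch below fits a small RandomForestRegressor and checks that its prediction matches the mean of the individual trees' predictions. It uses synthetic data from `make_regression` so it runs offline; the data, split, and `n_estimators=50` are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the sketch runs without downloading anything.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Predict with every fitted tree, then average across trees.
tree_preds = np.stack([tree.predict(X_test) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

# The forest's own prediction should match the manual average.
forest_preds = forest.predict(X_test)
print(np.allclose(manual_average, forest_preds))
```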
To improve the model's performance, hyperparameter tuning was conducted: different combinations of hyperparameters were tried to find the best-performing settings for the model.
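The source does not name the tuning method, so the following is only one common way to do it: an exhaustive grid search with cross-validation via scikit-learn's `GridSearchCV`. The grid values, the 3-fold setting, and the synthetic data are all assumptions chosen to keep the sketch fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data so the sketch runs offline.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hypothetical grid: the values the project actually searched are not documented.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each combination
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mse = -search.best_score_  # flip the sign back to a plain MSE
print(best_params, best_cv_mse)
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying all of them, which can be considerably cheaper.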
The model is evaluated with metrics such as Mean Squared Error (MSE) and R-squared. MSE is the average squared difference between predicted and actual values (lower is better), and R-squared is the proportion of variance in the target that the model explains (closer to 1 is better).
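For reference, both metrics can be written out in a few lines of plain Python. This is a sketch of the standard definitions on made-up numbers, not the project's evaluation code; in practice `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values purely for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mse = mean_squared_error(y_true, y_pred)  # squared errors 1, 0, 1, 0 -> 0.5
r2 = r_squared(y_true, y_pred)            # 1 - 2/20 -> 0.9 (up to rounding)
print(mse, r2)
```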
To use the model for predicting housing prices:
- Ensure all dependencies are installed.
- Load the trained model.
- Provide the input features for prediction.
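The load and predict steps above might look like the sketch below. It assumes the trained model was saved with `joblib`, which the source does not confirm, and the file name `model.joblib` is hypothetical. To stay self-contained, the sketch first trains and saves a tiny stand-in model on random data.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in for the project's trained model: a tiny forest fit on random data.
rng = np.random.default_rng(0)
X_train = rng.random((100, 8))
y_train = X_train.sum(axis=1)
trained = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.joblib"  # hypothetical file name
    joblib.dump(trained, model_path)

    # Load the saved model, then predict on new rows.
    model = joblib.load(model_path)
    new_rows = rng.random((2, 8))  # one row per house, one column per feature
    predictions = model.predict(new_rows)

print(predictions)
```

The input must have the same columns, in the same order, as the data the model was trained on, and any preprocessing applied during training has to be applied to new inputs as well.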
The final model achieved satisfactory performance with a reasonable error margin, making it a useful tool for predicting housing prices in California.