California House Price Prediction Demo
This project explores how house prices in California are determined and how they correlate with factors such as location, housing age, and household income. Analyzing the data gives insight into housing market trends and the factors that influence house prices.
The dataset includes the following features:
- Longitude: Indicates how far west a house is.
- Latitude: Indicates how far north a house is.
- Housing Median Age: Median age of houses within a block; lower values represent newer buildings.
- Total Rooms: Total number of rooms within a block.
- Total Bedrooms: Total number of bedrooms within a block.
- Population: Total number of people residing within a block.
- Households: Total number of households within a block.
- Median Income: Median income for households within a block (measured in tens of thousands of US Dollars).
- Median House Value: Median house value for households within a block (measured in US Dollars).
- Ocean Proximity: Indicates the location of the house with respect to the ocean/sea.
I prepared the data by:
- Dropping rows with null values
- One-hot encoding the categorical variable (Ocean Proximity)
- Checking correlations between variables with a heatmap
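The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the tiny DataFrame is synthetic stand-in data, the column names are assumed to follow the Kaggle CSV, and `plot_heatmap` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset; column names are assumed to
# match the Kaggle CSV (e.g. total_bedrooms, ocean_proximity).
df = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 2.1],
    "total_bedrooms": [129.0, 1106.0, np.nan, 235.0, 280.0, 213.0],
    "median_house_value": [452600.0, 358500.0, 341300.0,
                           342200.0, 269700.0, 150000.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "NEAR OCEAN", "INLAND"],
})

# 1. Drop rows that contain any null value.
clean = df.dropna()

# 2. One-hot encode the categorical column (one 0/1 column per category).
clean = pd.get_dummies(clean, columns=["ocean_proximity"], dtype=int)

# 3. Pairwise Pearson correlations between all (now numeric) columns.
corr = clean.corr()


def plot_heatmap(corr_matrix):
    """Hypothetical helper: draw the correlation matrix as a heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="YlGnBu")
    plt.show()
```

Calling `plot_heatmap(corr)` draws the heatmap; the plotting imports are kept inside the helper so the cleaning steps run even where Matplotlib and Seaborn are not installed.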
Libraries used:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
Models trained:
- Linear Regression
- Random Forest Regressor
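As a rough sketch of how these two models might be trained and compared with Scikit-learn, the example below fits both on synthetic stand-in data. The 80/20 split, the random seeds, and `n_estimators=50` are assumptions made for illustration, not the settings used in this project. For Scikit-learn regressors, `score` returns the coefficient of determination (R²).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 500 rows, 4 features, partly non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 4))
y = 3.0 * X[:, 0] + 5.0 * np.sin(X[:, 1]) + rng.normal(0, 1.0, size=500)

# Hold out 20% of the rows for evaluation (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(
        n_estimators=50, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 on the held-out rows.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.4f}")
```

With the real data, `X` would be the prepared feature columns and `y` the Median House Value column.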
You can try out these models using the California Housing Prices dataset, which is available on Kaggle.
Model scores:
- Linear Regression: 0.6687407117584969 (66.87%)
- Linear Regression (After Scaling): 0.6692303774756764 (66.92%)
- RandomForestRegressor: 0.7649087057809763 (76.49%)
- GridSearchCV RandomForestRegressor: 0.771010764128469 (77.10%)
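The scaling and grid search variants listed above could be set up along these lines. This is a sketch under stated assumptions: the data is synthetic, `StandardScaler` is assumed as the scaling method, and the parameter grid and `cv=3` are illustrative choices rather than the ones used for the reported scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a mostly linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Linear regression without and with standardized features.
plain = LinearRegression().fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), LinearRegression())
scaled.fit(X_train, y_train)
plain_score = plain.score(X_test, y_test)
scaled_score = scaled.score(X_test, y_test)

# Small illustrative hyperparameter grid (assumed values).
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3
)
search.fit(X_train, y_train)

# After fitting, score() uses the refitted best estimator.
tuned_score = search.score(X_test, y_test)
print(search.best_params_, round(tuned_score, 4))
```

Ordinary least squares predictions do not change when features are standardized, so `plain_score` and `scaled_score` agree up to floating-point error here. That is consistent with the nearly identical linear regression scores reported above.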
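Of the models tried, the random forest regressor tuned with GridSearchCV scored highest (about 0.771), ahead of the untuned random forest (about 0.765) and linear regression (about 0.669, with or without scaling). The project also provided practical experience with data analysis concepts and data science methodologies.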