This repository presents an end-to-end approach to a machine learning project, from data collection to deployment. The problem at hand is predicting Cleveland real estate prices, although the underlying code is flexible enough to provide a structured way of tackling a wide variety of tabular problems with minimal changes.
The app is made with Streamlit and is tailored for a random forest model, providing feature importances and contributions. It also contains a useful 3D visualization made with the help of pydeck.
app_demo.mp4
Data was collected by web scraping sold Cleveland real estate listings from Zillow at various intervals from 01/4/2021 to 25/4/2021. The data is open-sourced in this repository inside /data/raw.
An example of a scraped listing:
{
"basic_info":{
"sale_date":"Sold 04/14/2021",
"latitude":41.432756,
"longitude":-81.809063,
"floorSize":"1,664",
"url":"https://www.zillow.com/homedetails/4486-W-158th-St-Cleveland-OH-44135/33380420_zpid/",
"price":"$140,000"
},
"facts_and_features":{
"Type:":"SingleFamily",
"Year built:":"1952",
"Heating:":"Forced Air, Gas",
"Cooling:":"Central Air",
"Parking:":"Detached, Garage",
"Lot:":"0.15 Acres"
},
"additional_features":{
"Interior details":{
"Bedrooms and bathrooms":{
"Bedrooms":"4",
"Bathrooms":"1",
"Full bathrooms":"1",
"Main level bathrooms":"1"
},
"Basement":{
"Has basement":"Yes",
"Basement":"None"
},
"Heating":{
"Heating features":"Forced Air, Gas"
},
"Cooling":{
"Cooling features":"Central Air"
},
"Appliances":{
"Appliances included":"Dryer, Range, Refrigerator, Washer"
},
"Other interior features":{
"Total structure area":"1,664",
"Total interior livable area":"1,664 sqft",
"Finished area above ground":"1,664",
"Virtual tour":"View virtual tour"
}
},
"Property details":{
"Parking":{
"Parking features":"Detached, Garage",
"Garage spaces":"1"
},
"Property":{
"Exterior features":"Paved Driveway"
},
"Lot":{
"Lot size":"0.15 Acres"
},
"Other property information":{
"Additional parcel(s) included":",,,",
"Parcel number":"02828056"
}
},
"Construction details":{
"Type and style":{
"Home type":"SingleFamily",
"Architectural style":"Bungalow",
"Property subType":"Single Family Residence"
},
"Material information":{
"Construction materials":"Brick, Vinyl Siding",
"Roof":"Asphalt,Fiberglass"
},
"Condition":{
"Year built":"1952"
}
},
"Utilities / Green Energy Details":{
"Utility":{
"Sewer information":"Public Sewer",
"Water information":"Public"
}
},
"Community and Neighborhood Details":{
"Location":{
"Region":"Cleveland"
}
},
"HOA and financial details":{
"HOA":{
"Has HOA fee":"No"
},
"Other financial information":{
"Tax assessed value":"$66,200",
"Annual tax amount":"$1,884"
}
},
"Other":{
"Other facts":{
"Ownership":"Principal/NR"
}
}
}
}
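Most values in a scraped listing are strings with currency symbols, thousands separators or unit suffixes, so they need to be converted before modelling. The following is a minimal sketch of such a conversion and is not the repository's own preprocessing code; the helper names `parse_number`, `parse_sale_date` and `flatten_record` are hypothetical, and it assumes the lot size is always given in acres, as in the sample above.

```python
from datetime import date, datetime


def parse_number(text):
    """Turn strings like '$140,000', '1,664 sqft' or '0.15 Acres' into a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    # Keep only the leading numeric token and drop any unit suffix.
    return float(cleaned.split()[0])


def parse_sale_date(text):
    """Turn 'Sold 04/14/2021' into a date (month/day/year order)."""
    return datetime.strptime(text.replace("Sold", "").strip(), "%m/%d/%Y").date()


def flatten_record(record):
    """Pull a few typed columns out of one raw listing dictionary."""
    basic = record["basic_info"]
    facts = record["facts_and_features"]
    return {
        "price": parse_number(basic["price"]),
        "floor_size": parse_number(basic["floorSize"]),
        "sale_date": parse_sale_date(basic["sale_date"]),
        "latitude": basic["latitude"],
        "longitude": basic["longitude"],
        "year_built": int(facts["Year built:"]),
        # Assumption: the lot size is expressed in acres.
        "lot_acres": parse_number(facts["Lot:"]),
    }


# A trimmed copy of the sample listing shown above.
sample = {
    "basic_info": {
        "sale_date": "Sold 04/14/2021",
        "latitude": 41.432756,
        "longitude": -81.809063,
        "floorSize": "1,664",
        "price": "$140,000",
    },
    "facts_and_features": {"Year built:": "1952", "Lot:": "0.15 Acres"},
}
row = flatten_record(sample)
print(row)
```

Listings with missing keys or other units would need extra handling that this sketch leaves out.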
The scraper is provided inside scripts/data/zillow_scraper.py. It scrapes most of the information from the Facts and Features section of a listing and from the listing header. It supports pagination.
To use the scraper, edit the url, request header and searchQueryState inside the script, then run:
python scripts/data/zillow_scraper.py
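To illustrate how the three settings might fit together, here is a rough sketch of building one request URL per results page. Everything in it is an assumption for illustration: the placeholder `BASE_URL`, the header values, the keys inside `SEARCH_QUERY_STATE` (including `pagination` and `currentPage`) and the helper `build_page_url` are hypothetical and may not match the variable names or layout used in the script. Copy the real values from your own browser session.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical placeholders, not the values used by the script.
BASE_URL = "https://www.example.com/search"
REQUEST_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}
SEARCH_QUERY_STATE = {"usersSearchTerm": "Cleveland, OH"}


def build_page_url(base_url, search_query_state, page):
    """Return the URL for one results page without mutating the base state."""
    state = dict(search_query_state)
    # Assumption: the page number lives under a "pagination" key.
    state["pagination"] = {"currentPage": page}
    query = urlencode({"searchQueryState": json.dumps(state)})
    return base_url + "?" + query


url = build_page_url(BASE_URL, SEARCH_QUERY_STATE, page=2)

# Round-trip check: decode the query string back into the state dictionary.
decoded = json.loads(parse_qs(urlparse(url).query)["searchQueryState"][0])
print(decoded["pagination"])  # {'currentPage': 2}
```

Copying the state before adding the page number keeps the base dictionary reusable for every page of the search.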
Using Miniconda/Anaconda:
cd path_to_repo
conda env create
conda activate real-estate-price-prediction
The main ML code is structured as depicted below. It offers a structured approach to data splitting, data preprocessing and model search.
Before running the commands below, customize the src/config.py file if needed.
Perform a k-fold split:
bash scripts/data/kfold_split.sh
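For readers unfamiliar with the idea, a k-fold split assigns every row to exactly one of k folds; each fold is then used once for validation while the remaining folds are used for training. The sketch below shows the idea with the standard library only. It is not the code behind kfold_split.sh, and the function names `assign_folds` and `train_valid_indices` are hypothetical.

```python
import random


def assign_folds(n_rows, n_splits=5, seed=42):
    """Return a list where entry i is the fold number of row i."""
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    folds = [0] * n_rows
    for position, row_index in enumerate(indices):
        # Deal the shuffled rows out round-robin so fold sizes differ by at most one.
        folds[row_index] = position % n_splits
    return folds


def train_valid_indices(folds, fold):
    """Split row indices into training and validation sets for one fold."""
    train = [i for i, f in enumerate(folds) if f != fold]
    valid = [i for i, f in enumerate(folds) if f == fold]
    return train, valid


folds = assign_folds(n_rows=10, n_splits=5)
train_idx, valid_idx = train_valid_indices(folds, fold=0)
print(len(train_idx), len(valid_idx))  # 8 2
```

Storing the fold number next to each row, rather than writing separate files per fold, lets every training run reuse the same split.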
Train various models with default hyperparameters:
bash scripts/train/train_kfold_all.sh
Fine-tune a promising model:
bash scripts/train/train_hparam_search.sh
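Which search strategy the script uses is not stated here. As one common option, the sketch below shows a plain random search: sample parameter combinations, score each one, and keep the best. The search space, the function names and especially `dummy_score` are hypothetical; in practice the score would be the mean validation error of the model across the folds.

```python
import random

# Hypothetical search space for a random forest regressor.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 400, 800],
    "max_depth": [4, 8, 16, None],
    "min_samples_leaf": [1, 2, 4],
}


def sample_params(space, rng):
    """Pick one value per hyperparameter uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(score_fn, space, n_trials=20, seed=0):
    """Return the (params, score) pair with the lowest score over n_trials."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(space, rng)
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def dummy_score(params):
    # Placeholder only: stands in for a cross-validated error, lower is better.
    return abs(params["n_estimators"] - 400) / 100 + params["min_samples_leaf"]


best_params, best_score = random_search(dummy_score, SEARCH_SPACE)
print(best_params, best_score)
```

Swapping `dummy_score` for a function that trains on each fold and averages the validation error turns this into a usable search loop.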
Deploy the Streamlit app:
bash scripts/deployment/streamlit.py
Please use this BibTeX entry if you want to cite this repository:
@misc{Koch2021realestatepricepred,
author = {Koch, Brando},
title = {real-estate-price-prediction},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/bkoch4142/real-estate-price-prediction}},
}
This repository is released under the MIT License.