Using linear regression techniques, the project's goal is to forecast property values based on square footage and additional attributes. The model is constructed using a training and testing dataset. To increase the accuracy of the model, several steps of data preprocessing are performed. After then, the forecasts are stored in a csv file.
- Training Data:
train_data.csv
- Testing Data:
test_data.csv
ID
: Identifier for each housecrim
: Per capita crime rate by townzn
: Proportion of residential land zoned for lots over 25,000 sq. ft.indus
: Proportion of non-retail business acres per townchas
: Charles River dummy variable (1 if tract bounds river; 0 otherwise)nox
: Nitric oxides concentration (parts per 10 million)rm
: Average number of rooms per dwellingage
: Proportion of owner-occupied units built prior to 1940dis
: Weighted distances to five Boston employment centresrad
: Index of accessibility to radial highwaystax
: Full-value property tax rate per $10,000ptratio
: Pupil-teacher ratio by townblack
: 1000(Bk - 0.63)^2 where Bk is the proportion of black residents by townlstat
: Percentage of lower status of the populationmedv
: Median value of owner-occupied homes in $1000s (only in training data)
- Python 3.x
- pandas
- numpy
- matplotlib
- scikit-learn
- scipy