Analysis on a large dataset of housing prices and features influencing them for Boston. The dataset used is the publicly available dataset "housing" by UCI.
We attempt to find the performance metric weighing multiple numeric statistics from the dataset.
We have compared performance by methods like linear regression, support vector machines and principal component analysis.
The neural network implementation performed better than those and hence is being reported.
Attribute Information(no missing data):
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over
25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds
river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000's
Data is normalized, and feature reduction done through PCA.
Results(MSE):
1. PCA, followed by Linear Regression:155.13
2. PCA, followed by SVM: 50.9986
3. Lasso: 156.35
4. Neural Network: 21.76