Last time I talked about Simple Linear Regression, which is when we predict a y value from a single feature variable x. However, data is rarely that simple, and we often have many variables available. For example, what if we want to predict the cost of a house and we have access to the size, number of bedrooms, number of bathrooms, age, etc.? For this kind of prediction we need to use Multivariate Linear Regression.
Luckily, not much needs to change compared to Simple Linear Regression. The updated hypothesis function looks like so:

h(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
To calculate this efficiently, we can use matrix multiplication from Linear Algebra. Using this method, our hypothesis function looks like this:

h(x) = θᵀx (or, for every training example at once, Xθ)
To do this successfully, our two matrices must have compatible sizes so that we can perform matrix multiplication. We achieve this by setting our first feature value x₀ = 1 for every example. Put more simply, the number of columns in our training matrix must match the number of rows in our theta vector.
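As a quick sketch of this setup (variable names here are my own), prepending a column of ones as x₀ lets a single matrix product compute the hypothesis for every training example at once:

```python
import numpy as np

# Four illustrative training examples, two features each (size, bedrooms)
features = np.array([[2104, 3],
                     [1600, 3],
                     [2400, 3],
                     [1416, 2]], dtype=float)

# Prepend a column of ones so x0 = 1 for every example;
# X is now (4 x 3), matching a theta vector of length 3
X = np.hstack([np.ones((features.shape[0], 1)), features])

theta = np.zeros(3)

# One matrix product gives the hypothesis for all examples at once
predictions = np.dot(X, theta)
print(X.shape, predictions.shape)  # (4, 3) (4,)
```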
The cost function remains similar to the one used in simple linear regression, although an updated version using matrix operations looks like this:

J(θ) = 1/(2m) (Xθ − y)ᵀ(Xθ − y)
This is implemented in my code as:
```python
def cost_function(X, y, theta):
    m = y.size
    error = np.dot(X, theta.T) - y
    cost = 1/(2*m) * np.dot(error.T, error)
    return cost, error
```
For gradient descent to work with multiple features, we do the same as in simple linear regression: update all of our theta values simultaneously on each iteration, scaled by the learning rate we supply.
This is implemented in my code as:
```python
def gradient_descent(X, y, theta, alpha, iters):
    cost_array = np.zeros(iters)
    m = y.size
    for i in range(iters):
        cost, error = cost_function(X, y, theta)
        theta = theta - (alpha * (1/m) * np.dot(X.T, error))
        cost_array[i] = cost
    return theta, cost_array
```
This code also populates a NumPy array of cost values so that we can plot a graph showing the (hopefully) decreasing cost as gradient descent runs.
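A minimal plotting sketch for that graph (using a stand-in cost_array and a headless matplotlib backend so it runs as a script) could look like:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Stand-in for the array returned by gradient_descent
cost_array = np.array([6.56e10, 1.2e10, 5.0e9, 2.04e9])

plt.plot(np.arange(cost_array.size), cost_array)
plt.xlabel("Iteration")
plt.ylabel("Cost")
plt.title("Cost per iteration of gradient descent")
plt.savefig("cost_plot.png")
```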
When working with multiple feature variables, gradient descent speeds up significantly if they all fall within a similar, small range. We can achieve this using feature normalization. While there are a few ways to do this, a commonly used method is the following:

x' = (x − μ) / σ

(The feature minus the mean of all the feature values, divided by the standard deviation.)
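A sketch of this normalization with NumPy, applied column-wise so each feature gets its own mean and standard deviation (names here are my own):

```python
import numpy as np

features = np.array([[2104.0, 3.0],
                     [1600.0, 3.0],
                     [2400.0, 3.0],
                     [1416.0, 2.0]])

# Normalize each column: subtract its mean, divide by its standard deviation
mu = features.mean(axis=0)
sigma = features.std(axis=0)
normalized = (features - mu) / sigma

# Each column now has mean ~0 and standard deviation ~1
print(normalized.mean(axis=0), normalized.std(axis=0))
```

Note that the same mu and sigma must be reused when normalizing new inputs at prediction time, otherwise the learned theta values won't line up with the data.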
To implement this version of linear regression I decided to use a two-feature hypothetical dataset containing the size of a house and how many bedrooms it has. The y value is the price.
| Size | Bedrooms | Price  |
|------|----------|--------|
| 2104 | 3        | 399900 |
| 1600 | 3        | 329900 |
| 2400 | 3        | 369000 |
| 1416 | 2        | 232000 |
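As a quick sanity check, the cost at theta = 0 can be computed by hand on just these four sample rows (a sketch; the figures quoted in the rest of this post come from the full dataset). With theta at zero the error is simply −y, so the cost is the sum of squared prices over 2m:

```python
import numpy as np

# The four sample rows above, with the x0 = 1 bias column prepended
X = np.array([[1, 2104, 3],
              [1, 1600, 3],
              [1, 2400, 3],
              [1, 1416, 2]], dtype=float)
y = np.array([399900, 329900, 369000, 232000], dtype=float)
theta = np.zeros(3)

m = y.size
error = np.dot(X, theta) - y           # with theta = 0 this is just -y
cost = 1 / (2 * m) * np.dot(error, error)
print(cost)  # 57342377500.0
```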
With theta values set to 0, the cost function returned the following amount:

```
With initial theta values of [0. 0. 0.], cost error is 65591548106.45744
```
After running gradient descent over the data, we see the cost value dropping:
After running for 2000 iterations (this value could be much smaller, as convergence is achieved much earlier), gradient descent returns the following theta values:

```
With final theta values of [340397.96353532 109848.00846026 -5866.45408497], cost error is 2043544218.7812893
```
```
python housepricelinearregression.py
```