You will need Python 2.
The script is main.ipynb.
This repo codes the linear regression problem worked through in the video "How to calculate linear regression using least square method".
Linear regression finds the straight line, called the least squares regression line, that BEST fits through all our points.
To find the BEST fit line we must minimize the squared differences between our actual data and our estimated data (the y values the line predicts).
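As a rough illustration of what "minimize" means here, the sketch below adds up the squared vertical distances between each actual y and the y that a candidate line predicts. The helper name `sum_squared_error` and the two candidate lines are assumptions made for this example, not code from the notebook; the data is the same five points used in the rest of this README.

```python
# Same five [x, y] points used throughout this README.
values = [[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]]

def sum_squared_error(values, b0, b1):
    """Sum of squared differences between actual y and the line b0 + b1 * x."""
    total = 0.0
    for x, y in values:
        predicted = b0 + b1 * x
        total += (y - predicted) ** 2
    return total

# Two hypothetical candidate lines: y = x, and y = 2.2 + 0.6x.
sse_guess = sum_squared_error(values, 0.0, 1.0)
sse_fitted = sum_squared_error(values, 2.2, 0.6)
print('y = x          -> %.3f' % sse_guess)
print('y = 2.2 + 0.6x -> %.3f' % sse_fitted)
```

The second line gives the smaller total, which is what "best fit" means in the least squares sense.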
The data, as [x, y] pairs:
//
values = [[1,2],[2,4],[3,5],[4,4],[5,5]]
![Alt text](rmimg/img1.jpg?raw=true "Title")
Let's plot the data for visualization (actual plotting is not needed in the code at this point).
The first thing we want to find is the mean of x and the mean of y.
We will write a function for that.
//
def mean(values):
    return sum(values) / float(len(values))
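A small usage sketch, assuming the data is split into separate x and y lists first (the names `x`, `y`, `x_mean` and `y_mean` are chosen here for illustration):

```python
def mean(values):
    # float() keeps the division from truncating under Python 2
    return sum(values) / float(len(values))

values = [[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]]
x = [row[0] for row in values]  # [1, 2, 3, 4, 5]
y = [row[1] for row in values]  # [2, 4, 5, 4, 5]

x_mean = mean(x)
y_mean = mean(y)
print('x_mean=%.1f, y_mean=%.1f' % (x_mean, y_mean))  # x_mean=3.0, y_mean=4.0
```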
Our line will pass through the point where the mean of x and the mean of y meet, (x_mean, y_mean).
![Alt text](rmimg/img3.jpg?raw=true "Title")
Let's continue to find the best fit line. To do so we must subtract the mean of x from each x value, then square each result and add them all up. The same thing can be done with the y values, although only the x result is needed for the slope.
//
def variance(values, mean):
    return sum([(x-mean)**2 for x in values])
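To make the steps concrete, here is a sketch that walks through the x values one stage at a time. The intermediate names `deviations`, `squared` and `ss_x` are illustrative only.

```python
def variance(values, mean):
    return sum([(v - mean) ** 2 for v in values])

x = [1, 2, 3, 4, 5]
x_mean = 3.0

deviations = [v - x_mean for v in x]   # [-2.0, -1.0, 0.0, 1.0, 2.0]
squared = [d ** 2 for d in deviations] # [4.0, 1.0, 0.0, 1.0, 4.0]
ss_x = variance(x, x_mean)             # same as sum(squared)
print('sum of squared deviations of x: %.1f' % ss_x)  # 10.0
```

Note that this function returns the sum of squared deviations without dividing by the number of points. Since `covariance` below is also left undivided, the ratio of the two (the slope) comes out the same.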
//
def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i] - mean_x) * (y[i] - mean_y)
    return covar
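As a quick check of what this function adds up, the sketch below lists the per-point products for the sample data next to the total. The `products` list is only there for illustration.

```python
def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i] - mean_x) * (y[i] - mean_y)
    return covar

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_mean, y_mean = 3.0, 4.0

# One product per point: (x - x_mean) * (y - y_mean)
products = [(xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)]
covar = covariance(x, x_mean, y, y_mean)
print(products)                      # [4.0, -0.0, 0.0, 0.0, 2.0]
print('covariance: %.1f' % covar)    # 6.0
```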
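The slope b1 is the covariance divided by the variance of x, and the intercept b0 follows from the two means:
b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
b0 = y_mean - b1 * x_mean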
# Putting it together
def mean(values):
    # Average of a list of numbers
    return sum(values) / float(len(values))
def variance(values, mean):
    # Sum of squared deviations from the mean
    return sum([(x-mean)**2 for x in values])
def covariance(x, mean_x, y, mean_y):
    # Sum of the products of the x and y deviations
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i] - mean_x) * (y[i] - mean_y)
    return covar
def coefficients(values):
    # Split the [x, y] pairs into separate lists
    x = [row[0] for row in values]
    y = [row[1] for row in values]
    x_mean, y_mean = mean(x), mean(y)
    # Slope: covariance of x and y over the variance of x
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    # Intercept: the line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]
values = [[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]]
b0, b1 = coefficients(values)
print('Coefficients: b0=%.3f, b1=%.3f' % (b0, b1))
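For this data the script reports b0 = 2.200 and b1 = 0.600. As a possible next step, the sketch below uses those two numbers to estimate y for each x. The `predict` helper is an assumption added for illustration and is not part of the notebook.

```python
def predict(x, b0, b1):
    """Estimated y on the line b0 + b1 * x (hypothetical helper)."""
    return b0 + b1 * x

# Coefficients reported by the script above for the sample data.
b0, b1 = 2.2, 0.6

values = [[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]]
predictions = [predict(row[0], b0, b1) for row in values]
for row, estimate in zip(values, predictions):
    print('x=%d actual=%d estimated=%.1f' % (row[0], row[1], estimate))
```

The estimate at x = 3 is 4.0, which matches the earlier observation that the line passes through (x_mean, y_mean).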