Add forced intercept = 0 option to OrdinaryLeastSquares
r-molins-mrp opened this issue · 6 comments
Hello Brightwind, and thanks for your great package.
A suggestion from our side; I don't think your correlation
package allows the possibility to fit a simple y = ax relation (with a forced intercept of 0 then). It can be useful, for LiDAR vs. mast validation for instance, or overall checking the trends between two anemometers.
It should be a simple change hopefully, the _leastsquare
function in class OrdinaryLeastSquares
def _leastsquare(ref_spd, target_spd):
p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ** [1, 0]),
np.nan_to_num(target_spd.values.flatten()))[0:2]
return p[0], p[1]
Should be edited as
def _leastsquare(ref_spd, target_spd, forced_intercept=None):
if forced_intercept==True:
p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ),
np.nan_to_num(target_spd.values.flatten()))[0:2]
return p[0]
else:
p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ** [1, 0]),
np.nan_to_num(target_spd.values.flatten()))[0:2]
return p[0], p[1]
Proof of concept, taken from the documentation and SOW:
x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
11.073788414382639, 23.248479770546009, 32.120462301367183,
44.036117671229206, 54.009003143831116, 102.7077685684846,
185.72880217806673, 256.12183145545811, 301.97120103079675])
x = x[:,np.newaxis]
a, _, _, _ = np.linalg.lstsq(x, y)
plt.plot(x, y, 'bo')
plt.plot(x, a*x, 'r-')
print(a)
plt.show()
x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
11.073788414382639, 23.248479770546009, 32.120462301367183,
44.036117671229206, 54.009003143831116, 102.7077685684846,
185.72880217806673, 256.12183145545811, 301.97120103079675])
# Our model is y = a * x, so things are quite simple, in this case...
# x needs to be a column vector instead of a 1D vector for this, however.
xx = x[:, np.newaxis]**[0, 1]
a, _, _, _ = np.linalg.lstsq(xx, y)
plt.plot(x, y, 'bo')
plt.plot(x, a[0]+a[1]*x, 'r-')
print(a)
plt.show()
I've doubled check with Excel, it returns the same:
force intercept in Excel.xlsx
Would you mind implementing that? I would have done it myself but I don't currently have a Brightwind Python environment set-up with Github and VS.
Thanks!
Hi @r-molins-mrp,
Thanks very much for your contribution. Sorry for the slow response.
Could you please change the PR to merge into the dev
branch? Generally we gather a few enhancements, bug fixes into the dev branch and then merge that into the master branch in one go to create a new release. We don't usually merge directly into the master branch.
I update the Changelog just to capture this change.
I am having trouble getting your branch onto my local machine. Bare with me on that. In the meantime could you add an example to the docstring to demonstrate how this would be used? I know it is very straight forward but always good to have an example.
Cheers,
Hi @r-molins-mrp ,
I've got it working on my local machine. It looks good. Some questions below. I'm not following the math so some of my questions may be inappropriate.
- The internal function
_leastsquare()
you now have returning 2 different things. In one hand it returns the slope and offset and in another scenario it returns the slope and r2. This, I think, is a bit confusing for a user to follow. Is there a reason why the r2 has to be calculated within this function and not use the other internal function_get_r2()
? - If the r2 can be calculated outside of this, similar to other scenarios, then the new if statement (
if self.forced_intercept==True:
) underrun()
may not be needed? - The OrdinaryLeastSquares function can also be performed for each direction sector if a
ref_dir
is sent. Fitting through the origin isn't implemented for each direction sector. This would also be confusing for a user as they can set the 'forced_intercept' to be True but then it doesn't get used. Ideally we should implement forcing through the origin for each direction sector too. Thoughts? - Is the name 'forced_intercept' appropriate? It suggests to me that I can force the intercept through any y-axis value I set, but it is actually forcing the intercept through 0 or the origin.
I also committed a few changes to the formatting to match PEP8 formatting rules. I use PyCharm and it highlights when these are broken. Quite useful.
Thanks again for your effort in this!
Hi @stephenholleran , thanks for your comments, these are all good points. Answers below:
- Yes you're right; I've aligned and returning
p[0], 0
now. The reason why I've added ther2
in the function is that I needres
, returned bylstsq
, but actually I've tested and can move it to_get_r2
, so I've done that - Agree, removed
- I've implemented it for sectorwise correlations now also.
- True; I've changed to
forced_intercept_origin
now.
Thanks for the tip on formatting rules; I use VS Code with linting module, but it doesn't seem to be capturing this PEP8 stuff.
See #420
Hi @r-molins-mrp,
Thanks for all that. I made some PEP8 tidying up, added an example and test for running by dir sector and a bit of refactoring. Let me know that you are happy with all of those and I think we are good to merge?
I have discovered a bug using one of the newer versions of pandas so I'll be making this fix and push a new release ASAP. This will go with that.
Cheers,
Thanks a lot @stephenholleran for that, all good with me for release - sorry I forgot the sectorwise tests.
Yes the new pandas seem to throw a lot of warnings (if that's what you are talking about).
Yep, painful. Well, a warning in a previous version that we didn't catch.
Thanks.