Add forced intercept = 0 option to OrdinaryLeastSquares

Question

Add forced intercept = 0 option to OrdinaryLeastSquares

r-molins-mrp opened this issue a year ago · 6 comments

Hello Brightwind, and thanks for your great package.

A suggestion from our side; I don't think your correlation package allows the possibility to fit a simple y = ax relation (with a forced intercept of 0 then). It can be useful, for LiDAR vs. mast validation for instance, or overall checking the trends between two anemometers.

It should be a simple change hopefully, the _leastsquare function in class OrdinaryLeastSquares

    def _leastsquare(ref_spd, target_spd):
        p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ** [1, 0]),
                       np.nan_to_num(target_spd.values.flatten()))[0:2]
        return p[0], p[1]

Should be edited as

    def _leastsquare(ref_spd, target_spd, forced_intercept=None):
			if forced_intercept==True:
		        p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ),
		                       np.nan_to_num(target_spd.values.flatten()))[0:2]
		        return p[0]
			else:
		        p, res = lstsq(np.nan_to_num(ref_spd.values.flatten()[:, np.newaxis] ** [1, 0]),
		                       np.nan_to_num(target_spd.values.flatten()))[0:2]
		        return p[0], p[1]

Proof of concept, taken from the documentation and SOW:

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 
              20.0, 40.0, 60.0, 80.0])

y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183, 
              44.036117671229206, 54.009003143831116, 102.7077685684846, 
              185.72880217806673, 256.12183145545811, 301.97120103079675])

x = x[:,np.newaxis]
a, _, _, _ = np.linalg.lstsq(x, y)

plt.plot(x, y, 'bo')
plt.plot(x, a*x, 'r-')
print(a)
plt.show()

returns:

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 
              20.0, 40.0, 60.0, 80.0])

y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183, 
              44.036117671229206, 54.009003143831116, 102.7077685684846, 
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# Our model is y = a * x, so things are quite simple, in this case...
# x needs to be a column vector instead of a 1D vector for this, however.
xx = x[:, np.newaxis]**[0, 1]
a, _, _, _ = np.linalg.lstsq(xx, y)

plt.plot(x, y, 'bo')
plt.plot(x, a[0]+a[1]*x, 'r-')
print(a)
plt.show()

I've doubled check with Excel, it returns the same:
force intercept in Excel.xlsx

Would you mind implementing that? I would have done it myself but I don't currently have a Brightwind Python environment set-up with Github and VS.

Thanks!

Answer 1 · 2023-10-27T15:13:49.000Z

Hi @r-molins-mrp,

Thanks very much for your contribution. Sorry for the slow response.

Could you please change the PR to merge into the dev branch? Generally we gather a few enhancements, bug fixes into the dev branch and then merge that into the master branch in one go to create a new release. We don't usually merge directly into the master branch.

I update the Changelog just to capture this change.

I am having trouble getting your branch onto my local machine. Bare with me on that. In the meantime could you add an example to the docstring to demonstrate how this would be used? I know it is very straight forward but always good to have an example.

Cheers,

Answer 2 · 2023-11-01T19:05:44.000Z

Hi @r-molins-mrp ,

I've got it working on my local machine. It looks good. Some questions below. I'm not following the math so some of my questions may be inappropriate.

The internal function _leastsquare() you now have returning 2 different things. In one hand it returns the slope and offset and in another scenario it returns the slope and r2. This, I think, is a bit confusing for a user to follow. Is there a reason why the r2 has to be calculated within this function and not use the other internal function _get_r2()?
If the r2 can be calculated outside of this, similar to other scenarios, then the new if statement (if self.forced_intercept==True:) under run() may not be needed?
The OrdinaryLeastSquares function can also be performed for each direction sector if a ref_dir is sent. Fitting through the origin isn't implemented for each direction sector. This would also be confusing for a user as they can set the 'forced_intercept' to be True but then it doesn't get used. Ideally we should implement forcing through the origin for each direction sector too. Thoughts?
Is the name 'forced_intercept' appropriate? It suggests to me that I can force the intercept through any y-axis value I set, but it is actually forcing the intercept through 0 or the origin.

I also committed a few changes to the formatting to match PEP8 formatting rules. I use PyCharm and it highlights when these are broken. Quite useful.

Thanks again for your effort in this!

Answer 3 · 2023-11-06T12:21:37.000Z

Hi @stephenholleran , thanks for your comments, these are all good points. Answers below:

Yes you're right; I've aligned and returning p[0], 0 now. The reason why I've added the r2 in the function is that I need res , returned by lstsq, but actually I've tested and can move it to _get_r2, so I've done that
Agree, removed
I've implemented it for sectorwise correlations now also.
True; I've changed to forced_intercept_origin now.

Thanks for the tip on formatting rules; I use VS Code with linting module, but it doesn't seem to be capturing this PEP8 stuff.

See #420

Answer 4 · 2023-11-10T15:29:44.000Z

Hi @r-molins-mrp,

Thanks for all that. I made some PEP8 tidying up, added an example and test for running by dir sector and a bit of refactoring. Let me know that you are happy with all of those and I think we are good to merge?

I have discovered a bug using one of the newer versions of pandas so I'll be making this fix and push a new release ASAP. This will go with that.

Cheers,

Answer 5 · 2023-11-10T15:38:05.000Z

Thanks a lot @stephenholleran for that, all good with me for release - sorry I forgot the sectorwise tests.
Yes the new pandas seem to throw a lot of warnings (if that's what you are talking about).

Answer 6 · 2023-11-10T15:41:07.000Z

Yep, painful. Well, a warning in a previous version that we didn't catch.

Thanks.