Basic Linear Modelling

Question

Opened this issue 5 years ago · 7 comments

OK - I have got the plots that I sort of wanted!
Basically for this version - I have created a linear model for the top 5 structural factors for gas, electricity, car energy (average over all households) and car energy (only HH with cars).

I have plotted up modelled vs measured - and added a 1:1 line - anything to the right of the line could be excess? I have also colored in red where measured is >25% above modelled.
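A minimal sketch of the >25% rule described above, in plain Python rather than the R used in the repository. The function name `flag_excess`, the `threshold` parameter and the sample numbers are made up for illustration, and it assumes modelled values are positive.

```python
def flag_excess(measured, modelled, threshold=0.25):
    """Flag points where measured exceeds modelled by more than `threshold`."""
    return [m > p * (1.0 + threshold) for m, p in zip(measured, modelled)]


# Made-up illustrative values, not taken from the real data.
measured = [120.0, 100.0, 90.0, 130.0]
modelled = [100.0, 100.0, 100.0, 100.0]
flags = flag_excess(measured, modelled)
```

Only the last point is flagged here, since 130 is more than 25% above 100 while 120 is not.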

Basically you get a good linear modelling for gas and electric - it doesn't work so well for the MOT data!

Any thoughts welcome.

Answer 1 · 2019-05-30T15:12:19.000Z

These were the r-squared - 0.6-0.7 for gas and electric - MUCH MUCH lower on the cars (this was structural factors not social ones - but will discuss with Sally and Jillian where they got to)

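| Model | Variable | R2 |
| --- | --- | --- |
| Electricity | Rooms | 0.4316 |
| Electricity | % Gas Heating | 0.6644 |
| Electricity | % Elec Heating | 0.6928 |
| Electricity | % Flats | 0.7029 |
| Electricity | Built 1930s | 0.7064 |
| Gas | Rooms | 0.5146 |
| Gas | Flats | 0.5721 |
| Gas | 1930-39 | 0.5948 |
| Gas | 1900-18 | 0.6064 |
| Gas | % Gas Heating | 0.6167 |
| Car (All HH) | Pop Density | 0.06699 |
| Car (All HH) | Cars per HH | 0.1155 |
| Car (All HH) | % HH without cars | 0.118 |
| Car (All HH) | PT time to Town Centre | 0.1179 |
| Car (All HH) | % Active to Work | 0.1191 |
| Cars (HH with Cars) | Pop Density | 0.02138 |
| Cars (HH with Cars) | % HH without cars | 0.02189 |
| Cars (HH with Cars) | % Cycle to Work | 0.02194 |
| Cars (HH with Cars) | Distance to Work | 0.0226 |
| Cars (HH with Cars) | Cars per HH (with Cars) | 0.02932 |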
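For reference, a rough sketch of how an R-squared value can be computed from measured and modelled values. This is the standard coefficient of determination; the thread does not say whether the figures above are plain or adjusted R-squared, so treat this as an assumption. The function name `r_squared` is illustrative.

```python
def r_squared(measured, modelled):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    # Residual sum of squares: how far the model is from the measurements.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, modelled))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up values: a perfect model and a model that only predicts the mean.
perfect = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
mean_only = r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

A perfect model gives 1 and a model that always predicts the mean gives 0, which is why values around 0.02 for the car models indicate almost no explanatory power.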

Answer 2 · 2019-05-30T15:20:48.000Z

Interesting stuff @timchatterton, many thanks for sharing these results. More discussion to follow no doubt.

Answer 3 · 2019-05-31T07:32:34.000Z

@timchatterton Looking at the code https://github.com/creds2/Excess-Data-Exploration/blob/master/Tim/RScripts/Modelling/Modelling%20main%20factors.R your car models are predicting gas consumption!

Answer 4 · 2019-05-31T07:36:09.000Z

I've committed a small fix, but I can't reproduce your plots as the plotting code is missing.

Answer 5 · 2019-05-31T07:37:49.000Z

Also, can you explain how you chose your variables? E.g. why %cycling rather than %driving to work?

Answer 6 · 2019-05-31T12:25:14.000Z

Hi - I clearly hadn't saved the right version of the script to GitHub - the gas issue was spotted quite quickly and sorted out - and the plots were added to the bottom of the code - I believe this version is now updated.

The variables were taken from the top 5 most important (structural) according to the XGBoost models.

Answer 7 · 2019-06-20T18:23:11.000Z

Hi @timchatterton, I was hoping to talk to you at the meeting, but I was off sick. Fortunately I'm much better now. I wanted to draw your attention to some experiments in modelling at https://github.com/creds2/Excess-Data-Exploration/blob/master/Modeling_Summary.md. I was able to get much better results for the driving, and comparable results for Gas and Electric. I used an approach of taking the single most important variable, then finding what correlated with the residuals, and repeating.
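A rough sketch of that residual-driven selection, in plain Python rather than the R used in the repository. All names (`fit_line`, `correlation`, `residual_selection`) and the data are hypothetical. It fits each new variable to the current residuals one at a time (a stagewise fit), which is an assumption: the actual analysis may instead refit the full multiple regression at each step.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def residual_selection(features, target, first, n_steps):
    """Start from `first`, then repeatedly pick the unused feature
    most correlated with the residuals of the fit so far."""
    chosen, residual, name = [], list(target), first
    for _ in range(n_steps):
        slope, intercept = fit_line(features[name], residual)
        residual = [r - (slope * x + intercept)
                    for x, r in zip(features[name], residual)]
        chosen.append(name)
        remaining = [k for k in features if k not in chosen]
        if not remaining:
            break
        name = max(remaining,
                   key=lambda k: abs(correlation(features[k], residual)))
    return chosen, residual


# Made-up data: the target depends on "a" and "b" but not on "c".
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 2.0, 2.0, 1.0, 1.0],
}
target = [2 * a + 3 * b for a, b in zip(features["a"], features["b"])]
chosen, residual = residual_selection(features, target, first="a", n_steps=2)
```

Here `chosen` comes out as `a` then `b`, because after fitting `a` the leftover variation lines up with `b` and not with `c`.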

It gives me a slightly different selection of variables, but you can see the "logic" is similar in both your results and mine. The driving result is very strongly correlated (R-squared of 0.85), but I'm getting some S-curved results, which suggests I'm not handling the non-linearity correctly. Any suggestions?