Great Energy Predictor

How much energy will a building consume?

Founded in 1894, ASHRAE serves to advance the arts and sciences of heating, ventilation, air conditioning refrigeration and their allied fields. ASHRAE members represent building system design and industrial process professionals around the world. With over 54,000 members serving in 132 countries, ASHRAE supports research, standards writing, publishing and continuing education - shaping tomorrow’s built environment today.
We aim to develop accurate models of metered building energy usage in the following areas: chilled water, electric, hot water, and steam meters. The data comes from over 1,000 buildings over a three-year timeframe. With better estimates of these energy-saving investments, large scale investors and financial institutions will be more inclined to invest in this area to enable progress in building efficiencies.

The dataset can be downloaded from

Understanding The Data:

train.csv

  • building_id - Foreign key for the building metadata.
  • meter - The meter id code. Read as {0: electricity, 1: chilledwater, 2: steam, 3: hotwater}. Not every building has all meter types.
  • timestamp - When the measurement was taken
  • meter_reading - The target variable. Energy consumption in kWh (or equivalent). Note that this is real data with measurement error, which we expect will impose a baseline level of modeling error. UPDATE: as discussed here, the site 0 electric meter readings are in kBTU.

No building pts per building id

On average each building has 13951.75983436853 datapoints Building 403 has least no. of datapoints 479

Meters count
We can see that maximum datapoints are for Meter 0. Meter 0 has more data points than 1,2,3 combined.

building_meta.csv

  • site_id - Foreign key for the weather files.
  • building_id - Foreign key for training.csv
  • primary_use - Indicator of the primary category of activities for the building based on EnergyStar property type definitions
  • square_feet - Gross floor area of the building
  • year_built - Year building was opened
  • floor_count - Number of floors of the building

primary use
We can see that most data points are for building related to Education, followed by Offices and Public Entertainment.

weather_train.csv

Weather data from a meteorological station as close as possible to the site.

  • site_id
  • air_temperature - Degrees Celsius
  • cloud_coverage - Portion of the sky covered in clouds, in oktas
  • dew_temperature - Degrees Celsius
  • precip_depth_1_hr - Millimeters
  • sea_level_pressure - Millibar/hectopascals
  • wind_direction - Compass direction (0-360)
  • wind_speed - Meters per second

wind speed
We can see that the wind_speed data is quite discrete. Later in Preprocessing we use this to our advantage and convert this data to Beaufort Scale.

For more visualizations, correlation views etc visit the Project Notebook.

Data PreProcessing

Image for Data PreProcessing Need of Data Preprocessing

  • For achieving better results from the applied model in Machine Learning projects the format of the data has to be in a proper manner. Some specified Machine Learning model needs information in a specified format, for example, Random Forest algorithm does not support null values, therefore to execute random forest algorithm null values have to be managed from the original raw data set.
  • Another aspect is that data set should be formatted in such a way that more than one Machine Learning and Deep Learning algorithms are executed in one data set, and best out of them is chosen.

For our data we've applied **Feature engineering** across timestamp data and wind speed data and **Dropped insignificant columns**. ALong with this we have impleented **Memory Reduction** to reduce our dataset size by **65%**. All can be seen and understood in our [Notebook File](https://github.com/HOD101s/Great-Energy-Predictor/blob/master/Great%20Energy%20Predictor.ipynb).

Building a Neural Network


Here we will be using the Keras framework to build a Neural Network.
Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.

Following is our model architecture:-

Model
We Built Separate Models and their Training loss trends for separate Meters which can be seen in our Notebook .

Results

__ Linear Regression Multivariate Polynomial Regression Neural Network Non-Mean Imputed Neural Network Mean Imputed
Meter 0(Electric) r2 = 0.3334
mse = 98387.13
mae = 134.24
r2 = 0.3341
mse = 98300.27
mae = 133.74
r2 = 0.7225
mse = 40603.1
mae = 45.9878
r2 = 0.7566
mse = 35621.7
mae = 38.6521
Meter 1 (Chilled Water) NA NA r2 = 0.012
mse = 6.369e7
mae = 379.935
r2 = 0.0085
mse = 6.393e7
mae = 432.62
Meter 2 (Stream) NA NA r2 = 0.0031
mse = 1.86e11
mae = 13626
r2 = 0.0028
mse = 1.86e11
mae = 13680.9
Meter 3 (Hot Water) NA NA r2 = 0.0273
mse = 6.258e6
mae = 294.626
r2 = 0.0389
mse = 6.183e6
mae = 280.508

Inferences

  • We see that models trained on imputed data perform better.
  • Our model for meter 0 works well and gives good predictions.
  • Remaining models do not perform that well. Probably using a different network architecture would result in better performance.
  • It is also possible that data for meter 1,2 and 3 is insufficient. So a deeper network may fit the data better.
  • The electric meter has a better model because of adequate amount of data.
  • The neural network with imputed values i.e. NaN values filled with the mean performs better than the non-imputed neural network.
  • Meter reading is better correlated with square feet, than other parameter.
  • The graphs of each parameter with meter reading seems to fall in an area, and isn’t linearly related.

References

https://keras.io/models/sequential/ https://en.wikipedia.org/wiki/Beaufort_scale https://medium.com/@satnalikamayank12/on-learning-embeddings-for-categorical-data-using-keras-165ff2773fc9 https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/ https://scikit-learn.org/stable/supervised_learning.html#supervised-learning https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33