Founded in 1894, ASHRAE serves to advance the arts and sciences of heating, ventilation, air conditioning, refrigeration, and their allied fields. ASHRAE members represent building system design and industrial process professionals around the world. With over 54,000 members serving in 132 countries, ASHRAE supports research, standards writing, publishing, and continuing education, shaping tomorrow’s built environment today.
We aim to develop accurate models of metered building energy usage in the following areas: chilled water, electric, hot water, and steam meters. The data comes from over 1,000 buildings over a three-year timeframe. With better estimates of these energy-saving investments, large scale investors and financial institutions will be more inclined to invest in this area to enable progress in building efficiencies.
The dataset can be downloaded from
- building_id - Foreign key for the building metadata.
- meter - The meter id code. Read as {0: electricity, 1: chilledwater, 2: steam, 3: hotwater}. Not every building has all meter types.
- timestamp - When the measurement was taken
- meter_reading - The target variable. Energy consumption in kWh (or equivalent). Note that this is real data with measurement error, which we expect will impose a baseline level of modeling error. UPDATE: as discussed here, the site 0 electric meter readings are in kBTU.
On average, each building has about 13,952 data points. Building 403 has the fewest, with 479.
Meter 0 has the most data points, more than meters 1, 2, and 3 combined.
- site_id - Foreign key for the weather files.
- building_id - Foreign key for training.csv
- primary_use - Indicator of the primary category of activities for the building based on EnergyStar property type definitions
- square_feet - Gross floor area of the building
- year_built - Year building was opened
- floor_count - Number of floors of the building
Most data points belong to Education buildings, followed by Offices and Public Entertainment.
Weather data from a meteorological station as close as possible to the site.
- site_id
- air_temperature - Degrees Celsius
- cloud_coverage - Portion of the sky covered in clouds, in oktas
- dew_temperature - Degrees Celsius
- precip_depth_1_hr - Millimeters
- sea_level_pressure - Millibar/hectopascals
- wind_direction - Compass direction (0-360)
- wind_speed - Meters per second
We can see that the wind_speed data is quite discrete. Later, in preprocessing, we take advantage of this by converting wind speed to the Beaufort scale.
For more visualizations, correlation views, etc., visit the Project Notebook.
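The wind speed to Beaufort conversion can be sketched as a simple binning step. This is a minimal illustration, not the notebook's code: the function name `to_beaufort` is hypothetical, and the thresholds are the commonly quoted Beaufort limits in meters per second rounded to one decimal, which may differ slightly from the bin edges used in the notebook.

```python
import bisect
import math

# Assumed lower bounds (m/s) for Beaufort numbers 1..12.
# Speeds below the first bound map to Beaufort 0.
BEAUFORT_LOWER_BOUNDS = [0.5, 1.6, 3.4, 5.5, 8.0, 10.8,
                         13.9, 17.2, 20.8, 24.5, 28.5, 32.7]


def to_beaufort(wind_speed):
    """Map a wind speed in m/s to a Beaufort number (0-12).

    Returns None for missing values so they can be handled separately.
    """
    if wind_speed is None or math.isnan(wind_speed):
        return None
    # bisect_right counts how many lower bounds are <= wind_speed,
    # which is exactly the Beaufort number.
    return bisect.bisect_right(BEAUFORT_LOWER_BOUNDS, wind_speed)


speeds = [0.0, 1.6, 7.9, 40.0]
scale = [to_beaufort(s) for s in speeds]
```

Binning turns a noisy continuous reading into a small ordinal feature, which suits data that is already recorded in coarse steps.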
- To get good results from a machine learning model, the data must be in a suitable format. Some models need their input in a particular form. For example, many Random Forest implementations do not accept null values, so nulls must be handled in the raw data set before training.
- The data set should also be formatted so that several machine learning and deep learning algorithms can be run on the same data and the best one chosen.
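One common way to handle null values is mean imputation, where each missing entry is replaced by the mean of its column. The sketch below assumes pandas and uses a hypothetical helper `impute_with_mean` on a toy frame with weather-style column names; it is an illustration, not the notebook's implementation.

```python
import pandas as pd


def impute_with_mean(df, columns):
    """Return a copy of df where NaNs in the given columns are
    replaced by that column's mean (computed ignoring NaNs)."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


# Toy data for illustration only.
weather = pd.DataFrame({
    "air_temperature": [20.0, None, 24.0],
    "cloud_coverage": [2.0, 4.0, None],
})
filled = impute_with_mean(weather, ["air_temperature", "cloud_coverage"])
```

When a train/test split is involved, the means should be computed on the training portion only and then reused for the test portion, so that no information leaks from the test data.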
For our data, we applied **feature engineering** to the timestamp and wind speed data and **dropped insignificant columns**. Along with this, we implemented **memory reduction**, which shrank our dataset by **65%**. The details are in our [Notebook File](https://github.com/HOD101s/Great-Energy-Predictor/blob/master/Great%20Energy%20Predictor.ipynb).
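Memory reduction of this kind is usually done by downcasting numeric columns to the smallest dtype that still holds their values. The following is a minimal sketch assuming pandas; the helper name `reduce_memory` and the toy frame are hypothetical, and the savings on real data depend on its columns, so this does not reproduce the 65% figure.

```python
import pandas as pd


def reduce_memory(df):
    """Return a copy of df with numeric columns downcast to the
    smallest integer or float dtype that can hold their values."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Floats are downcast to float32 at the smallest.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy data for illustration only.
demo = pd.DataFrame({
    "building_id": [0, 403, 1448],
    "meter": [0, 1, 3],
    "air_temperature": [25.0, 24.5, -3.5],
})
small = reduce_memory(demo)
bytes_before = demo.memory_usage(deep=True).sum()
bytes_after = small.memory_usage(deep=True).sum()
```

Downcasting floats to float32 trades a little precision for half the storage, which is usually acceptable for sensor-style readings.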
Here we will be using the Keras framework to build a Neural Network.
Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.
We built a separate model for each meter. The models and their training loss trends can be seen in our Notebook.
Meter | Linear Regression | Multivariate Polynomial Regression | Neural Network Non-Mean Imputed | Neural Network Mean Imputed |
---|---|---|---|---|
Meter 0 (Electric) | r2 = 0.3334, mse = 98387.13, mae = 134.24 | r2 = 0.3341, mse = 98300.27, mae = 133.74 | r2 = 0.7225, mse = 40603.1, mae = 45.9878 | r2 = 0.7566, mse = 35621.7, mae = 38.6521 |
Meter 1 (Chilled Water) | NA | NA | r2 = 0.012, mse = 6.369e7, mae = 379.935 | r2 = 0.0085, mse = 6.393e7, mae = 432.62 |
Meter 2 (Steam) | NA | NA | r2 = 0.0031, mse = 1.86e11, mae = 13626 | r2 = 0.0028, mse = 1.86e11, mae = 13680.9 |
Meter 3 (Hot Water) | NA | NA | r2 = 0.0273, mse = 6.258e6, mae = 294.626 | r2 = 0.0389, mse = 6.183e6, mae = 280.508 |
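For readers unfamiliar with the three metrics in the table, the sketch below computes them from their standard definitions in plain Python. The function name `regression_metrics` is hypothetical and the numbers are toy values, not results from this project.

```python
def regression_metrics(y_true, y_pred):
    """Compute r2, mse and mae from their standard definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r2 is undefined when y_true is constant")
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = regression_metrics(y_true, y_true)
# A baseline that always predicts the mean of y_true.
baseline = regression_metrics(y_true, [2.5, 2.5, 2.5, 2.5])
```

A model that always predicts the mean of the target gets an r2 of exactly 0, so r2 values close to 0 indicate a model that is barely better than that baseline.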
- Models trained on mean-imputed data perform better for meters 0 and 3. For meters 1 and 2 the two variants are nearly identical, with the non-imputed model slightly ahead on r2.
- Our model for meter 0 performs reasonably well (r2 of about 0.76 with mean imputation).
- The remaining models perform poorly (r2 below 0.04). A different network architecture might give better performance.
- It is also possible that the data for meters 1, 2, and 3 is insufficient, or that a deeper network would fit the data better.
- The electric meter has a better model, likely because it has an adequate amount of data.
- For the electric and hot water meters, the neural network with imputed values (NaN values filled with the mean) performs better than the non-imputed neural network.
- Meter reading correlates more strongly with square_feet than with any other parameter.
- Plots of each parameter against meter reading form a cloud of points rather than a line, so the relationships are not linear.
- https://keras.io/models/sequential/
- https://en.wikipedia.org/wiki/Beaufort_scale
- https://medium.com/@satnalikamayank12/on-learning-embeddings-for-categorical-data-using-keras-165ff2773fc9
- https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
- https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
- https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33