Repository for my MSc project at the University of Edinburgh titled:
Data Mining Project - Is there a link between Electricity Consumption and weather in the Informatics Forum
The datasets must be downloaded separately due to licensing and permission restrictions.
For the MIDAS dataset, download it from the CEDA Archive at this link. Permission must be obtained from CEDA.
For the Infenergy dataset, access to the School of Informatics local area network is needed. The data can then be queried as described in the GitHub repository davidcsterratt/infenergy. Alternatively, with the right permission, the PostgreSQL database that contains the Infenergy data can be dumped. This is the method used in this project and its code. In that case, refer to my edited version of the repository, ed9w2in6/infenergy, to query the data. It includes instructions and requirements for using the repository, as well as fixes for timezone differences that make it easier to work in a remote environment.
The files at the root directory of this repository are described in this section.
The markdown file README.md is the file you are currently reading. It contains notes and instructions for using the code in this repository.
These directories and files are not related to the project. They can be ignored.
The Jupyter notebooks 2016_hourly_plots.ipynb and MIDAS_2016_daily_hourly_and_other_locations.ipynb were used for initial exploration to choose which Edinburgh weather station to use.
The Jupyter notebooks extraction.ipynb and Combining.ipynb are used for pre-processing the MIDAS and Infenergy datasets. extraction.ipynb extracts the data for the chosen weather station, Gogar Bank, from the larger complete MIDAS UK Hourly Weather data and UK Hourly Rainfall data. Combining.ipynb imports the Infenergy dataset from a local PostgreSQL server created from a dump of the one on the School of Informatics LAN. It also combines the Infenergy dataset with the extracted Gogar Bank weather station data to create a unified dataset of weather variables and electricity consumption for the Informatics Forum.
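The two pre-processing steps can be pictured with a minimal pandas sketch. This is not the code in the notebooks: the column names (src_id, time, air_temperature, kwh), the station ID and all values are hypothetical toy data chosen only to illustrate filtering one station and joining on the hourly timestamp.

```python
import pandas as pd


def extract_station(weather: pd.DataFrame, station_id: int,
                    id_col: str = "src_id") -> pd.DataFrame:
    """Keep only the rows belonging to one weather station.

    `id_col` is an assumed name for the station identifier column.
    """
    return weather.loc[weather[id_col] == station_id].copy()


def combine_hourly(weather: pd.DataFrame, energy: pd.DataFrame,
                   time_col: str = "time") -> pd.DataFrame:
    """Inner-join weather and energy rows that share the same timestamp."""
    merged = pd.merge(weather, energy, on=time_col, how="inner")
    return merged.sort_values(time_col).reset_index(drop=True)


# Toy data standing in for the real datasets (values are made up).
weather = pd.DataFrame({
    "time": pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 00:00"]),
    "src_id": [1, 1, 2],
    "air_temperature": [3.5, 3.1, 5.0],
})
energy = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00"]),
    "kwh": [120.0, 115.5],
})

station = extract_station(weather, station_id=1)
combined = combine_hourly(station, energy)
print(combined)
```

An inner join drops any hour that is missing from either dataset, which may or may not be what you want; use how="left" or how="outer" to keep gaps visible instead.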
As an alternative to using environment.yml, you can create the Conda environment directly with the following command, which worked as of 29 March 2021:
conda create -n r r-base r-rpostgresql r-devtools r-irkernel rpy2 notebook numpy pandas scikit-learn seaborn windrose conda-tree
The YAML file environment.yml was created using the command conda env export > environment.yml. It can be used to recreate the Conda environment via Miniconda using the following command:
conda env create -f environment.yml
Note that the environment was originally set up on a computer with the following system information:
macOS version: 10.15.6 (19G73)
zsh 5.7.1 (x86_64-apple-darwin19.0)
conda 4.8.4
You MUST adjust the environment accordingly if there are any errors.
The directory src/ contains the main code used to produce plots and fit models, as well as a test notebook to check that the environment is set up correctly.
This file is not related to the project. It can be ignored.
The Jupyter notebook src/r_environment_setup_test.ipynb should be run to confirm that the environment is set up correctly. It should run without errors and produce a plot of the hourly electricity consumption for 2014-04-30.
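Before opening the test notebook, a quick check of the Python side of the environment can save time. The sketch below is not part of the notebook; it only assumes the Python packages named in the conda create command and uses their import names (for example, sklearn for scikit-learn). It does not check the R packages.

```python
import importlib.util

# Import names of the Python packages listed in the conda create command.
# Note that scikit-learn is imported as "sklearn".
EXPECTED_MODULES = ["rpy2", "numpy", "pandas", "sklearn", "seaborn", "windrose"]


def missing_modules(names):
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(EXPECTED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All expected Python packages were found.")
```

If anything is reported missing, install it into the active Conda environment before running the notebooks.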
The Jupyter notebook src/main_EDA_and_Modelling.ipynb contains all the code used to produce the figures in the report and to train all the models for the project.
Note that the data file paths in all the code must be changed to match where the datasets are stored after download.
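One way to make this easier is to keep every data path in a single place and check that the files exist before running anything else. The following is only a sketch: the directory layout and file names are hypothetical and are not the ones used in the notebooks.

```python
from pathlib import Path

# Hypothetical layout: change DATA_ROOT and the file names to match
# where you stored the downloaded datasets.
DATA_ROOT = Path("data")
DATASETS = {
    "midas_weather": DATA_ROOT / "midas" / "hourly_weather.csv",
    "midas_rainfall": DATA_ROOT / "midas" / "hourly_rainfall.csv",
}


def missing_datasets(paths):
    """Return the names of datasets whose files do not exist on disk."""
    return [name for name, path in paths.items() if not Path(path).exists()]


for name in missing_datasets(DATASETS):
    print(f"Dataset '{name}' not found at {DATASETS[name]}")
```

With the paths defined once, each notebook only needs to refer to the dictionary entries rather than repeating literal paths.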