Multivariate Time Series Forecasting of Oil Production Based on Ensemble Deep Learning and Genetic Algorithm
This project develops a forecasting model for oil production using advanced machine learning techniques and optimization algorithms. It includes a Genetic Algorithm-Temporal Convolutional Neural Network-Long Short-Term Memory (GA-TCN-LSTM) ensemble model, benchmarked against conventional models: Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Temporal Convolutional Network (TCN). For more details on the methodology and techniques used in this project, please read the preprint paper.
Additionally, the project includes exploratory data analysis and data cleaning using our custom-built `odc` module. The module is designed to detect, visualize, and treat outliers in oil well datasets and is available in the odc repository.
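As a rough illustration of what detecting and treating outliers can involve, the sketch below flags values outside the Tukey fences (1.5 × IQR) and replaces them with the median of the remaining points. This is not the `odc` API: the function names `iqr_bounds` and `treat_outliers`, the 1.5 multiplier, and the sample values are all assumptions made for this example.

```python
from statistics import median, quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def treat_outliers(values, k=1.5):
    """Replace points outside the fences with the median of the inliers."""
    lower, upper = iqr_bounds(values, k)
    inliers = [v for v in values if lower <= v <= upper]
    fill = median(inliers)
    return [v if lower <= v <= upper else fill for v in values]


# Made-up daily oil rates (bbl) with one obvious spike; not project data.
rates = [1010.0, 995.0, 1003.0, 5000.0, 990.0, 1001.0, 998.0]
cleaned = treat_outliers(rates)
```

Median replacement is only one possible treatment; dropping or interpolating flagged points are common alternatives, and the right choice depends on the well data.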
Oil production forecasting is a critical task for many oil and gas companies, governments, and policy-makers. Accurate forecasts are essential for planning and decision-making, such as determining production rates, managing inventory, and estimating future revenue.
Conventional oil production forecasting methods are limited by complex data, high uncertainty, and an inability to reflect the actual system and its dynamic changes. Advanced machine learning techniques and optimization algorithms can improve forecast accuracy by accounting for these complexities and by searching for a good combination of hyperparameters for each model. This project aims to give decision-makers better information and to improve the overall forecasting process.
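To make the idea of hyperparameter search with a genetic algorithm concrete, here is a minimal sketch using only the standard library. It is not the project's implementation: the search space, the operator choices (truncation selection, uniform crossover, elitism), and `toy_loss` are assumptions for illustration. In a real run the loss function would train a model and return its validation error.

```python
import random

# Hypothetical search space; the project's actual hyperparameters may differ.
SEARCH_SPACE = {
    "tcn_filters": [16, 32, 64, 128],
    "kernel_size": [2, 3, 5, 7],
    "lstm_units": [32, 64, 128, 256],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_loss(ind):
    """Stand-in for validation loss: counts genes that differ from a fixed target."""
    target = {"tcn_filters": 64, "kernel_size": 3,
              "lstm_units": 128, "learning_rate": 1e-3}
    return sum(ind[k] != target[k] for k in target)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(ind, rng, rate=0.2):
    """Resample each gene from its options with probability `rate`."""
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}


def genetic_search(loss_fn, pop_size=20, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=loss_fn)
        parents = population[: pop_size // 2]   # truncation selection
        children = population[:elite]           # elitism: best survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return min(population, key=loss_fn)


best = genetic_search(toy_loss)
```

Because the elite individuals are carried over unchanged, the best loss never gets worse from one generation to the next. With an expensive loss function, caching evaluated configurations would avoid retraining duplicates.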
In this project, we followed a systematic approach to obtain and process the required oil well data. First, we extracted the necessary data from the `raw_data.xlsx` file. Next, we conducted exploratory data analysis (EDA) to better understand the characteristics of the data. The resulting dataset was saved as `F_14.csv`, which includes the production and injection data that significantly affect the well's production. We then cleaned `F_14.csv` using our custom module, `odc.py`, and saved the cleaned data as `cleaned_F_14.csv`.
After cleaning the data, we used it as input to our proposed and reference models for forecasting. All datasets used in this project can be found in the `/datasets` folder, and the EDA and data cleaning process can be found in the `/data_preprocessing` folder. Finally, the modeling process, including the proposed and reference models, is documented in the `/modeling` folder.
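Before a cleaned multivariate series can be fed to sequence models such as LSTM or TCN, it is usually reshaped into fixed-length input windows paired with a future target value. The sketch below shows one common way to do this with plain Python lists. The function name `make_windows`, the `lookback` and `horizon` parameters, and the two example columns are assumptions; the notebooks may frame the data differently.

```python
def make_windows(rows, target_index, lookback, horizon=1):
    """Split a time-ordered multivariate series into (X, y) samples.

    rows: list of equal-length feature lists, ordered by time.
    X[i] holds `lookback` consecutive rows; y[i] is the target column
    value `horizon` steps after the end of that window.
    """
    X, y = [], []
    last_start = len(rows) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        X.append(rows[start:end])
        y.append(rows[end + horizon - 1][target_index])
    return X, y


# Made-up rows of [oil_rate, water_injection]; column names are illustrative.
series = [
    [100.0, 10.0],
    [98.0, 11.0],
    [97.0, 12.0],
    [95.0, 12.0],
    [94.0, 13.0],
]
X, y = make_windows(series, target_index=0, lookback=3)
```

Here `X` has the layout (samples, timesteps, features), so converting it to an array gives the 3D input that recurrent and convolutional sequence layers typically expect.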
├── .gitignore
├── requirements.txt
├── README.md
├── license
├── data_preprocessing
│ ├── eda
│ └── data_cleaning
├── datasets
│ ├── raw_data.xlsx
│ ├── F_14.csv
│ └── cleaned_F_14.csv
└── modeling
  ├── proposed_model
  │ └── GA-TCN-LSTM_model.ipynb
  └── reference_models
    ├── GRU_model.ipynb
    ├── LSTM_model.ipynb
    ├── RNN_model.ipynb
    └── TCN_model.ipynb
Model | RMSE, bbl | wMAPE, % | MAE, bbl | R2 score |
---|---|---|---|---|
GA-TCN-LSTM | 199.39 | 5.13 | 117.11 | 0.93 |
TCN | 213.22 | 5.36 | 122.72 | 0.92 |
LSTM | 216.00 | 5.84 | 133.52 | 0.91 |
GRU | 209.33 | 5.48 | 125.06 | 0.92 |
RNN | 214.71 | 5.66 | 129.36 | 0.92 |
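For readers who want to reproduce the columns in the table above on their own predictions, the sketch below computes the four metrics from their standard definitions. It assumes wMAPE is the sum of absolute errors divided by the sum of absolute actual values, expressed in percent; the exact definition used in the notebooks may differ, and the sample numbers are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, wMAPE (%), MAE, and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_error_sum = sum(abs(e) for e in errors)
    squared_error_sum = sum(e * e for e in errors)

    mae = abs_error_sum / n
    rmse = math.sqrt(squared_error_sum / n)
    # wMAPE: total absolute error relative to total absolute actuals.
    wmape = 100.0 * abs_error_sum / sum(abs(t) for t in y_true)
    # R2: one minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    total_sum_squares = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - squared_error_sum / total_sum_squares
    return {"rmse": rmse, "wmape": wmape, "mae": mae, "r2": r2}


# Illustrative values only, not project results.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = regression_metrics(actual, predicted)
```

RMSE and MAE are in the units of the target (bbl in the table), while wMAPE and R2 are unitless, which makes them easier to compare across wells with different production levels.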
Notebook | Colab Link |
---|---|
Exploratory Data Analysis | |
Data Cleaning | |
Proposed Model (GA-TCN-LSTM) | |
Reference Model (TCN) | |
Reference Model (LSTM) | |
Reference Model (GRU) | |
Reference Model (RNN) | |
The `raw_data.xlsx` file used in this project was provided by Equinor (formerly known as Statoil) and is available on their website as part of their Volve Field Data sharing initiative.
To access the dataset, please visit the following link: https://www.equinor.com/energy/volve-data-sharing. Once on the page, select the "Go to the Volve dataset: data.equinor.com" option and follow the instructions to obtain the `raw_data.xlsx` file.
Please note that the raw data is subject to the terms and conditions outlined on the Equinor website.
- Simple Genetic Algorithm From Scratch in Python
- Multivariate Time Series Forecasting with LSTMs in Keras
If you have any questions or encounter any issues running this project, please feel free to open an issue. I'll be happy to help!