GA-TCN-LSTM

An Ensemble DL Model Tuned with Genetic Algorithm for Oil Production Forecasting.

Multivariate Time Series Forecasting of Oil Production Based on Ensemble Deep Learning and Genetic Algorithm

This project develops a forecasting model for oil production using advanced machine learning techniques and optimization algorithms. It includes a Genetic Algorithm-Temporal Convolutional Network-Long Short-Term Memory (GA-TCN-LSTM) ensemble model, benchmarked against conventional models: Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Temporal Convolutional Network (TCN). For details on the methodology and techniques used in this project, please read the preprint paper.
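The genetic-algorithm side of the approach can be illustrated with a toy hyperparameter search. This is a minimal sketch, not the tuning code from the notebooks: the search space, the operator choices (elitism, uniform crossover, random-reset mutation), and the `toy_fitness` function are all hypothetical stand-ins. In a real run, the fitness function would train a model with the candidate hyperparameters and return a validation error.

```python
import random

# Hypothetical search space; the actual hyperparameters and ranges are assumptions.
SEARCH_SPACE = {
    "tcn_filters": [16, 32, 64, 128],
    "lstm_units": [16, 32, 64, 128],
    "kernel_size": [2, 3, 5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}


def random_individual(rng):
    """Draw one candidate hyperparameter set uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_fitness(individual):
    """Stand-in for a validation error (lower is better).

    Counts how many genes differ from an arbitrary target configuration.
    A real fitness function would train and evaluate a model instead.
    """
    target = {"tcn_filters": 64, "lstm_units": 32, "kernel_size": 3, "learning_rate": 1e-3}
    return sum(individual[key] != target[key] for key in target)


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {key: rng.choice((parent_a[key], parent_b[key])) for key in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """With probability `rate`, resample a gene from its list of options."""
    return {
        key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
        for key, value in individual.items()
    }


def genetic_search(fitness, pop_size=20, generations=15, elite=2, seed=0):
    """Minimize `fitness` with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)              # best candidates first
        parents = population[: pop_size // 2]     # truncation selection
        children = []
        for _ in range(pop_size - elite):
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = population[:elite] + children  # elites survive unchanged
    return min(population, key=fitness)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Because the elites are carried over unchanged, the best fitness in the population never gets worse from one generation to the next.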

Additionally, the project includes exploratory data analysis and data cleaning using our custom-built odc module. The module is designed to detect, visualize, and treat outliers in oil well datasets and is readily available in the odc repository.

Motivation

Oil production forecasting is a critical task for many oil and gas companies, governments, and policy-makers. Accurate forecasts are essential for planning and decision-making, such as determining production rates, managing inventory, and estimating future revenue.

Conventional oil production forecasting methods are limited by complex data, high uncertainty, and a failure to reflect the actual system and its dynamic changes. Advanced machine learning techniques and optimization algorithms can improve forecast accuracy by accounting for these complexities and by identifying the optimal combination of hyperparameters for each model. This project aims to give decision-makers better information and to improve the overall forecasting process.

Workflow

In this project, we followed a systematic approach to obtain and process the required oil well data. Initially, we extracted the necessary data from the raw_data.xlsx file. Next, we conducted exploratory data analysis (EDA) to better understand the characteristics of the data. The resulting dataset was then saved with the name F_14.csv, which includes production and injection data that significantly affect the well's production. We performed data cleaning on F_14.csv using our custom module, odc.py, and saved the cleaned data to a file named cleaned_F_14.csv.

After cleaning the data, we used it as input to our proposed and reference models for forecasting. All datasets used in this project can be found in the /datasets folder, and the EDA and data cleaning process can be found in the /data_preprocessing folder. Finally, the modeling process, including the proposed and reference models, is documented in the /modeling folder.
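Before a cleaned multivariate series can be fed to sequence models such as LSTM or TCN, it is typically reshaped into fixed-length input windows with a target value per window. The sketch below shows one common way to do this, followed by a chronological train/test split. It is an illustration only: the helper names, the lookback length, the 80/20 split, and the synthetic data are assumptions, and the notebooks in /modeling may prepare the data differently.

```python
import numpy as np


def make_windows(data, target_col, lookback, horizon=1):
    """Turn a (T, F) series into inputs X of shape (N, lookback, F) and targets y of shape (N,).

    Sample i uses rows i .. i+lookback-1 as input and the target column
    at row i+lookback+horizon-1 as the value to predict.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested lookback and horizon")
    X = np.stack([data[i : i + lookback] for i in range(n_samples)])
    first_target = lookback + horizon - 1
    y = data[first_target : first_target + n_samples, target_col]
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    split = int(len(X) * train_fraction)
    return (X[:split], y[:split]), (X[split:], y[split:])


# Synthetic stand-in for a cleaned dataset: 200 time steps, 4 features (assumed sizes).
rng = np.random.default_rng(0)
series = rng.normal(size=(200, 4))

X, y = make_windows(series, target_col=0, lookback=14)
(X_train, y_train), (X_test, y_test) = chronological_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Splitting by time rather than at random avoids leaking future observations into the training set.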

Directory Tree

├── .gitignore
├── requirements.txt
├── README.md
├── license
├── data_preprocessing
│   ├── eda
│   └── data_cleaning
├── datasets
│   ├── raw_data.xlsx
│   ├── F_14.csv
│   └── cleaned_F_14.csv
└── modeling
    ├── proposed_model
    │   └── GA-TCN-LSTM_model.ipynb
    └── reference_models
        ├── GRU_model.ipynb
        ├── LSTM_model.ipynb
        ├── RNN_model.ipynb
        └── TCN_model.ipynb

Evaluation

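| Model       | RMSE (bbl) | wMAPE (%) | MAE (bbl) | R² score |
|-------------|------------|-----------|-----------|----------|
| GA-TCN-LSTM | 199.39     | 5.13      | 117.11    | 0.93     |
| TCN         | 213.22     | 5.36      | 122.72    | 0.92     |
| LSTM        | 216.00     | 5.84      | 133.52    | 0.91     |
| GRU         | 209.33     | 5.48      | 125.06    | 0.92     |
| RNN         | 214.71     | 5.66      | 129.36    | 0.92     |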
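The four metrics reported above can be computed as in the following minimal sketch. It uses common textbook definitions. In particular, wMAPE is taken here as the sum of absolute errors divided by the sum of absolute actual values, which is an assumption about the definition used in the notebooks. The function names and sample values are illustrative and are not taken from the project.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. bbl)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def wmape(y_true, y_pred):
    """Weighted mean absolute percentage error, in percent (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only, not project data.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print("RMSE :", rmse(actual, predicted))
print("MAE  :", mae(actual, predicted))
print("wMAPE:", wmape(actual, predicted))
print("R2   :", r2_score(actual, predicted))
```

Unlike plain MAPE, this form of wMAPE stays well defined when individual actual values are zero, as long as their sum is not.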

Figure: GA-TCN-LSTM actual and predicted values on the training and testing sets.

View Notebooks in Colab

| Notebook                     | Colab Link    |
|------------------------------|---------------|
| Exploratory Data Analysis    | Open In Colab |
| Data Cleaning                | Open In Colab |
| Proposed Model (GA-TCN-LSTM) | Open In Colab |
| Reference Model (TCN)        | Open In Colab |
| Reference Model (LSTM)       | Open In Colab |
| Reference Model (GRU)        | Open In Colab |
| Reference Model (RNN)        | Open In Colab |

Dataset

The raw_data.xlsx file used in this project was provided by Equinor (formerly known as Statoil) and is available on their website as part of their Volve Field Data sharing initiative.

To access the dataset, please visit the following link: https://www.equinor.com/energy/volve-data-sharing. Once on the page, select the "Go to the Volve dataset: data.equinor.com" option and follow the instructions to obtain the raw_data.xlsx file.

Please note that the raw data is subject to the terms and conditions outlined on the Equinor website.

Tech Stack

Made with Python

Credits

License

MIT

Contact

If you have any questions or encounter any issues running this project, please feel free to open an issue. I'll be happy to help!