/3W

Promotes development of ML algorithms for early detection and classification of undesirable events in offshore oil wells.

Introduction

This is the first repository published by Petrobras on GitHub. It supports the 3W project, which aims to promote experimentation and development of Machine Learning-based approaches and algorithms for specific problems related to detection and classification of undesirable events that occur in offshore oil wells.

The 3W project is based on the 3W dataset, a database described in this paper, and on the 3W toolkit, a software package that promotes experimentation with the 3W dataset for specific problems. The name 3W was chosen because the dataset is composed of instances from 3 different sources, all of which contain undesirable events that occur in oil Wells.

Motivation

Timely detection of undesirable events in oil wells can help prevent production losses, reduce maintenance costs, and avoid environmental accidents and human casualties. Losses related to this type of event can reach 5% of production in certain scenarios, especially in areas such as Flow Assurance and Artificial Lifting Methods. In terms of maintenance, the cost of a maritime probe, which is required to perform various types of operations, can exceed US$500,000 per day.

Creating a dataset and making it public so that anyone can experiment with it can greatly foster the development of tools that can:

  • Improve the process of identifying undesirable events in offshore well production;
  • Increase the efficiency of monitoring the integrity of wells and subsea systems, whose related problems can cause incalculable losses for people, the environment, and the company's image.

Strategy

The 3W project is the pilot of a Petrobras program called Conexões para Inovação - Módulo Open Lab. This pilot is an open project composed of two major resources:

  • The 3W dataset, which will evolve and be supplemented with more instances from time to time;
  • The 3W toolkit, which will also evolve (in many ways) to cover an increasing number of undesirable events during its development.

Therefore, our strategy is to make these resources publicly available so that we can develop the 3W project collaboratively with a global community.

Ambition

With this project, Petrobras intends to develop (fix, improve, supplement, etc.):

  • The 3W dataset itself;
  • The 3W toolkit itself;
  • Approaches and algorithms that can be incorporated into systems dedicated to monitoring undesirable events in offshore oil wells during their respective production phases;
  • Tools that can be useful for our ambition.

Contributions

We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.

Before you can contribute to this project, you need to read and agree to the following documents:

It is also very important to know about, participate in, and follow the discussions. See the discussions section.

Licenses

All the code of this project is licensed under the Apache 2.0 License and all 3W dataset data files (CSV files in the subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.

Versioning

In the 3W project, three types of versions will be managed as follows.

  • Version of the 3W toolkit: specified in the __init__.py file;
  • Version of the 3W dataset: specified in the dataset.ini file;
  • Version of the 3W project: specified with tags in the git repository;
  • We will exclusively use the semantic versioning defined in https://semver.org;
  • Versions will always be updated manually;
  • The versions of the 3W toolkit and the 3W dataset are completely independent of each other;
  • The version of the 3W project will be updated whenever, and only when, there is a new commit in the main branch of the repository, regardless of the updated resource: 3W toolkit, 3W dataset, project documentation, example of use, etc.;
  • We will only use annotated tags and for each tag there will be a release in the remote repository (GitHub);
  • Content for each release will be automatically generated with functionality provided by GitHub.
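
As an illustration of the semantic versioning rules above, here is a minimal sketch of how a MAJOR.MINOR.PATCH string could be parsed, compared, and bumped before a manual update. It is not part of the 3W toolkit: the names `Version`, `parse_version`, and `bump` are hypothetical, and the pattern deliberately ignores the pre-release and build-metadata suffixes that https://semver.org also allows.

```python
import re
from typing import NamedTuple

# Simplified pattern: MAJOR.MINOR.PATCH only (no pre-release or build metadata).
_SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


class Version(NamedTuple):
    """Hypothetical helper type; tuple ordering gives version precedence."""

    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' or raise ValueError."""
    match = _SEMVER.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, part: str) -> Version:
    """Return the next version; lower-order fields reset to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown part: {part!r}")
```

Because the fields are compared as integers, `parse_version("1.10.0")` sorts after `parse_version("1.9.3")`, which a plain string comparison would get wrong.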

Questions

See the discussions section. If you don't find the clarification you need there, please open a new discussion with your question so we can answer it.

3W dataset

To the best of its authors' knowledge, this is the first realistic and public dataset with rare undesirable real events in oil wells that can be readily used as a benchmark dataset for development of machine learning techniques related to inherent difficulties of actual data. For more information about the theory behind this dataset, refer to the paper "A realistic and public dataset with rare undesirable real events in oil wells", published in the Journal of Petroleum Science and Engineering (link here).

Structure

The 3W dataset consists of all CSV files in the subdirectories of the dataset directory and is structured as detailed here.
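
The layout described above can be explored with a few lines of standard-library Python. This is a minimal sketch rather than the toolkit's own loader: the function name `list_instances` is hypothetical, and it assumes the CSV files sit exactly one level below the dataset directory.

```python
from collections import defaultdict
from pathlib import Path


def list_instances(dataset_dir):
    """Map each subdirectory name to the sorted CSV files it contains."""
    root = Path(dataset_dir)
    instances = defaultdict(list)
    # Assumption: instances are CSV files directly inside a subdirectory,
    # so files at the top level of dataset_dir are ignored.
    for csv_path in sorted(root.glob("*/*.csv")):
        instances[csv_path.parent.name].append(csv_path)
    return dict(instances)
```

Calling `list_instances("dataset")` from the repository root would then return one entry per subdirectory, which is a quick way to count how many files each one holds.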

Overview

A general presentation of the 3W dataset, with some quantities and statistics, is available in this Jupyter Notebook.

3W toolkit

The 3W toolkit is a software package written in Python 3 that contains resources that make the following easier:

  • 3W dataset overview generation;
  • Experimentation and comparative analysis of Machine Learning-based approaches and algorithms for specific problems related to undesirable events that occur in offshore oil wells during their respective production phases;
  • Standardization of key points of the Machine Learning-based algorithm development pipeline.

It is important to note that there are arbitrary choices in this toolkit, but they have been carefully made to allow adequate comparative analysis without compromising the ability to experiment with different approaches and algorithms.

Structure

The 3W toolkit is implemented in sub-modules as described here.

Incorporated Problems

Specific problems will be incorporated into this project gradually. At this point, we can work on:

All specifications are detailed in the CONTRIBUTING GUIDE.

Examples of Use

The list below, with examples of how to use the 3W toolkit, will grow throughout its development.

  • 3W dataset's overviews:
  • Binary classifier of Spurious Closure of DHSV:

To have your contribution listed here, follow the instructions detailed in the CONTRIBUTING GUIDE.

Reproducibility

For all results generated by the 3W toolkit to be consistent, we recommend that you create and use a virtual environment with the package versions specified in environment.yml, which was generated with conda. First, install Anaconda. Then open an Anaconda Prompt, make sure the current directory is the one that contains your copy of the 3W repository, and run the following commands as needed:

  • To create the virtual environment:
$ conda env create -f environment.yml
  • To activate the created virtual environment:
$ conda activate 3W
  • To use the 3W toolkit resources interactively:
$ python
  • To initialize a local Jupyter Notebook server:
$ jupyter notebook
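
Before running experiments, it can help to confirm that the interpreter really is running inside the environment activated above. The following is a minimal sketch, not part of the 3W toolkit. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated; the helper names `active_conda_env` and `check_environment` are hypothetical.

```python
import os
import sys

EXPECTED_ENV = "3W"  # environment name used by `conda activate 3W`


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


def check_environment(environ=os.environ):
    """Return True when the expected conda environment is active."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    if check_environment():
        print(f"OK: running in conda environment {EXPECTED_ENV!r}")
    else:
        print(
            f"Warning: active environment is {active_conda_env()!r}, "
            f"expected {EXPECTED_ENV!r}"
        )
    print("Python", sys.version.split()[0])
```

Passing the environment mapping as a parameter keeps the check easy to exercise without changing the real process environment.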