/Practice_Machine-Learning-Competition-2020

SPWLA PDDA’s 1st Petrophysical Data-Driven Analytics Contest -- Sonic Log Synthesis

Primary LanguageJupyter Notebook

PDDA-Machine-Learning-Competition-2020

SPWLA PDDA’s 1st Petrophysical Data-Driven Analytics Contest -- Sonic Log Synthesis

Sponsored by:

SPWLA-PDDA

SparkCognition

Summary Paper:

Competition Summary - Pseudo Sonic Log Generation

Winner:

Winner Team Contact
1st Place UTFE Wen Pan(wenpan@utexas.edu)
Tianqi Deng(tianqizx@utexas.edu)
Honggeun Jo(honggeun.jo@utexas.edu)
Javier Santos(jesantos@utexas.edu)
2nd Place iwave Lei Fu(lei.fu.rice@gmail.com)
3rd Place RockAbusers Arkhat Kalbekov( akalbekov@mines.edu)
Valeria Suarez(vasuarezbolivar@mymail.mines.edu)
4th Place StuckAtHome
5th Place SedStrat Epo Prasetya Kusumah( epo.pk@universitaspertamina.ac.id)
Mohammad Aviandito(aviandito@gmail.com)
Yogi Pamadya(yogipamadya@gmail.com)

Leaderboard

Root mean squared error(RMSE) is calculated from the DTC and DTS values of the hidden dataset.

equation

Rank Team Name Best Score Best Solution Notebook
1 UTFE 12.35942 Neural Network Notebook
2 iwave 12.55189 LSTM Notebook
3 RockAbusers 13.2136 Randomforest Notebook
4 StuckAtHome 13.43166 TreeEnsemble Notebook
5 SedStrat 13.84585 Ensemble Notebook
6 RocketTeam 14.83064 LSTM Notebook
7 iPetro 15.38718 Neural Network Notebook
8 Oilers 15.75537 XGBoost Notebook
9 DataDrivenPancakes 16.31731 Ensemble Notebook
10 TeamTriumphant 16.41215 LGBM Notebook
11 TheMeanSquares 16.60382 LGBM Notebook
12 Explorum 16.70458 Randomforest Notebook
13 MSArchie 16.9674 Ensemble Notebook
14 MLogging 16.98075 Ensemble Notebook
15 TrashPandas 17.27522 Tree Ensemble Notebook
16 TeamTF 17.47539 Tree Ensemble Notebook
17 PDDA 17.92553 Randomforest Starter_Yu.ipyb
18 UNDFightingHawks 20.23271 Randomforest Notebook
19 DoaIbu 20.34702 Ensemble Notebook
20 TensorITB 23.92497 MultiOutputRegressor Notebook
Synergy 14.28895
LACrew 15.61239
DATUM 15.93848
Curiosity 15.96676
Diagenesis 16.58438
SubsurfaceIntelligence 16.92818
Colonels 17.22655
HoustonEnergyTeam 17.30373
TeamCGG 17.38406
IIT Roorkee 19.12469
GUCoders 22.91161

Background

Well logs are interpreted/processed to estimate the in-situ petrophysical and geomechanical properties, which is essential for subsurface characterization. Various types of logs exist, and each provides distinct information about subsurface properties. Certain well logs, like gamma ray (GR), resistivity, density, and neutron logs, are considered as “easy-to-acquire” conventional well logs that are run in most of the wells. Other well logs, like nuclear magnetic resonance, dielectric dispersion, elemental spectroscopy, and sometimes sonic logs, are only run in limited number of wells.

Sonic travel-time logs contain critical geomechanical information for subsurface characterization around the wellbore. Often, sonic logs are required to complete the well-seismic tie workflow or geomechanical properties prediction. When sonic logs are absent in a well or an interval, a common practice is to synthesize them based on its neighboring wells that have sonic logs. This is referred to as sonic log synthesis or pseudo sonic log generation.

Problem Statement

Compressional travel-time (DTC) and shear travel-time (DTS) logs are not acquired in all the wells drilled in a field due to financial or operational constraints. Under such circumstances, machine learning techniques can be used to predict DTC and DTS logs to improve subsurface characterization. The goal of the “SPWLA’s 1st Petrophysical Data-Driven Analytics Contest” is to develop data-driven models by processing “easy-to-acquire” conventional logs from Well #1, and use the data-driven models to generate synthetic compressional and shear travel-time logs (DTC and DTS, respectively) in Well #2. A robust data-driven model for the desired sonic-log synthesis will result in low prediction errors, which can be quantified in terms of Root Mean Squared Error(RMSE) by comparing the synthesized and the original DTC and DTS logs.

You are provided with two datasets: Well #1 dataset and Well #2 dataset. You need to build a generalizable data-driven models using Well #1 dataset. Following that, you will deploy the newly developed data-driven models on Well #2 dataset to synthesize DTS and DTC logs. The data-driven model should use feature sets derived from the following seven logs: Caliper, Neutron, Gamma Ray, Deep Resistivity, Medium Resistivity, Photo-electric factor and density. The data-driven model should synthesize two target logs: DTC and DTS logs.

About us

Petrophysical Data-Driven Analytics (PDDA), a special interest group under society of Petrophysicists and Well Log Analysts (SPWLA), is announcing its first machine learning contest in 2020! The contest is open to all SPWLA members (including student members) or whoever are interested in machine learning applications in petrophysics.

Competition Timeline

Start Date: March 1, 2020

Team Registration Deadline: March 31, 2020 11:59 PM CST

Entry Deadline: April 30, 2020 11:59 PM CST

End Date (Final Submission of Code Deadline): May 7, 2020 11:59 PM CST

Registration

Please send your team name, team member, contact info, and affiliation to pdda_sig@spwla.org. The official competition website is https://github.com/pddasig/Machine-Learning-Competition-2020.

One account per participant

You cannot register from multiple accounts and therefore you cannot submit from multiple accounts.

Team Limits

The maximum team size is 5.

Submission

Your submission needs to follow the same format as the ‘sample_submission.csv’ file provided on the competition website, the final ranking is based on the RMSE score of the hidden dataset.

A blind test dataset from 20% of the hidden dataset is released for the your judgement, you may check your model performance based on this dataset as many times as you want. This dataset will be released after the registration deadline.

Please note that the purpose of the released dataset is providing a validation tool to check the performance of your model. However, in the real application there would be no such data, since we will not have any access to the new well's data. Therefore please do not use the data to train your model.

You may select up to 3 submissions for judging before the entry deadline, the highest score will be used for your rank. You must submit your runnable code in a Notebook/JupyterNotebook format before the end date, any code submission with sever bugs or results in a different number from the data entry will not be ranked or awarded.

** Please make sure to use "random_state" or "SEED" for all the steps that involves randomization in your model, this will ensure the same result run by the judges.

Privacy Rules

Privately sharing code or data outside of teams is not permitted. It's okay to share code if made available to all participants on the competition Github repository.

You should NOT use any dataset during the training other than the one provided by the committee.

Any violation of the above will be regarded as cheating and not ranked or awarded.

Competition Specific Rules

COMPETITION TITLE: Pseudo Sonic Log Generation

COMPETITION ORGANIZOR: SPWLA – PDDA SIG

COMPETITION WEBSITE: https://github.com/pddasig/Machine-Learning-Competition-2020

You can submit "Issues" ticket to the repository if you find any problem of the compeition or would like to raise a discussion topic.

Prize Policy:

Total award: $1500

Rank Prize
1st Place $500
2nd Place $400
3rd Place $300
4th Place $200
5th Place $100

Top 5 winning teams will be awarded with prizes(NOT in cash).

Novel and practical algorithms will be recommended for a submission to the next SPWLA special issue by PDDA.

Data Licensing

The data comes from VOLVE dataset owned by Equinor.

DATA ACCESS AND USE: Creative Commons Attribution-NonCommercial-ShareAlike license.

ENTRY IN THIS COMPETITION CONSTITUTES YOUR ACCEPTANCE OF THESE OFFICIAL COMPETITION RULES.

The Competition named above is a skills-based competition to promote and further the field of data science. You must submit your registration to pdda_sig@spwla.org to enter. Your competition submissions ("Submissions") must conform to the requirements stated on the Competition Website. Your Submissions will be scored based on the evaluation metric described on the Competition Website. Subject to compliance with the Competition Rules, Prizes, if any, will be awarded to participants with the best scores, based on the merits of the data science models submitted. Check the competition website for the complete Competition Rules.

SPWLA PDDA SIG Contest Committee:

Yanxiang Yu, Chicheng Xu, Siddharth Misra, Weichang Li, Michael Ashby, Brendon Hall, Yan Xu, Oghenekaro Osogba