Rosneft_hackaton

This is our entry for the Rosneft hackathon, in which we (Mikhail Vitko and I) took part and placed in the top 10 out of 180 teams. The hackathon took place in October 2023.

Task: Using data from geological maps and seismic survey results, develop a predictive model for unexplored areas.

Problem statement

Image

In a certain region $R$ of the plane, several functions $Map_i(x, y)$, $i = 1, \dots, 5$, are defined by their values on a regular rectangular grid. It is known that all these functions are controlled by a set of unknown interrelated functions $F_j(x, y)$, of which only one, $F_1(x, y)$, has known values, given at several points (not necessarily coinciding with grid nodes). It is also known that the distributions of the functions $F_j(x, y)$ are characterized by zonality (they depend on the coordinates $x, y$).

It is required to find the values of the unknown function $F_1(x, y)$ at all grid nodes and assess the quality of the found approximation.

Image

Data

The data was presented in the following format, where (x, y) represents coordinates, and z is one of the seismic signals at a depth of 2 thousand kilometers.

The task was to determine the values of Z for the Point_dataset.txt table in unknown areas (x, y) using all 5 original maps.
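Because the known points do not have to coincide with grid nodes, each map has to be sampled at arbitrary (x, y) before it can be used as a feature. The following is a minimal sketch of one way to do that with bilinear interpolation in plain Python. The function names `bilinear_sample` and `point_features`, the in-memory grid layout, and the toy data are assumptions made for illustration and are not taken from main.py.

```python
from bisect import bisect_right


def bilinear_sample(xs, ys, grid, x, y):
    """Bilinearly interpolate a gridded map at (x, y).

    xs, ys: strictly increasing node coordinates of the regular grid.
    grid[i][j]: map value at node (xs[i], ys[j]) (assumed layout).
    """
    # Clamp the query to the grid extent so edge points stay well-defined.
    x = min(max(x, xs[0]), xs[-1])
    y = min(max(y, ys[0]), ys[-1])
    # Index of the lower-left node of the cell that contains (x, y).
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    # Fractional position inside the cell, each in [0, 1].
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    # Blend the four corner values.
    bottom = grid[i][j] * (1 - tx) + grid[i + 1][j] * tx
    top = grid[i][j + 1] * (1 - tx) + grid[i + 1][j + 1] * tx
    return bottom * (1 - ty) + top * ty


def point_features(xs, ys, maps, x, y):
    """Feature row for one point: its coordinates plus every map sampled there."""
    return [x, y] + [bilinear_sample(xs, ys, m, x, y) for m in maps]


# Toy example: a 3 x 2 grid holding the plane 2x + 3y, reused as five "maps".
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0]
plane = [[2 * gx + 3 * gy for gy in ys] for gx in xs]
print(point_features(xs, ys, [plane] * 5, 0.5, 0.25))
```

Keeping x and y in the feature row is deliberate: the problem statement says the target is zonal, so position itself may carry signal.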

Image

Solution

After analyzing the source data, we hypothesized that each of the Map files contains the results of a geophysical survey. Taken together, these geophysical parameters characterize a specific rock type, which may be of particular interest for delineating oil-bearing contours and predicting flow rates. This assumption is supported by the fact that the distributions of the functions F are characterized by zonality.

After that, we split the work: Mikhail tried to solve the problem using clustering and classification, while I approached it through regression. We later decided to focus on regression.
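A minimal sketch of what the regression route could look like, assuming scikit-learn's `RandomForestRegressor` and a feature table with one row per known point (x, y and the five map values). The data below is synthetic, the target formula and hyperparameters are placeholders, and none of this is copied from main.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the real table: columns are x, y, map_1..map_5.
n_points = 200
X = rng.uniform(0.0, 1.0, size=(n_points, 7))
# Hypothetical target: a zonal step in x plus a dependence on two of the maps.
y = np.where(X[:, 0] > 0.5, 2.0, 0.0) + X[:, 2] + 0.5 * X[:, 5]

model = RandomForestRegressor(n_estimators=200, random_state=0)

# 5-fold cross-validated R^2 on the points where the target is known.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("mean R^2:", scores.mean())

# Fit on all known points, then predict the target at the grid nodes.
model.fit(X, y)
X_nodes = rng.uniform(0.0, 1.0, size=(10, 7))  # placeholder for grid-node features
f1_nodes = model.predict(X_nodes)
print(f1_nodes.shape)
```

Cross-validation is used here because the target is known only at a few points, so a single train/test split would give a noisy quality estimate.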

The approximate solution path is shown below:

Image

Image

Result

As a result, the random forest (RF) regression model showed 89% accuracy on the training data (as shown in the following image) and 91% accuracy on the final test set. The clustering-based model showed around 80% accuracy without hyperparameter tuning.

The final file main.py includes the best approach (RF) for this task.

Image
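The scores above are reported as percentages, but this README does not say which metric was used. One common choice for a regressor is the coefficient of determination R² (it is what the `score` method of scikit-learn regressors returns), so the sketch below shows how such a number is computed. Treat the choice of metric as an assumption; the helper name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values: three exact predictions and one that is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(f"R^2 = {r_squared(y_true, y_pred):.0%}")
```

An R² of 1 means perfect predictions and 0 means no better than predicting the mean everywhere.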