GeoHackathon2022

Problem Statement

The goal is to predict and pick formation tops from well log data. The logs provided vary in density from well to well, reflecting the real-world problem that not all logs are available for analysis.

Methodology

Checking Data Coverage

Because not all logs are present in every well, we first check which logs are available and can be used for prediction.
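A coverage check of this kind could be sketched as below. This is a minimal illustration, not the repository's own code: it assumes the data is a single pandas DataFrame with one row per depth sample, a hypothetical WELL column identifying the well, and one column per log. The helper name log_coverage and the toy values are made up for the example.

```python
import numpy as np
import pandas as pd

def log_coverage(df: pd.DataFrame, well_col: str = "WELL") -> pd.Series:
    """Fraction of wells in which each log has at least one non-null sample."""
    logs = df.drop(columns=[well_col])
    # True per (well, log) if the log has any measured value in that well.
    present = logs.notna().groupby(df[well_col]).any()
    # Mean of booleans over wells gives the share of wells covered.
    return present.mean().sort_values(ascending=False)

# Toy data (hypothetical): three wells, three logs with different coverage.
df = pd.DataFrame({
    "WELL": ["A", "A", "B", "B", "C", "C"],
    "GR":   [50.0, 60.0, 55.0, np.nan, 70.0, 65.0],
    "RHOB": [2.4, 2.5, np.nan, np.nan, np.nan, np.nan],
    "DT":   [np.nan, np.nan, 90.0, 95.0, 88.0, np.nan],
})
coverage = log_coverage(df)
print(coverage)
```

Here a log counts as available in a well if it has at least one measured sample there; a stricter definition (for example a minimum share of non-null samples per well) may suit the real data better.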

Log Viewer

The logs are viewed using Cegal Tools. To install it from a notebook cell, run: !pip install cegal-welltools

Dealing with Missing Data

For simplicity, all empty cells are filled with -999.25. We also use only logs that cover at least 50% of the wells.
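The filtering and filling step might look like the sketch below. It rests on assumptions: the same hypothetical layout as before (a WELL column plus one column per log), "covers 50% of wells" is read as "has data in at least half of the wells", and select_and_fill is an illustrative name rather than a function from the repository.

```python
import numpy as np
import pandas as pd

NULL_VALUE = -999.25   # fill value stated in the text
MIN_COVERAGE = 0.5     # assumed reading: log present in at least 50% of wells

def select_and_fill(df, well_col="WELL",
                    min_coverage=MIN_COVERAGE, null_value=NULL_VALUE):
    """Keep logs present in enough wells, then fill remaining gaps."""
    logs = [c for c in df.columns if c != well_col]
    # Share of wells in which each log has at least one non-null sample.
    coverage = df[logs].notna().groupby(df[well_col]).any().mean()
    keep = coverage[coverage >= min_coverage].index.tolist()
    return df[[well_col] + keep].fillna(null_value)

# Toy data (hypothetical): RHOB is present in only one of three wells.
df = pd.DataFrame({
    "WELL": ["A", "A", "B", "B", "C", "C"],
    "GR":   [50.0, 60.0, 55.0, np.nan, 70.0, 65.0],
    "RHOB": [2.4, 2.5, np.nan, np.nan, np.nan, np.nan],
    "DT":   [np.nan, np.nan, 90.0, 95.0, 88.0, np.nan],
})
filled = select_and_fill(df)
print(filled)
```

A constant sentinel such as -999.25 is workable for tree-based models, which can split it away from real values, but it would distort models that rely on distances or linear combinations of the features.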

Other ways to handle missing data include deriving pseudo logs, feature engineering, and statistical analysis.

Results

For simplicity, we use a Random Forest classifier with the following hyperparameters:

RandomForestClassifier(n_estimators=300, random_state=1, max_features=12, max_depth=7, min_samples_leaf=15)
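For context, the sketch below shows how a classifier with these hyperparameters could be trained and scored with scikit-learn. The data is synthetic (generated with make_classification) and stands in for the real log features and formation labels, so the printed accuracy says nothing about the hackathon result. At least 12 features are generated because max_features=12 requires that many columns.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for log features (columns) and formation labels.
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=8, n_classes=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Hyperparameters taken from the text above.
model = RandomForestClassifier(
    n_estimators=300,
    random_state=1,
    max_features=12,
    max_depth=7,
    min_samples_leaf=15,
)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"hold-out accuracy on synthetic data: {accuracy:.3f}")
```

With real wells, a random row-level split can leak information between depth samples of the same well; splitting by well (for example with scikit-learn's GroupShuffleSplit) is one way to get a more honest estimate.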

This yielded

Future Work

As this is only a first pass at building a proper model, there is a lot more work that can be done to improve model accuracy.

As mentioned earlier, missing data could be handled with methods such as pseudo logs, feature engineering, or statistical analysis.

We can also check the distribution of samples across formations and upsample or downsample where needed, so that no formation is over-represented in the training data.
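One possible way to inspect and even out the label distribution is sketched below. It assumes a pandas DataFrame with a hypothetical FORMATION label column; the helper names and toy data are illustrative only, and downsampling to the smallest class is just one of several balancing options.

```python
import pandas as pd

def formation_shares(labels: pd.Series) -> pd.Series:
    """Share of samples belonging to each formation."""
    return labels.value_counts(normalize=True)

def downsample_to_smallest(df, label_col="FORMATION", random_state=0):
    """Randomly keep as many rows per formation as the rarest formation has."""
    n_min = df[label_col].value_counts().min()
    return df.groupby(label_col).sample(n=n_min, random_state=random_state)

# Toy data (hypothetical): an imbalanced set of formation labels.
df = pd.DataFrame({
    "FORMATION": ["Fm1"] * 6 + ["Fm2"] * 4 + ["Fm3"] * 2,
    "GR": [float(v) for v in range(12)],
})
shares = formation_shares(df["FORMATION"])
balanced = downsample_to_smallest(df)
print(shares)
print(balanced["FORMATION"].value_counts())
```

Downsampling discards data, so for small datasets an alternative worth trying is to keep all samples and pass class_weight="balanced" to RandomForestClassifier.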

Investigating other models and searching for the best hyperparameters with a grid search or randomized search (for example scikit-learn's GridSearchCV or RandomizedSearchCV) could also improve model accuracy.
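A hyperparameter search could be set up roughly as follows. This is a sketch on synthetic data: the grid values, the reduced n_estimators (chosen only to keep the example fast), and the 3-fold cross-validation are assumptions, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for log features and formation labels.
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=8, n_classes=3, random_state=1
)

# Small illustrative grid around the values used in the first pass.
param_grid = {
    "max_depth": [5, 7],
    "min_samples_leaf": [5, 15],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=1),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

RandomizedSearchCV takes the same estimator plus a param_distributions argument and an n_iter budget, which tends to scale better than an exhaustive grid when many hyperparameters are tuned at once.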

Finally, lithology-based images could be used as an additional input, so that formation prediction ties together both logs and lithology.

haikalbaik/GeoHackathon2022