The objective of this study is to use Neural Network`s with PyTorch to classify over 12 classes a rock type.
Based on column FORCE_2020_LITHOFACIES_LITHOLOGY
• 30000: Sandstone
• 65030: Sandstone/Shale
• 65000: Shale
• 80000: Marl
• 74000: Dolomite
• 70000: Limestone
• 70032: Chalk
• 88000: Halite
• 86000: Anhydrite
• 99000: Tuff
• 90000: Coal
• 93000: Basement
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The study is dividided in:
- Analyse data
- Baseline With Decision Tree Classifier
- Different Neural Network Architectures test
- Generate predictions of a hidden file
The analysis and Decision Tree can be found on Base_Analyze.ipynb
. This file load data and analyse based on Decision Tree algorithm with unbalanced and balanced Data.
In Unbalanced data, the result was 92.35%
and with balanced data with SMOTE (Sinthetic Minotiry Oversampling Technique) was 98.88%
.
The table below describe the tests made with Neural Network. The objective was to start with just few data and add more or change model according to the results.
Number | Model Name | Max Num Samples Per Class | SMOTE? | Num Epochs | Train Acc | Train Loss | Val Acc | Val Loss | Test Precision | Test Recall | Test F1-Score |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | single_layer | 10000 | False | 100 | 77.390% | 0.65312 | 74.709% | 0.68731 | 0.79 | 0.75 | 0.76 |
2 | multi_layer | 10000 | False | 100 | 72.729% | 0.84311 | 72.398% | 0.75672 | 0.76 | 0.72 | 0.73 |
3 | multi_layer | 50000 | False | 1000 | 59.464% | 1.14423 | 64.466% | 1.02543 | 0.67 | 0.64 | 0.63 |
4 | multi_layer_relu | 10000 | False | 100 | 88.023% | 0.39305 | 87.399% | 0.40415 | 0.88 | 0.87 | 0.87 |
5 | multi_layer_relu_batch_norm | 10000 | False | 100 | 92.029% | 0.21409 | 91.749% | 0.19964 | 0.92 | 0.92 | 0.92 |
6 | multi_layer_relu_batch_norm | 10000 | True | 100 | 93.801% | 0.17439 | 94.958% | 0.13612 | 0.95 | 0.95 | 0.95 |
7 | multi_layer_3_relu_batch_norm_dropout | 20000 | True | 300 | 69.476% | 0.75197 | 72.979% | 0.60935 | 0.74 | 0.73 | 0.72 |
-
Number: Test Number
-
Model Name: Name of the model
-
Max Num Samples Per Class: The maximum number of classes per class for all, train/valid/test
-
SMOTE? Boolean that indicates if the test used augmented data or not
-
Num Epochs: Number of epochs
-
Train Acc: Last epoch train accuracy
-
Train Loss: Last epoch train loss
-
Val Acc: Last epoch valid accuracy
-
Val Loss: Last epoch valid loss
-
Test Precision: Weighted test precision
-
Test Recall: Weighted test recall
-
Test F1-Score: Weighted test F1-Score
Important: Some of models were exectued more than once and some values may vary a little.
Generate predictions of a hidden file
As the result above, we save 2 models in data/model
folder:
- Test 5 with 92% of test precision (unbalanced data)
- Test 6 with 95% of test precision (balanced data)
The inferences were executed in Inference_Model.ipynb
file. The result generates are in data/output
folder.
In this homework I could learn more about PyTorch, unbalanced and balanced data and multiple kind of test. Unfortunatelly, for a higher results, more data should be added in some classes to balance and with more analysis on each variable, some of them could be removed (or outliers and others).