QMineral_Modeller

Qmin is a tool built to help with the EPMA processing and analysis workflow.

Preprint: DOI:10.21203/rs.3.rs-629516/v1
Published: DOI:10.1016/j.cageo.2021.104949

Introduction

This is Qmin, the Mineral Chemistry Virtual Assistant. The models presented here perform mineral classification, missing-value imputation by multivariate regression, and mineral formula prediction through several nested Random Forest classification and regression models.

The models have been developed by researchers of the Geological Survey of Brazil (SGB/CPRM), with the assistance of the technical manager of the EPMA laboratory of the Institute of Geosciences/University of Brasília (IG/UnB).

Documentation

Additional information about how the models were built is available in the preprint (the original manuscript, not yet peer reviewed) and in the published version of our work in the journal Computers and Geosciences.

You can also watch the presentation (only in Portuguese) for the release of the Beta version of the application.

Important Notes

⚠️ This model is in active development, so parameter names, behaviors, and output file formats may change without notice.

⚠️ The model is stochastic. Run it several times with different seeds (random states) to see its average behavior.
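The idea of repeating runs over several seeds can be sketched as follows. This is only an illustration: `run_once` is a hypothetical stand-in for a single stochastic model run and is not part of Qmin; in practice it would train or apply a model with the given random state and return a score.

```python
import random
import statistics


def run_once(seed):
    """Hypothetical stand-in for one stochastic model run.

    Returns an accuracy-like score that depends only on the seed.
    """
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.02, 0.02)


def summarize_runs(run_fn, seeds):
    """Run `run_fn` once per seed and return (mean, standard deviation)."""
    scores = [run_fn(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


mean_score, spread = summarize_runs(run_once, seeds=range(10))
print(f"mean={mean_score:.3f} stdev={spread:.3f}")
```

Reporting the spread alongside the mean shows how much a single run can deviate from the average behavior.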

⚠️ The quality of the prediction depends directly on the quality of the input data. As a best practice, input only analyses whose element concentrations sum to between 98 and 102%.
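A pre-filter for the 98-102% total can be sketched as below. The function name, the dictionary layout, and the sample values are assumptions made for illustration; they do not reflect Qmin's actual input format.

```python
def total_in_range(analysis, low=98.0, high=102.0):
    """Return True if the analysis total (wt%) lies within [low, high]."""
    total = sum(analysis.values())
    return low <= total <= high


# Illustrative analyses (wt%); the values are made up for this example.
analyses = [
    {"SiO2": 40.8, "FeO": 9.5, "MgO": 49.2, "MnO": 0.2},  # total 99.7
    {"SiO2": 38.0, "FeO": 9.0, "MgO": 46.0, "MnO": 0.1},  # total 93.1
]

# Keep only the analyses whose total falls inside the recommended window.
accepted = [a for a in analyses if total_in_range(a)]
print(len(accepted), "of", len(analyses), "analyses accepted")
```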

⚠️ Qmin currently predicts among 19 groups and 102 different minerals. Minerals not listed below will not be classified as desired:

  • AMPHIBOLES (13 minerals): ACTINOLITE, ARFVEDSONITE, CUMMINGTONITE, EDENITE, HASTINGSITE, HORNBLENDE (SENSU LATO), KAERSUTITE, KATOPHORITE, MAGNESIOHASTINGSITE, PARGASITE, RICHTERITE, RIEBECKITE, TREMOLITE.

  • APATITE: APATITE (SENSU LATO)

  • CARBONATES (13 minerals): ANCYLITE, ANKERITE, BURBANKITE, CALCITE, CARBOCERNAITE, DOLOMITE, GREGORYITE, KUKHARENKOITE (SENSU LATO), KUTNAHORITE, MAGNESITE, NATROFAIRCHILDITE/NYEREREITE/ZEMKORITE, SHORTITE, SIDERITE

  • CHLORITE: CHLORITE (SENSU LATO) ⚠️ STILL UNSTABLE! ⚠️

  • CLAY-MINERALS (5 minerals): BEIDELLITE, CORRENSITE, ILLITE, MONTMORILLONITE, SAPONITE

  • EPIDOTE: EPIDOTE (SENSU LATO) ⚠️ STILL UNSTABLE! ⚠️

  • FELDSPARS (8 minerals): ALBITE, ANDESINE, ANORTHITE, ANORTHOCLASE, BYTOWNITE, K-FELDSPAR, LABRADORITE, OLIGOCLASE

  • FELDSPATHOIDS (8 minerals): ANALCIME, CANCRINITE, HAUYNE, LEUCITE, NEPHELINE, NOSEAN, TRIKALSILITE/KALSILITE/KALIOPHILITE/PANUNZITE, SODALITE

  • GARNETS (5 minerals): ALMANDINE, ANDRADITE, GROSSULAR, PYROPE, SCHORLOMITE

  • ILMENITE

  • MICAS (6 minerals): BIOTITE (SENSU LATO), CELADONITE, MUSCOVITE, PARAGONITE, YANGZHUMINGITE, ZINNWALDITE (SENSU LATO)

  • OLIVINES (3 minerals): FAYALITE, FORSTERITE, MONTICELLITE

  • PEROVSKITE

  • PYROXENES (9 minerals): AEGIRINE, AUGITE, DIOPSIDE, ENSTATITE/CLINOENSTATITE, FERROSILITE/CLINOFERROSILITE, HEDENBERGITE, OMPHACITE, PIGEONITE, TITAN-AUGITE

  • QUARTZ

  • SPINELS (5 minerals): CHROMITE, HERCYNITE, MAGNETITE, SPINEL, ULVOSPINEL

  • SULFIDES (18 minerals): ALABANDITE, ARSENOPYRITE, BORNITE, CHALCOCITE, CHALCOPYRITE, CHLORBARTONITE, CUBANITE/ISOCUBANITE, GALENA, HEAZLEWOODITE, MACKINAWITE, PENTLANDITE, POLYDYMITE, PYRITE, PYRRHOTITE, RASVUMITE, SPHALERITE, STROMEYERITE

  • TITANITE

  • ZIRCON

Mineral Formula Calculation

Mineral Formula Calculation by Deterministic Approach

The mineral formulas implemented here for feldspar, garnet, mica, olivine, pyroxene and spinel are calculated from EPMA data. Where possible, the total Fe3+ content is obtained by charge balance after the number of atoms per formula unit has been calculated. The formula printed in the output is the result of several calculations concatenated into a string column.
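The kind of calculation described above can be sketched as follows. This is a minimal illustration and not Qmin's own code: the function names are hypothetical, and the charge-balance step assumes the Droop (1987) expression Fe3+ = 2X(1 - T/S), where X is the number of oxygens, T the ideal number of cations, and S the cation sum obtained with all iron treated as Fe2+. The source does not state which charge-balance scheme Qmin uses.

```python
# Oxide data: molar mass (g/mol), cations per oxide formula, oxygens per oxide formula.
OXIDES = {
    "SiO2": (60.084, 1, 2),
    "TiO2": (79.866, 1, 2),
    "Al2O3": (101.961, 2, 3),
    "FeO": (71.844, 1, 1),
    "MnO": (70.937, 1, 1),
    "MgO": (40.304, 1, 1),
    "CaO": (56.077, 1, 1),
}


def apfu(wt_percent, n_oxygens):
    """Cations per formula unit, normalized to `n_oxygens` oxygens.

    Keys of the result are the oxide names of the corresponding cations.
    """
    moles = {ox: wt / OXIDES[ox][0] for ox, wt in wt_percent.items()}
    oxygen_total = sum(m * OXIDES[ox][2] for ox, m in moles.items())
    factor = n_oxygens / oxygen_total
    return {ox: m * OXIDES[ox][1] * factor for ox, m in moles.items()}


def ferric_iron_droop(cations, n_oxygens, ideal_cations):
    """Estimate Fe3+ by charge balance (assumed scheme: Droop, 1987).

    `cations` must be computed with all iron as FeO. The result is
    clamped to the range [0, total Fe].
    """
    cation_sum = sum(cations.values())
    fe3 = 2 * n_oxygens * (1 - ideal_cations / cation_sum)
    return min(max(fe3, 0.0), cations.get("FeO", 0.0))


# Ideal forsterite (Mg2SiO4) on a 4-oxygen basis: about 1 Si and 2 Mg.
forsterite = apfu({"SiO2": 42.706, "MgO": 57.294}, n_oxygens=4)

# Ideal magnetite (Fe3O4) reported with all iron as FeO, 4 oxygens, 3 cations.
magnetite = apfu({"FeO": 93.09}, n_oxygens=4)
fe3_magnetite = ferric_iron_droop(magnetite, n_oxygens=4, ideal_cations=3)
```

For ideal magnetite the estimate is 2 Fe3+ per formula unit, which matches the stoichiometry Fe2+Fe3+2O4. The estimated Fe3+ refers to the formula renormalized to the ideal cation number.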

Mineral Formula Calculation by Probabilistic Approach

For amphiboles, the formula will be calculated by a multivariate regression for each crystallographic site. This approach is still in development and will be made available in this repository later.

⚠️ THE MINERAL FORMULA CALCULATION FOR AMPHIBOLES IS UNSTABLE AND HAS BEEN TURNED OFF, because we observed underestimation of Fe3+ and Fe2+ in the final formula. ⚠️

Status

This model is in active development and subject to significant code changes to:

  • Increase the number of groups and minerals covered
  • Improve performance
  • Increase the size of samples used for training

Training Data

The directory data_raw contains all raw data considered for building the models. The main source of the training data is the GEOROC database, a repository maintained by the Max Planck Institute for Chemistry in Mainz.

Other data used in this work were provided by researchers of the Geological Survey of Brazil and were used to test and calibrate the models. They are available in the folder OtherSources.

Building

The project was developed in R and Python 3.

The data wrangling, the first missing value imputation, the conversion of elements to oxides, and the balancing of mineral instances were done in R. The code is available in the Code_R folder.
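As an illustration of the element-to-oxide conversion step, the sketch below derives the gravimetric factor from atomic masses. It is written in Python for consistency with the other examples in this document, whereas the repository performs this step in R; the function name and the lookup tables are assumptions, and iron is treated as FeO only.

```python
# Atomic masses in g/mol (rounded).
ATOMIC_MASS = {
    "O": 15.999, "Si": 28.086, "Al": 26.982,
    "Fe": 55.845, "Mg": 24.305, "Ca": 40.078,
}

# Element -> (oxide name, cations per oxide formula, oxygens per oxide formula).
OXIDE_OF = {
    "Si": ("SiO2", 1, 2),
    "Al": ("Al2O3", 2, 3),
    "Fe": ("FeO", 1, 1),  # assumption: all iron expressed as FeO
    "Mg": ("MgO", 1, 1),
    "Ca": ("CaO", 1, 1),
}


def element_to_oxide(element, wt_percent):
    """Convert an element concentration (wt%) to its oxide (wt%)."""
    oxide, n_cations, n_oxygens = OXIDE_OF[element]
    cation_mass = n_cations * ATOMIC_MASS[element]
    oxide_mass = cation_mass + n_oxygens * ATOMIC_MASS["O"]
    factor = oxide_mass / cation_mass  # gravimetric conversion factor
    return oxide, wt_percent * factor


oxide, value = element_to_oxide("Si", 10.0)
print(f"{oxide}: {value:.2f} wt%")
```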

The final models used in this work were developed in Python 3 and are available in the model_py folder. All Python code is available in the Code_Python folder.

Contributors

Copyright and License

The source code for Qmin is licensed under the BSD 3-Clause License; see LICENSE.