WIP - CRISM-MLDAL - Machine Learning Data Analysis Tool

Scripts to:

read CRISM MTRDR images,
extract spectral indexes (bands),
extract features and classes,
train models on the assembled features and mineral classes,
compare the predictions of the trained models.

Brief description.

What is an MTRDR? A map-projected Targeted Reduced Data Record (MTRDR) consists of the TER corrected I/F spectral information after map projection and the removal of spectral channels with suspect radiometry (“bad bands”). See https://bit.ly/2IuEkIT for detailed information.

MTRDRs are available for download from the Mars Orbital Data Explorer website: https://bit.ly/339xJNF

Each downloaded image consists of several files; the most important are the *.img (data file) and the *.hdr (ancillary metadata file).

This repository is composed of:

Conditioned ML: contains scripts to train models and run predictions on the same dataset (only for testing purposes and to get a bit familiar with ML)
Tools:
- MTRDR containing scripts to:
  - convert a CRISM hypercube to a numpy 3D array and save it in an hdf file
  - filter and combine MTRDR and Summary Products hdf files
- Summary Products containing scripts to:
  - extract unique bands (indexes) from Summary Products datasets into multiple file types
  - create original and boolean files of user-selected bands (indexes) in hdf file format

Getting Started

Prerequisites

To run the scripts, the following modules must be installed:

Rasterio
spectral
numpy
cv2
pandas

A requirements.txt file is also provided.

If you are using conda:

conda install -c conda-forge rasterio
conda install -c conda-forge spectral
conda install -c conda-forge opencv
conda install -c conda-forge

Installing

To install it, just clone the repository and double check the prerequisites.

Conditioned ML Workflow detailed description

Data organization

Datasets downloaded from the Mars Orbital Data Explorer should be extracted into separate folders, each named after its dataset and containing all of that dataset's files.

The various scripts create the required sub-folders automatically.

Dataset folder contains:
- Extracted - contains all the files generated by Multiple_Band_extractor and:
  - Processed - files generated by Features-Target_creator and:
  - Bandname_models - one folder for each processed band, containing the DecisionTree and RandomForest models plus some evaluation data

Multiple_Band_extractor

Given a dataset folder, it reads all the img files together with their associated hdr files, which are needed to read the BAND_NAME of each img correctly.
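
To illustrate why the header file matters, here is a minimal sketch that pulls band names out of header text. It assumes the *.hdr file follows the ENVI text convention with a `band names = { ... }` field; the function name `read_band_names` and the sample header are hypothetical and are not taken from the repository's scripts.

```python
import re


def read_band_names(header_text):
    """Return the band names found in ENVI-style header text (assumed format)."""
    # The field may span several lines, so let "." match newlines too.
    match = re.search(r"band names\s*=\s*\{(.*?)\}", header_text,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    # Names are comma separated; strip surrounding whitespace and line breaks.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Hypothetical header fragment, for illustration only.
sample = """ENVI
samples = 4
lines = 3
bands = 2
band names = {
 OLINDEX3,
 BD1900_2}
"""
print(read_band_names(sample))
```

A header without the field simply yields an empty list, which a caller could use to skip the image.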

To avoid duplicates, a band that has already been extracted is skipped. A threshold-based filter is applied to each band to remove data below a certain value. This value is computed for each image and corresponds to the n-th percentile (20th by default, but the user can enter a different value). The percentile is calculated excluding any NaN values. The results are then saved into a sub-folder called "Extracted" as:

dump file for original
numpy array (*.npy) for thresholded
image file (*.png) for both original and thresholded
In addition, a csv file containing all the unique band names in a single column is generated.
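
The percentile filter described above can be sketched with numpy as follows. This is only an illustration: the function name `threshold_band`, the choice to mark removed pixels as NaN, and the choice to keep values equal to the cut-off are assumptions, not the script's confirmed behaviour.

```python
import numpy as np


def threshold_band(band, percentile=20.0):
    """Mask values below the given percentile, ignoring NaN pixels."""
    band = np.asarray(band, dtype=float)
    # nanpercentile skips NaN pixels when computing the cut-off value.
    cutoff = np.nanpercentile(band, percentile)
    # Assumption: removed pixels become NaN, values equal to the cut-off stay.
    thresholded = np.where(band >= cutoff, band, np.nan)
    return thresholded, cutoff


# Hypothetical 2x3 band with one NaN pixel, for illustration only.
band = np.array([[0.1, 0.2, np.nan],
                 [0.3, 0.4, 0.5]])
thresholded, cutoff = threshold_band(band, percentile=20)
print(cutoff)
print(thresholded)
```

The thresholded array could then be written to disk with `numpy.save`, matching the *.npy output listed above.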

Features-Target_creator

Users must select:

the "Extracted" folder
a file that contains the unique band names that will be converted into features and target.
- This can be the csv file created by the Multiple_Band_extractor script
- This can be a user-provided csv containing only the desired bands. Be aware that the band names must match the extracted band names exactly, e.g. the OLINDEX3 band must be written OLINDEX3 in the csv, not Olindex3 or similar. (Could be improved in further developments)
Two files are generated and saved into a sub-folder "Processed":
- dataset-name_features: a csv file containing all the selected bands and their respective values in columns
- dataset-name_classes: a csv file containing all the selected bands and their respective boolean values in columns
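
A rough sketch of how such a pair of tables could be built with pandas is shown below. It assumes each band is a 2D array in which removed pixels are NaN, and that a "boolean value" means "this pixel still holds a value"; both points, as well as the function name `build_tables`, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_tables(bands):
    """Build a features table and a boolean classes table from 2D bands.

    `bands` maps a band name to a 2D array in which removed pixels are NaN.
    """
    # One column per band, one row per pixel (row-major flattening).
    features = pd.DataFrame({name: np.asarray(img, dtype=float).ravel()
                             for name, img in bands.items()})
    # Assumed meaning of "boolean values": True where a pixel holds a value.
    classes = features.notna()
    return features, classes


# Hypothetical 2x2 bands, for illustration only.
bands = {
    "OLINDEX3": np.array([[0.2, np.nan], [0.4, 0.5]]),
    "BD1900_2": np.array([[np.nan, np.nan], [0.1, 0.3]]),
}
features, classes = build_tables(bands)
print(features)
print(classes)
```

Both tables could then be written out with `DataFrame.to_csv(..., index=False)`.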

ML_Multiple-single_class_trainer

Users must select:

the "Processed" folder
the same csv file that contains the unique band names
the dataset-name_features
the dataset-name_classes

The script will then:

create a folder with the band names
train specific models for each band included into the features/classes files
compute for each model:
- the confusion matrix for Decision Trees and Random Forest: plot
- the classification report for Decision Trees and Random Forest: plot and save
- the feature score combined graph for Decision Trees and Random Forest: plot and save
- tree graph for Decision Trees
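
The training and evaluation steps listed above could look roughly like the sketch below. It assumes scikit-learn is the library behind the Decision Tree and Random Forest models (the prerequisites do not say so explicitly), uses synthetic data in place of the features/classes files, and adds a train/test split that the description does not mention.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the features/classes tables (illustration only).
rng = np.random.default_rng(0)
X = rng.random((200, 3))               # three feature bands
y = (X[:, 0] > 0.5).astype(int)        # class of a single target band

# Assumption: hold out 25% of the pixels for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, predicted, labels=[0, 1]),
        "report": classification_report(y_test, predicted, labels=[0, 1]),
        "feature_scores": model.feature_importances_,
    }
    print(name)
    print(results[name]["confusion_matrix"])
    print(results[name]["report"])
```

In a per-band workflow, this loop would be repeated once for every band listed in the csv file, with the outputs stored in that band's folder.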

Multi-model_single-target_predictor

Users must select:

the folder containing the files (pkl) of the model pairs (Decision Trees and Random Forest). Minimum 1 pair; the maximum is still to be tested (up to 5 models have been tried so far)
the dataset_name_features generated by Features-Target_creator with the same number of bands

The script will then:

compute predictions for each given model
generate a combined graph showing the results of the predictions
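
As a hedged illustration of the prediction step, the sketch below loads every pkl file in a folder and runs it on a features array. It assumes the models were saved with Python's `pickle` module and expose a scikit-learn style `predict` method; the function name `predict_with_models` and the throwaway models are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def predict_with_models(model_dir, features):
    """Load every *.pkl model in model_dir and predict on features."""
    predictions = {}
    for path in sorted(Path(model_dir).glob("*.pkl")):
        with open(path, "rb") as handle:
            model = pickle.load(handle)
        predictions[path.stem] = model.predict(features)
    return predictions


# Build a throwaway model pair so the sketch runs end to end.
rng = np.random.default_rng(1)
X = rng.random((100, 3))
y = (X[:, 1] > 0.5).astype(int)
pair = [
    ("DecisionTree", DecisionTreeClassifier(random_state=0)),
    ("RandomForest", RandomForestClassifier(n_estimators=20, random_state=0)),
]
with tempfile.TemporaryDirectory() as model_dir:
    for name, model in pair:
        with open(Path(model_dir) / f"{name}.pkl", "wb") as handle:
            pickle.dump(model.fit(X, y), handle)
    predictions = predict_with_models(model_dir, X)

for name, predicted in predictions.items():
    print(name, predicted[:10])
```

The features passed in must have the same number of columns as the data the models were trained on, which is why the band count has to match. Pickle files should only be loaded from sources you trust.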

WIP - ML on hyperspectral cubes

ML_single_class_full_bands_combined_df

Reads a combined MTRDR + index dataframe and trains models:

Decision Trees
Random Forest

Multi-model_single-target_predictor

Reads trained models and runs predictions on a selected MTRDR

Detailed description

Description of Specialized Browse Product Mosaics

Original datasets are Map-Projected Targeted Reduced Data Records (MTRDR), which contain TER data map-projected using terrain models of the Martian surface.

They were downloaded from the Mars Orbital Data Explorer portal (https://ode.rsl.wustl.edu/mars/) and organized into their respective folders.

These datasets contain spectral summary parameters, which are band math calculations that quantify diagnostic or indicative spectral structure and collectively capture the mineralogical diversity of the surface. Mathematical functions applied to reflectance values at key wavelengths allow, for example, the relative depth of particular absorption features to be quantified, and produce grayscale images indicating the presence or absence of particular phases.
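
As an example of this kind of band math, the sketch below computes a generic band depth: the reflectance at the centre of an absorption is compared with a linear continuum interpolated between two shoulder wavelengths. It is a common formulation given here for illustration, not necessarily the exact formula behind any particular summary product, and the wavelengths and reflectance values are hypothetical.

```python
import numpy as np


def band_depth(r_short, r_center, r_long, wl_short, wl_center, wl_long):
    """Depth of an absorption at wl_center relative to a linear continuum."""
    # Weights that interpolate the continuum between the two shoulders.
    b = (wl_center - wl_short) / (wl_long - wl_short)
    a = 1.0 - b
    continuum = a * np.asarray(r_short) + b * np.asarray(r_long)
    # 0 means no absorption; larger values mean a deeper feature.
    return 1.0 - np.asarray(r_center) / continuum


# Hypothetical reflectances at 1.8, 1.9 and 2.0 micrometres.
flat = band_depth(0.30, 0.30, 0.30, 1.8, 1.9, 2.0)
dip = band_depth(0.30, 0.27, 0.30, 1.8, 1.9, 2.0)
print(flat, dip)
```

Because the function only uses numpy arithmetic, the same call works on whole 2D band images as well as on single values, which is how a grayscale parameter image would be produced.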

A more detailed description of the original datasets can be found here: http://crism.jhuapl.edu/msl_landing_sites/index_news.php

Processed datasets were all created with the SpIdx_to_Dict.py script on specific original datasets to extract user-selected spectral summary parameters in different formats.

Authors

Giacomo Nodjoumi - Initial work - Hyradus
Carlos H Brandt - contributors - chbrandt

License

GNU General Public License v3.0
