Scripts to:
- read CRISM MTRDR images,
- extract spectral indexes (bands),
- extract features and classes,
- train models on the assembled features and mineral classes,
- compare the predictions of trained models.
What is an MTRDR? A Map-projected Targeted Reduced Data Record (MTRDR) consists of the TER corrected I/F spectral information after map projection and the removal of spectral channels with suspect radiometry (“bad bands”). See https://bit.ly/2IuEkIT for detailed information.
MTRDRs are available for download from the Mars Orbital Data Explorer website: https://bit.ly/339xJNF
Each downloaded image consists of several files; the most important are the *.img (data file) and the *.hdr (ancillary metadata file).
This repository is composed of:
- Conditioned ML: containing scripts to train models and predict on the same dataset (for testing purposes only, and to become familiar with ML)
- Tools:
- MTRDR containing scripts to:
- convert CRISM hypercube to numpy 3D array and save it in hdf file
- filter and combine MTRDR and Summary Products hdf files
- Summary Products containing scripts to:
- extract unique bands (indexes) from Summary Products datasets into multiple file types
- create original and boolean files of user-selected bands (indexes) in hdf file format
To run the scripts, the following modules must be installed:
- Rasterio
- spectral
- numpy
- cv2
- pandas
A requirements.txt file is also provided.
If you are using conda:
conda install -c conda-forge rasterio
conda install -c conda-forge spectral
conda install -c conda-forge opencv
conda install -c conda-forge
To install, clone the repository and double-check the prerequisites.
Datasets downloaded from the Mars Orbital Data Explorer should be extracted into separate folders, each named after its dataset and containing all of that dataset's files.
As the various scripts run, the following sub-folders are created automatically:
- Dataset folder contains:
- Extracted - contains all the files generated by Multiple_Band_extractor and:
- Processed - files generated by Features-Target_creator and:
- Bandname_models folder for each processed band, containing the DecisionTree and RandomForest models plus some evaluation data
Given a dataset folder, the script reads all the img files together with their associated hdr files, which are needed to read the BAND_NAME of each img correctly.
To avoid duplicates, a band that has already been extracted is skipped. A threshold-based filter is applied to each band to remove data below a certain value. This value is computed for each image and corresponds to the n-th percentile (20th by default, but the user can enter a different value). The percentile is calculated excluding any NaN values. The original and thresholded images are then saved into a sub-folder called "Extracted" as:
- dump file for original
- numpy array (*.npy) for thresholded
- image file (*.png) for both original and thresholded
- In addition, a csv file containing all the unique band names in a single column is generated.
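The percentile filter described above can be sketched as follows. This is a minimal illustration rather than the repository's code: the helper name `threshold_band` is hypothetical, and marking removed pixels as NaN is an assumption about how "removed" data is represented.

```python
import numpy as np


def threshold_band(band, percentile=20.0):
    """Mask values below the given percentile, ignoring NaNs.

    Hypothetical helper: the name and the use of NaN as the
    "removed" marker are assumptions, not the repository's API.
    """
    band = np.asarray(band, dtype=float)
    # nanpercentile skips NaN pixels when computing the cut-off
    cutoff = np.nanpercentile(band, percentile)
    # Keep values at or above the cut-off; everything else becomes NaN
    filtered = np.where(band >= cutoff, band, np.nan)
    return filtered, cutoff


# Tiny synthetic band with one missing pixel
band = np.array([[0.0, 1.0, 2.0], [np.nan, 3.0, 4.0]])
filtered, cutoff = threshold_band(band, percentile=20.0)
```

Because the cut-off is computed per image, the same percentile can give a different absolute threshold for every band.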
Users must select:
- the "Extracted" folder
- a file that contains the unique band names that will be converted into features and target. This can be either:
- the csv file created by the Multiple_Band_extractor script, or
- a personal csv containing only the desired bands. Band names must match the extracted band names exactly: e.g. the OLINDEX3 band must be written OLINDEX3 in the csv, not Olindex3 or similar. (Could be improved in further developments)
- Two files are generated and saved into a sub-folder "Processed":
- dataset-name_features: a csv file containing all the selected bands and the respective values in columns
- dataset-name_classes: a csv file containing all the selected bands and the respective boolean values in columns.
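As a rough sketch of how the two tables relate, the example below flattens each band into one column and derives a boolean twin. The function name is hypothetical, `EXAMPLE_BAND` is a made-up band name, and treating "pixel survived the threshold (is not NaN)" as True is an assumption: the rule the script actually uses to produce the boolean values is not stated here.

```python
import numpy as np
import pandas as pd


def build_features_and_classes(bands):
    """Flatten each 2-D band into a column and derive a boolean table.

    `bands` maps a band name to a thresholded 2-D array.
    Assumption: a pixel counts as True when it is not NaN.
    """
    features = pd.DataFrame(
        {name: np.asarray(arr, dtype=float).ravel() for name, arr in bands.items()}
    )
    classes = features.notna()
    return features, classes


bands = {
    "OLINDEX3": np.array([[np.nan, 0.12], [0.30, np.nan]]),
    "EXAMPLE_BAND": np.array([[0.05, np.nan], [np.nan, 0.08]]),  # hypothetical name
}
features, classes = build_features_and_classes(bands)
```

Both tables share the same column names and row order, so a row in the features table and the same row in the classes table describe the same pixel.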
Users must select:
- the "Processed" folder
- the same csv file that contains the unique band names
- the dataset-name_features
- the dataset-name_classes
The script will then:
- create a folder for each band name
- train specific models for each band included in the features/classes files
- compute for each model:
- the confusion matrix for Decision Trees and Random Forest: plot
- the classification report for Decision Trees and Random Forest: plot and save
- the feature score combined graph for Decision Trees and Random Forest: plot and save
- tree graph for Decision Trees
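The training and evaluation steps listed above could look roughly like this for a single target band. It is a sketch under several assumptions: scikit-learn is used (it is not in the module list above), the data is synthetic, the train/test split and hyperparameters are arbitrary, and "feature score" is taken to mean the models' `feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins: 200 pixels x 3 feature bands, one boolean target band
X = rng.random((200, 3))
y = X[:, 0] > 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred),
        # Assumed meaning of "feature score"
        "feature_importances": model.feature_importances_,
    }
```

Plotting and saving (for example with matplotlib) and drawing the tree graph are left out to keep the sketch short.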
Users must select:
- the folder containing the pkl files of the model pairs (Decision Trees and Random Forest). At least 1 pair is required; the maximum has not been determined (up to 5 models have been tried)
- the dataset_name_features file generated by Features-Target_creator, with the same number of bands
The script will then:
- compute predictions for each given model
- generate a combined graph showing the prediction results
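A minimal sketch of the load-and-predict step, assuming the pkl files are plain `pickle` dumps of scikit-learn estimators. The helper names `load_models` and `predict_all` and the file names are hypothetical; the demo trains throwaway models only so that the example runs end to end.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def load_models(folder):
    """Load every pickled model found in `folder`, keyed by file stem."""
    models = {}
    for path in sorted(Path(folder).glob("*.pkl")):
        with path.open("rb") as handle:
            models[path.stem] = pickle.load(handle)
    return models


def predict_all(models, features):
    """Return one prediction array per model for the same feature matrix."""
    return {name: model.predict(features) for name, model in models.items()}


# Demo with throwaway models and synthetic data
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X[:, 0] > 0.5

with tempfile.TemporaryDirectory() as folder:
    for name, model in [
        ("OLINDEX3_DecisionTree", DecisionTreeClassifier(random_state=0)),
        ("OLINDEX3_RandomForest", RandomForestClassifier(n_estimators=20, random_state=0)),
    ]:
        with open(Path(folder) / f"{name}.pkl", "wb") as handle:
            pickle.dump(model.fit(X, y), handle)

    predictions = predict_all(load_models(folder), X)
```

The feature matrix passed to `predict` must have the same number of columns the models were trained on, which is why the features file needs the same number of bands.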
Read the combined MTRDR + index dataframes and train models:
- Decision Trees
- Random Forest
Read trained models and predict on a selected MTRDR
The original datasets are Map-Projected Targeted Reduced Data Records (MTRDR), which contain TER data map-projected using terrain models of the Martian surface.
They were downloaded from the Mars Orbital Data Explorer portal (https://ode.rsl.wustl.edu/mars/) and organized into their respective folders.
These datasets contain spectral summary parameters, which are band math calculations that quantify diagnostic or indicative spectral structure and collectively capture the mineralogical diversity of the surface. Mathematical functions applied to reflectance values at key wavelengths allow the relative depth of particular absorption features (for example) to be quantified, and produce grayscale images indicating the presence or absence of particular phases.
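To make "band math" concrete, the sketch below computes a generic band depth: the reflectance at the absorption centre is compared with a continuum interpolated linearly between two shoulder wavelengths. This is the commonly used band-depth form, shown only as an illustration; the function name and the wavelengths in the example are hypothetical and do not correspond to any specific summary product definition.

```python
import numpy as np


def band_depth(r_short, r_center, r_long, wl_short, wl_center, wl_long):
    """Band depth relative to a linear continuum between two shoulders.

    BD = 1 - R_center / (a * R_short + b * R_long), with
    b = (wl_center - wl_short) / (wl_long - wl_short) and a = 1 - b.
    """
    b = (wl_center - wl_short) / (wl_long - wl_short)
    a = 1.0 - b
    continuum = a * np.asarray(r_short, dtype=float) + b * np.asarray(r_long, dtype=float)
    return 1.0 - np.asarray(r_center, dtype=float) / continuum


# Two pixels: the first has an absorption at the centre wavelength, the second is flat
depth = band_depth(
    r_short=np.array([0.30, 0.30]),
    r_center=np.array([0.24, 0.30]),
    r_long=np.array([0.30, 0.30]),
    wl_short=1.85, wl_center=1.93, wl_long=2.07,  # illustrative wavelengths, micrometres
)
```

A positive value indicates an absorption at the centre wavelength, while a flat spectrum gives a depth of zero.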
A more detailed description of the original datasets can be found here: http://crism.jhuapl.edu/msl_landing_sites/index_news.php
The processed datasets were all created by running the SpIdx_to_Dict.py script on specific original datasets to extract user-selected spectral summary parameters in different formats.
GNU General Public License v3.0