Exoplanet Exploration

Machine Learning Challenge

Background & Scope

For more than nine years, NASA's Kepler space telescope has been on a planet-hunting mission in deep space to discover planets outside our solar system. To help process this data, I created two machine learning models capable of classifying candidate exoplanets from the raw dataset. I followed these steps:

* Preprocess the raw data
* Tune the models
* Compare two or more models
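
The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook code from this repository: the data is synthetic (generated with `make_classification` as a stand-in for the Kepler table), and the `MinMaxScaler` choice, the parameter grids, and the names `candidates` and `results` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic 3-class data standing in for the KOI disposition labels.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical parameter grids; the values tuned in the project may differ.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1, 10]},
    ),
    "knn": (
        KNeighborsClassifier(),
        {"model__n_neighbors": [3, 5, 9, 15]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Step 1: preprocess (scaling happens inside the pipeline, so it is
    # fitted only on the training folds during cross-validation).
    pipeline = Pipeline([("scale", MinMaxScaler()), ("model", estimator)])
    # Step 2: tune the model with a cross-validated grid search.
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # Step 3: record scores so the models can be compared.
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

Keeping the scaler inside the pipeline is a design choice that avoids leaking test-fold statistics into the tuning step.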

Data sources

Screenshots from data pre-processing and tuning

Principal Components Analysis

Pairplot Displays

Model 1: Logistic Regression

Classification Report

Model Results: KOI Disposition Actual vs. Predicted

Best scores for the current model

Model 2: K Nearest Neighbors

Classification Report

KNN Plot

Best scores for the current model

Models Comparison and Findings

The Logistic Regression and KNN models currently score 0.61 and 0.63, respectively, so the KNN model is slightly more accurate than Logistic Regression.

The PCA table shows that the correlation coefficients among the dataset's variables vary considerably. In some cases, the coefficients are very low or nonexistent.
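
One way to inspect how strongly variables are correlated, and how much variance each principal component captures, is sketched below. The feature matrix is hypothetical (randomly generated, with one deliberately correlated pair of columns) and does not reproduce the project's table.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features: columns 0 and 1 are strongly correlated,
# columns 2 and 3 are independent noise.
base = rng.normal(size=(500, 1))
X = np.hstack([
    base,
    0.9 * base + 0.1 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 2)),
])

# Pairwise Pearson correlation coefficients between columns.
corr = np.corrcoef(X, rowvar=False)

# PCA on standardized features; the ratios sum to 1 across all components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

print(np.round(corr, 2))
print(np.round(pca.explained_variance_ratio_, 2))
```

Near-zero off-diagonal entries in `corr` correspond to the very low coefficients described above.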

Some pairplot displays are very sparse; only the less sparse plots were selected.

A more thorough review of the available dataset may yield better model scores. In addition, other approaches, such as Random Forest or SVM, may produce more accurate results.
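
As a starting point for trying those alternatives, the sketch below cross-validates a Random Forest and an SVM. It again uses synthetic data as an assumption, so the printed scores say nothing about how these models would perform on the Kepler dataset; the hyperparameters and the names `alternatives` and `cv_scores` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic 3-class data in place of the real dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

alternatives = {
    # Tree ensembles do not need feature scaling.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVMs are sensitive to feature scale, so standardize first.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

# 5-fold cross-validated accuracy for each alternative model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy")
    for name, model in alternatives.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```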

maribsoto/Exoplanet-Exploration_Machine-Learning-Challenge
