NASA's Kepler space telescope spent more than nine years in deep space on a planet-hunting mission to discover hidden planets outside our solar system. To help process this data, I created two machine learning models capable of classifying candidate exoplanets from the raw dataset. I followed these steps:
* Preprocess the raw data
* Tune the models
* Compare two or more models
- https://www.kaggle.com/nasa/kepler-exoplanet-search-results
- https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html
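The three steps listed above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: it runs on synthetic data from `make_classification` rather than the Kepler table, and the scaler, the parameter grids, and the three-class setup are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Kepler features and labels (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=42
)

# Step 1: preprocess. Hold out a test set; scaling happens inside the pipeline
# so the scaler is fitted on training folds only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Step 2: tune. Each model gets a small, illustrative hyperparameter grid.
candidates = {
    "LogisticRegression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1, 10]},
    ),
    "KNN": (
        KNeighborsClassifier(),
        {"model__n_neighbors": [3, 5, 9, 15]},
    ),
}

# Step 3: compare. Score the best version of each model on the held-out set.
scores = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scale", MinMaxScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X_train, y_train)
    scores[name] = search.score(X_test, y_test)  # accuracy by default
    print(f"{name}: best params {search.best_params_}, "
          f"test accuracy {scores[name]:.2f}")
```

Keeping the scaler inside the `Pipeline` means the cross-validation in `GridSearchCV` never sees statistics from the validation fold, which keeps the comparison between the two models fair.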
The Logistic Regression (LR) and K-Nearest Neighbors (KNN) models currently score 0.61 and 0.63, respectively, so the KNN model is slightly more accurate than LR.
The PAC table shows that the correlation coefficients among the dataset's variables vary considerably. In some cases, coefficients are very low or nonexistent.
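To make the coefficients concrete, here is a small worked example, assuming the table reports pairwise Pearson correlation coefficients (the source does not say which coefficient was used). The helper `pearson` and the toy columns are hypothetical and are not taken from the Kepler dataset. Note that a constant column has no defined coefficient, which is one way an entry can end up missing.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation of two sequences, or None if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up toy columns for illustration only.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],    # exact multiple of "a"
    "c": [5.0, 5.0, 5.0, 5.0],    # constant column
    "d": [1.0, -1.0, -1.0, 1.0],  # no linear relation to "a"
}

# Print the coefficient of every column against "a".
for name, values in columns.items():
    r = pearson(columns["a"], values)
    label = "undefined" if r is None else f"{r:+.2f}"
    print(f"corr(a, {name}) = {label}")
```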
Some PairPlot displays are very sparse. The less sparse plots were selected.
A more thorough review of the available dataset may yield better model scores. In addition, other approaches such as Random Forest or SVM may produce more accurate results.
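As a starting point for that follow-up, a Random Forest and an SVM could be evaluated with cross-validation along these lines. This is only a sketch under stated assumptions: the data is synthetic (generated with `make_classification`), the hyperparameters are arbitrary defaults, and nothing here has been run against the Kepler dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Kepler features and labels (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=3, random_state=0
)

models = {
    # Tree ensembles do not need feature scaling.
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_means = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, mean_acc in cv_means.items():
    print(f"{name}: mean CV accuracy {mean_acc:.2f}")
```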