Over nine years in deep space, NASA's Kepler space telescope carried out a planet-hunting mission to discover hidden planets outside our solar system. To help process this data, I built machine learning models that classify candidate exoplanets from the raw dataset.
- Preprocessing was performed on the dataset before fitting the models.
- Feature selection was performed and unnecessary features were removed for all models.
- I used `MinMaxScaler` to scale the numerical data.
- I separated the data into training and testing sets.
- I used `GridSearch` to tune model parameters.
- I tuned and compared the reported classifiers.
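The scaling and splitting steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the project used scikit-learn's `MinMaxScaler`, whereas `SimpleMinMaxScaler`, `split_train_test`, the 25% test size, the seed, and the toy data below are hypothetical stand-ins. As a design choice, the sketch splits first and fits the scaler on the training split only, so the test rows do not influence the minimum and maximum.

```python
import random


def split_train_test(rows, labels, test_size=0.25, seed=42):
    """Shuffle row indices with a fixed seed, then split rows and labels."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


class SimpleMinMaxScaler:
    """Rescale each feature column to [0, 1] using the min and max seen in fit()."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        # (x - min) / (max - min); a constant column maps to 0.0 to avoid dividing by zero.
        return [
            [(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, self.mins, self.maxs)]
            for row in rows
        ]


# Hypothetical toy data: two numeric features per object and a disposition label.
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0], [4.0, 100.0]]
y = ["CONFIRMED", "FALSE POSITIVE", "CANDIDATE", "FALSE POSITIVE"]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.25)
scaler = SimpleMinMaxScaler().fit(X_train)   # fit on the training split only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)     # may fall slightly outside [0, 1]
```

Because the test rows are scaled with the training minimum and maximum, their scaled values can land outside [0, 1]; that is expected and mirrors how a fitted scaler is applied to unseen data.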
In this project I used five machine learning models to train, test, and classify candidate exoplanets from the raw dataset. The reporting section summarizes the findings and assumptions and compares the models.

| | Before CV | After CV |
|---|---|---|
| Training Score | 0.749 | 0.872 |
| Testing Score | 0.757 | 0.864 |

| | Before CV | After CV |
|---|---|---|
| Training Score | 1.0 | 1.0 |
| Testing Score | 0.897 | 0.899 |

| | Before CV | After CV |
|---|---|---|
| Training Score | 0.845 | 0.886 |
| Testing Score | 0.841 | 0.879 |

| | Before CV | After CV |
|---|---|---|
| Training Score | 0.675 | 1.0 |
| Testing Score | 0.636 | 0.842 |

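The "Before CV" and "After CV" columns presumably compare default hyperparameters with those chosen by `GridSearch`. As a sketch of what such a search does, under that assumption, the plain-Python code below tries every combination in a parameter grid, scores each with k-fold cross-validation, and keeps the best. It uses a k-nearest-neighbors classifier only as an example model; `knn_predict`, `grid_search`, the grid values, and the toy data are hypothetical. scikit-learn's `GridSearchCV` performs the same exhaustive search more completely (for example, it refits the best model on all training data by default).

```python
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, n_neighbors, p):
    """Majority vote among the n_neighbors closest training rows (Minkowski distance)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)) ** (1 / p), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for brevity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, params, k=3):
    """Mean accuracy over k folds, each fold held out once as validation data."""
    scores = []
    for val_idx in k_fold_indices(len(X), k):
        held_out = set(val_idx)
        train_X = [row for i, row in enumerate(X) if i not in held_out]
        train_y = [lab for i, lab in enumerate(y) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, param_grid, k=3):
    """Try every parameter combination; keep the one with the best CV accuracy."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(X, y, params, k)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical, already-scaled toy training data with two classes.
X = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
y = ["A", "B", "A", "B", "A", "B"]
best_params, best_score = grid_search(X, y, {"n_neighbors": [1, 3], "p": [1, 2]})
print(best_params, best_score)  # → {'n_neighbors': 1, 'p': 1} 1.0
```

On this trivially separable toy data every combination scores 1.0, so the first one wins; on real data the cross-validated scores differ between combinations, which is what makes the search useful.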
- Normal Neural Network - Loss: 0.2826294135174435, Accuracy: 0.8787185549736023
- Deep Neural Network - Loss: 0.2919023224500006, Accuracy: 0.8655606508255005
The logistic regression model's training and testing scores increase substantially from before to after cross-validated tuning, but compared with the other models its scores are lower. Its f1-score for the FALSE POSITIVE class is 0.89, meaning it identifies FALSE POSITIVE objects reliably, although this is lower than random forest (0.98) and K-Nearest Neighbors (0.98). Comparing scores, the random forest model's best score (0.89) is better than the SVM model's (0.87). The normal neural network's accuracy (0.879) is slightly better than the deep neural network's (0.866).
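The f1-scores quoted above are per-class values, like those reported by scikit-learn's `classification_report`. For one class treated as positive, precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is their harmonic mean, 2PR/(P+R). The sketch below computes these for the FALSE POSITIVE class; `per_class_scores` and the label lists are hypothetical examples, not this project's predictions.

```python
def per_class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical predictions, not results from this project.
y_true = ["FALSE POSITIVE", "FALSE POSITIVE", "CONFIRMED", "CANDIDATE", "FALSE POSITIVE"]
y_pred = ["FALSE POSITIVE", "CANDIDATE", "FALSE POSITIVE", "CANDIDATE", "FALSE POSITIVE"]
precision, recall, f1 = per_class_scores(y_true, y_pred, "FALSE POSITIVE")
print(round(precision, 3), round(recall, 3), round(f1, 3))  # → 0.667 0.667 0.667
```

Here two of the three FALSE POSITIVE predictions are correct (precision 2/3) and two of the three true FALSE POSITIVE objects are found (recall 2/3), so F1 is also 2/3.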
Overall, of the machine learning models I ran on the exoplanet dataset, the random forest model performed best at predicting the data. This project was a good opportunity to learn what each machine learning model does and to compare their training and testing scores, accuracy, recall, and precision.