https://nbviewer.jupyter.org/github/iwasscience/UFO-Sightings-Analysis/blob/master/UFO-Project.ipynb
pandas, NumPy, scikit-learn (LogisticRegression, KNeighborsClassifier, MultinomialNB, TfidfVectorizer)
In most UFO sightings, what people believe are UFOs are actually just common objects like planes and clouds, or celestial events like meteors and planets that seem unusually bright.
Some cases remain unidentified even after investigation, but scientists believe many of these are also sightings of common objects that people simply didn’t recognize.
Many reported UFO sightings actually turn out to be something as simple as a balloon.
Predicting the type (e.g. circle, triangle, oval) and the country of a UFO sighting.
Exploratory data analysis & data preprocessing
- Exploring basic data statistics (e.g. df.head/info/describe, variance, distribution plots)
- Feature Engineering (e.g. handling time data, missing values)
- TF-IDF vectorizing the UFO sighting descriptions
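
The preprocessing steps listed above might look roughly like the sketch below. It is only an illustration on a tiny made-up table: the column names (`date`, `seconds`, `desc`, `country`, `type`) and the choice to drop incomplete rows are assumptions, not necessarily what the notebook does.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the sightings table; all column names and values are hypothetical.
ufo = pd.DataFrame({
    "date": ["2011-11-03 19:21:00", "2004-10-12 23:00:00",
             "not a date", "2009-07-04 21:30:00"],
    "seconds": [1209600.0, 30.0, None, 120.0],
    "desc": ["red light moving fast", "bright triangle hovering",
             None, "silent oval shape over lake"],
    "country": ["us", None, "ca", "us"],
    "type": ["light", "triangle", "circle", "oval"],
})

# Handling time data: parse timestamps, turning unparseable entries into NaT.
ufo["date"] = pd.to_datetime(ufo["date"], errors="coerce")

# Handling missing values: here we simply drop rows that lack a needed field
# (an assumption; imputation would be another option).
clean = ufo.dropna(subset=["date", "seconds", "desc", "country"]).reset_index(drop=True)

# Derive simple calendar features from the parsed timestamp.
clean = clean.assign(month=clean["date"].dt.month, year=clean["date"].dt.year)

# Turn the free-text descriptions into a sparse TF-IDF matrix.
vec = TfidfVectorizer(stop_words="english")
desc_tfidf = vec.fit_transform(clean["desc"])

print(clean[["date", "month", "year", "country"]])
print("TF-IDF matrix shape:", desc_tfidf.shape)
```

The resulting `desc_tfidf` matrix has one row per remaining sighting and one column per vocabulary term, which is the form the text classifiers expect.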
Classifying the country of a UFO sighting
- Comparing KNeighborsClassifier & LogisticRegression
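
A minimal sketch of such a comparison is shown below. It uses synthetic data, because the features the notebook feeds into the models are not listed here: the two numeric features, the label rule, and the hyperparameters are all assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two hypothetical numeric features per sighting,
# and a binary label where 1 means "us" and 0 means "not us".
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Hold out a stratified test set so both models are scored on the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features first: KNN is distance-based and sensitive to feature scale.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Fit each model and record its accuracy on the held-out set.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Scoring both models on the same held-out split keeps the comparison fair. On real data with an imbalanced "us" versus "not us" label, accuracy alone can be misleading, so a per-class metric may be worth checking as well.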
Classifying the type of a UFO sighting
- Fitting a multinomial naive Bayes model on the TF-IDF vectors
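
As a rough illustration of this step, the sketch below chains a `TfidfVectorizer` and `MultinomialNB` in one scikit-learn pipeline. The six descriptions and their labels are invented for the example and are far smaller than the real dataset, so the output says nothing about the notebook's actual results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up sighting descriptions and type labels, for illustration only.
descriptions = [
    "triangle shaped craft with three lights",
    "dark triangle hovering silently",
    "bright round circle in the sky",
    "glowing circle moving slowly",
    "oval object with a metallic surface",
    "silver oval drifting over the hills",
]
types = ["triangle", "triangle", "circle", "circle", "oval", "oval"]

# The pipeline vectorizes raw text and passes the TF-IDF matrix to naive Bayes.
type_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
type_model.fit(descriptions, types)

# Predict the type of an unseen description and inspect class probabilities.
new_text = ["a dark triangle with lights"]
pred = type_model.predict(new_text)
proba = type_model.predict_proba(new_text)

print("predicted type:", pred[0])
print(dict(zip(type_model.classes_, proba[0].round(3))))
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from the training text, which avoids leaking test descriptions into the features when the model is later cross-validated.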
Conclusion
Both KNN and logistic regression score well when predicting the country of a sighting (US or not US).
The model that predicts the type of a UFO sighting, however, performs poorly. The next step would be to investigate the description text further or to find other features that help predict the type.