The datasets used by the code are all available in the folder.
- Gapminder
  - predicts fertility
  - linear model - linear regression
  - tree (overfitting) - decision tree regressor
- Titanic
  - predicts survivability
  - linear model - logistic regression
- Heart
  - predicts chance of heart disease
  - neighbors - KNeighborsClassifier
  - tree (overfitting) - decision tree classifier
- House Voters 84
  - predicts class name
  - ensemble - random forest classifier
- Tweets
  - predicts author
  - feature extraction - CountVectorizer
  - naive_bayes - MultinomialNB
- Country Data
  - predicts mortality
  - cluster - k-means
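
The "tree (overfitting)" entries above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data rather than any of the datasets listed; the variable names and the `max_depth=3` setting are illustrative choices, not taken from the project code. An unconstrained decision tree tends to fit the training set almost perfectly while scoring noticeably worse on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption; not one of the datasets listed above).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A tree with no depth limit can memorise the training samples.
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# Limiting the depth is one simple way to reduce overfitting.
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# score() returns R^2 for regressors.
deep_train = deep.score(X_train, y_train)
deep_test = deep.score(X_test, y_test)
shallow_train = shallow.score(X_train, y_train)
shallow_test = shallow.score(X_test, y_test)

print(f"deep tree:    train R^2={deep_train:.2f}, test R^2={deep_test:.2f}")
print(f"shallow tree: train R^2={shallow_train:.2f}, test R^2={shallow_test:.2f}")
```

A large gap between the train and test scores of the deep tree is the usual sign of overfitting; the same comparison works for `DecisionTreeClassifier` with accuracy instead of R^2.
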
Other features used:
- one hot encoding
- dummy encoding
- TF-IDF vectorizer
- count vectorizer
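
A short sketch of how these four features differ, assuming pandas and scikit-learn. The toy column and sentences are made up for illustration, and "dummy encoding" is read here in its common sense of one-hot encoding with one category dropped, which may not be exactly how the project code does it.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy categorical column (hypothetical data).
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"])
# Dummy encoding: drop the first category, which is implied when all others are 0.
dummy = pd.get_dummies(df["color"], drop_first=True)

# Toy text corpus (hypothetical data).
docs = ["the cat sat", "the cat sat on the mat"]

# Count vectorizer: raw token counts per document.
counts = CountVectorizer().fit_transform(docs)
# TF-IDF vectorizer: counts reweighted so that tokens common to many
# documents contribute less; rows are L2-normalised by default.
tfidf = TfidfVectorizer().fit_transform(docs)

print(one_hot.columns.tolist(), dummy.columns.tolist())
print(counts.toarray())
print(tfidf.toarray().round(2))
```

Both vectorizers produce a sparse matrix with one row per document and one column per vocabulary term, so they can be swapped in front of a model such as `MultinomialNB` without other changes.
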
Issues in the datasets:
- Not all data are numeric
- Data has null values
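
One possible way to handle both issues, sketched with pandas on a made-up frame. The column names and the choice of median/mode imputation are assumptions for illustration, not necessarily what the project code does.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing numeric value and a missing category.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "sex": ["male", "female", None, "female"],
})

# Count the nulls in each column before cleaning.
print(df.isnull().sum())

clean = df.copy()
# Fill numeric gaps with the column median.
clean["age"] = clean["age"].fillna(clean["age"].median())
# Fill categorical gaps with the most frequent value.
clean["sex"] = clean["sex"].fillna(clean["sex"].mode()[0])

# Convert the non-numeric column to indicator columns so models can use it.
encoded = pd.get_dummies(clean, columns=["sex"])
print(encoded)
```

Dropping incomplete rows with `dropna()` is a simpler alternative when only a few rows are affected.
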
The code uses both supervised learning (regression and classification) and unsupervised learning (k-means clustering).