The dataset for this project contains Netflix movies and TV series.
In the first part of the project, we explore the dataset and compute statistics about its content. Some of these statistics are:
- Number of movies/series.
- Country with the most content.
- Year with the most content.
- The popularity of each genre for every country.
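The statistics listed above reduce to counting and grouping. The sketch below uses only the standard library and a few hypothetical records. The field names (`type`, `country`, `release_year`, `listed_in`) and the comma-separated format of multi-valued fields are assumptions about the dataset's schema, not taken from the project.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and comma-separated multi-value
# fields ("country", "listed_in") are assumptions about the schema.
titles = [
    {"type": "Movie", "country": "United States", "release_year": 2019,
     "listed_in": "Dramas, Comedies"},
    {"type": "TV Show", "country": "Japan", "release_year": 2020,
     "listed_in": "Anime Series"},
    {"type": "Movie", "country": "United States, India", "release_year": 2019,
     "listed_in": "Dramas"},
]

def split_field(value):
    """Split a comma-separated multi-value field into trimmed items."""
    return [part.strip() for part in value.split(",") if part.strip()]

def content_statistics(records):
    # Number of movies vs. series.
    type_counts = Counter(r["type"] for r in records)
    # A title produced in several countries counts once for each country.
    country_counts = Counter(c for r in records for c in split_field(r["country"]))
    year_counts = Counter(r["release_year"] for r in records)
    # Genre popularity per country: country -> Counter of genres.
    genres_by_country = defaultdict(Counter)
    for r in records:
        for country in split_field(r["country"]):
            genres_by_country[country].update(split_field(r["listed_in"]))
    return {
        "type_counts": type_counts,
        "top_country": country_counts.most_common(1)[0][0],
        "top_year": year_counts.most_common(1)[0][0],
        "genres_by_country": genres_by_country,
    }

stats = content_statistics(titles)
print(stats["type_counts"])                      # Counter({'Movie': 2, 'TV Show': 1})
print(stats["top_country"], stats["top_year"])   # United States 2019
```

With a DataFrame library the same steps map to group-by and value-count operations; the plain-Python version just makes the counting explicit.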
We implement a recommendation system that recommends movies similar to a given movie. To represent each movie, we tried the following two representations:
To compute the similarity between the representations, we used:
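Whatever representation is chosen, a similarity-based recommender ranks all other movies by their similarity to the query movie and returns the top k. The sketch below assumes each movie is already a fixed-length numeric vector and uses cosine similarity purely for illustration; the movie names and vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(title, vectors, k=3):
    """Return the k titles whose vectors are most similar to `title`'s vector."""
    query = vectors[title]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != title  # never recommend the query movie itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]

# Hypothetical 3-dimensional movie vectors.
movie_vectors = {
    "Movie A": [1.0, 0.0, 1.0],
    "Movie B": [0.9, 0.1, 0.8],
    "Movie C": [0.0, 1.0, 0.0],
    "Movie D": [0.5, 0.5, 0.5],
}
print(recommend("Movie A", movie_vectors, k=2))  # ['Movie B', 'Movie D']
```

This brute-force scan is linear in the number of movies per query, which is fine for a catalogue of this size; swapping in another similarity measure only requires replacing `cosine_similarity`.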
PS: If the notebook cannot be opened on GitHub, you can view it via Jupyter nbviewer:
- Visit: https://nbviewer.org/
- Paste the link to the notebook (https://github.com/giannhskp/Data-Mining/blob/main/Project1.ipynb)
Given a dataset of news articles, we train a model that classifies each article as fake or true. We try different ways to represent the text of each article, such as:
- Bag Of Words
- TF-IDF
- Word2Vec
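The three representations can be illustrated on a toy corpus. The sketch below implements Bag of Words (raw term counts) and TF-IDF with `idf = log(N / df)` from scratch; libraries such as scikit-learn use a smoothed idf by default, so their numbers differ. Word2Vec embeddings are assumed to be pre-trained, and averaging the word vectors of an article is one common (assumed, not confirmed) way to turn them into a single document vector. The tokenizer and the toy embeddings are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bag_of_words(doc, vocab):
    """Raw term counts, one entry per vocabulary word."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]

def tf_idf(docs, vocab):
    """TF-IDF with tf = raw count and idf = log(N / df)."""
    n_docs = len(docs)
    bows = [bag_of_words(doc, vocab) for doc in docs]
    doc_freq = [sum(1 for bow in bows if bow[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) if df else 0.0 for df in doc_freq]
    return [[count * weight for count, weight in zip(bow, idf)] for bow in bows]

def average_embedding(doc, embeddings, dim):
    """Document vector = mean of the word vectors of its in-vocabulary tokens."""
    vectors = [embeddings[tok] for tok in tokenize(doc) if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]

articles = ["The senate passed the bill", "Aliens secretly control the senate"]
vocab = build_vocabulary(articles)
print(vocab)  # ['aliens', 'bill', 'control', 'passed', 'secretly', 'senate', 'the']
print(bag_of_words(articles[0], vocab))  # [0, 1, 0, 1, 0, 1, 2]

# Hypothetical 2-dimensional "Word2Vec" vectors for two words.
toy_embeddings = {"senate": [1.0, 0.0], "bill": [0.0, 1.0]}
print(average_embedding(articles[0], toy_embeddings, 2))  # [0.5, 0.5]
```

Note how TF-IDF gives zero weight to words that appear in every article ("the", "senate" here), while Bag of Words counts them like any other word.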
We also train several models and compare their performance. The models we used are:
- Logistic Regression
- Naive Bayes
- Support Vector Machines (SVM)
- Random Forest
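Of the models above, Naive Bayes is the simplest to write out by hand. The sketch below is a from-scratch multinomial Naive Bayes over token lists with add-alpha (Laplace) smoothing; whether the project used the multinomial variant is an assumption, and the training articles and labels are hypothetical. Tokens never seen during training are ignored at prediction time.

```python
import math
from collections import Counter, defaultdict

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, token_lists, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for tokens in token_lists for tok in tokens}
        label_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(label_counts[c] / n) for c in self.classes}
        # Per-class token counts.
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed P(token | class): unseen (token, class) pairs still get non-zero probability.
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict_one(self, tokens):
        # Sum log-probabilities instead of multiplying probabilities to avoid underflow.
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(t, c) for t in tokens if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, token_lists):
        return [self.predict_one(tokens) for tokens in token_lists]

train = [["senate", "passed", "bill"], ["aliens", "control", "senate"],
         ["budget", "bill", "vote"], ["aliens", "secret", "cure"]]
labels = ["true", "fake", "true", "fake"]
model = MultinomialNaiveBayes().fit(train, labels)
print(model.predict([["aliens", "cure"], ["bill", "vote"]]))  # ['fake', 'true']
```

The other three models (Logistic Regression, SVM, Random Forest) are considerably longer to implement by hand and are usually taken from a library.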
Finally, we compare the performance of every representation/model combination.
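Comparing every combination is a loop over the Cartesian product of representations and models. In the sketch below, a representation is assumed to be a callable that maps train and test texts to feature lists, and a model is any class with `fit`/`predict` (scikit-learn estimators follow that interface). The `keyword_flag` and `word_count` features and the `MajorityClass`/`NearestCentroid` models are hypothetical stand-ins so the example runs on its own; they are not the project's representations or models.

```python
from collections import Counter
from itertools import product

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def compare_all(representations, models, train_texts, y_train, test_texts, y_test):
    """Fit every model on every representation; return {(rep, model): accuracy}."""
    results = {}
    for rep_name, model_name in product(representations, models):
        X_train, X_test = representations[rep_name](train_texts, test_texts)
        model = models[model_name]()  # fresh, unfitted model for each pair
        model.fit(X_train, y_train)
        results[(rep_name, model_name)] = accuracy(y_test, model.predict(X_test))
    return results

# --- Hypothetical stand-in representations ---
def keyword_flag(train, test):
    f = lambda t: [1.0 if "aliens" in t.lower() else 0.0]
    return [f(t) for t in train], [f(t) for t in test]

def word_count(train, test):
    f = lambda t: [float(len(t.split()))]
    return [f(t) for t in train], [f(t) for t in test]

# --- Hypothetical stand-in models ---
class MajorityClass:
    """Baseline that always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label] * len(X)

class NearestCentroid:
    """Predicts the class whose mean training vector is closest (squared Euclidean)."""
    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def dist(x, c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]

train_texts = ["Senate passes budget", "Aliens built the pyramids",
               "New bill signed", "Aliens run the bank"]
y_train = ["true", "fake", "true", "fake"]
test_texts = ["Aliens hide the cure", "Senate debates bill"]
y_test = ["fake", "true"]

results = compare_all(
    {"keyword_flag": keyword_flag, "word_count": word_count},
    {"majority": MajorityClass, "nearest_centroid": NearestCentroid},
    train_texts, y_train, test_texts, y_test,
)
for (rep, model_name), acc in sorted(results.items()):
    print(f"{rep:>12} + {model_name:<16} accuracy={acc:.2f}")
```

Creating a fresh model for every pair keeps one combination's fitted state from leaking into the next; accuracy could be replaced by precision, recall, or F1 without changing the loop.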