/ds-portfolio

Primary LanguageJupyter Notebook

ds-portfolio

Recommender system

Using the movies dataset, we build a recommender system:

  • based on videos popularity using IMDB weighted rating formula
  • based on the similarity in the overview of videos

Dataset Here

Results:

1-

title vote_count vote_average score
The Shawshank Redemption 8358.0 8.5 8.445869
The Godfather 6024.0 8.5 8.425439
Dilwale Dulhania Le Jayenge 661.0 9.1 8.421453
....................... ..... ..... .....

2-

get_recommendations('Father of the Bride Part II')
title
Father of the Bride
Kuffs 6024.0
North to Alaska
Wendigo
The Magic of Méliès
.......................

[x] We'll improve again the second part to take account others features like the director, actors, keywords and the movie genres.

Results

get_recommendations('Father of the Bride Part II', cosine_similarity)
title
Baby Boom
Father of the Bride
¡Three Amigos!
Hanging Up
Das merkwürdige Verhalten geschlechtsreifer Gr...
.......................

Time serie analysis

Making forecast with time serie. contains notes

  • Notes takes from Here

For model-comparison-sarima-lstm-prophet, we have :

Models MEAN RMSE Errors MSE Errors
SARIMA 148.42 8.14 66.18
LSTM 148.42 10.77 116.02
PROPHET 148.42 11.48 131.69
  • Then SARIMA fit well (with simple tuning) to the monthly beer production dataset.

Chosen algorithm

Given the type of model: classification or prediction, the chosen class will give several algorithms with their accuracy
That way we can save times for choosing the best algorithms :)

Use-case

From there

import pandas as pd
from sklearn.model_selection import train_test_split
from chosen.chosen import Chosen

data = pd.read_csv('data/Social_Network_Ads.csv')
X = data.iloc[:, -3:-1].values
y = data.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = Chosen(X_train, y_train, model_type='classification', scaling=True)
model.train()

We got

Use case Within the boxplot Use case