

CodSoft Project Repository

This repository is intended to contain all the projects completed by me, Diptorshi Tripathi (BATCH A39), during my time as an intern at CodSoft.

The tasks are as follows:

Task 1- Movie Genre Classification - A Kaggle dataset is provided and used to create two separate models, one with Naive Bayes and one with Logistic Regression, that analyse a sentence and predict which movie genre it sounds like. The dataset has been visualized using a simple Barplot and a WordCloud, after which the models have been trained and tested. A helper function then predicts the genre of an entered sample movie description.

Kaggle Link - https://www.kaggle.com/datasets/hijest/genre-classification-dataset-imdb
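To illustrate the Naive Bayes half of Task 1, here is a minimal standard-library sketch of a multinomial Naive Bayes text classifier with Laplace smoothing. It is not the notebook's code: the toy training sentences, the two genres, and the names `tokenize`, `train_naive_bayes` and `predict_genre` are all hypothetical, chosen only to show the idea of predicting a genre from a description.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data; the real notebook trains on the Kaggle dataset.
TRAIN = [
    ("A detective hunts a serial killer through the city", "thriller"),
    ("A killer stalks a detective in a tense chase", "thriller"),
    ("Two friends fall in love during a summer wedding", "romance"),
    ("A wedding planner finds love with her best friend", "romance"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(samples):
    """Count documents per genre and word occurrences per genre."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, genre in samples:
        doc_counts[genre] += 1
        for word in tokenize(text):
            word_counts[genre][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def predict_genre(text, model):
    """Return the genre with the highest log posterior score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for genre in doc_counts:
        total_words = sum(word_counts[genre].values())
        # Start from the log prior of the genre.
        score = math.log(doc_counts[genre] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[genre][word]
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[genre] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict_genre("A detective chases a killer", model))
```

On real data, a library implementation with TF-IDF features would likely be the more practical choice. The sketch only makes the underlying counting explicit.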

Task 2- Credit Card Fraud Detection - Two large datasets, fraudTest and fraudTrain, can be downloaded from the Kaggle link below, but are too large to upload to this repository. They have been analysed using Naive Bayes and Logistic Regression, giving accuracies of 91.8% and 99.4% respectively, and visualized using different plotting techniques, including a confusion matrix. At the end, the program also provides a 0/1 output when provided with credit card details.

Kaggle Link - https://www.kaggle.com/datasets/kartik2112/fraud-detection
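As a small illustration of how the confusion matrix and the accuracy figure in Task 2 relate, the sketch below counts the four confusion-matrix cells for 0/1 labels and derives accuracy, precision and recall from them. The labels are made up, the functions `confusion_counts` and `summarize` are hypothetical names, and treating 1 as "fraud" is an assumption rather than something stated in the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels (1 assumed to mean fraud)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(y_true, y_pred):
    """Derive accuracy, precision and recall from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 2 positive cases out of 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```

In this toy case the accuracy is 0.8 even though only one of the two positive cases is caught, which is why the confusion matrix is worth reading alongside the accuracy figure.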

Task 3- Spam SMS Detection - In this program, a dataset of 5574 rows is fed into the code. It contains two columns: one saying whether each message is spam or not, and the other containing the messages themselves. The dataset has been visualized using a simple barplot to show how many spam messages there are in total, and further visualized using WordCloud. The model has been trained using Naive Bayes, providing a 97.8% accuracy, and predicts whether entered messages are spam or not.

Kaggle Link - https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
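The barplot and WordCloud in Task 3 are both built on simple counts: how many rows carry each label, and how often each word appears in spam messages. A minimal sketch of those counts follows, using only the standard library. The four sample rows, the `ham`/`spam` label strings, and the function names `label_counts` and `word_frequencies` are assumptions for illustration, not taken from the notebook.

```python
import re
from collections import Counter

# Hypothetical rows in (label, message) form.
ROWS = [
    ("ham", "Are we still meeting for lunch today"),
    ("spam", "WIN a FREE prize now, call now"),
    ("ham", "Call me when you get home"),
    ("spam", "FREE entry, text WIN to claim your prize"),
]


def label_counts(rows):
    """Count rows per label; these are the bar heights of the barplot."""
    return Counter(label for label, _ in rows)


def word_frequencies(rows, label):
    """Count lowercase words in messages with the given label.

    A word cloud sizes each word according to a frequency table like this.
    """
    words = Counter()
    for row_label, message in rows:
        if row_label == label:
            words.update(re.findall(r"[a-z]+", message.lower()))
    return words


counts = label_counts(ROWS)
spam_words = word_frequencies(ROWS, "spam")

print(counts)
print(spam_words.most_common(3))
```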

Task 4- Customer Churn Prediction - Here, a Kaggle dataset with 10000 rows contains several columns of customer details (geographical location, credit score, etc.) along with a predetermined label saying whether the customer has churned or not. This information has been visualized using the humble Barplot, Histograms and a Correlation Matrix via Heatmap. A Logistic Regression model has been trained on it, returning an accuracy of 81.5%, and a function has been written to predict from a customer's details whether that customer will churn or not.

Kaggle Link - https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction
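To show roughly what a Logistic Regression churn predictor does in Task 4, here is a standard-library sketch that fits the model with batch gradient descent and wraps it in a prediction function. It is an illustration under stated assumptions, not the notebook's implementation: the six customers, the two features (a credit score scaled to 0-1 and an active-member flag), the hyperparameters, and the names `train_logistic_regression` and `predict_churn` are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label  # gradient of log loss w.r.t. score
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Average the gradients over the batch and take one step.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict_churn(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted churn probability reaches the threshold."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return int(sigmoid(score) >= threshold)


# Hypothetical customers: [credit score scaled to 0-1, active-member flag].
X = [[0.9, 1], [0.8, 1], [0.7, 1], [0.3, 0], [0.2, 0], [0.4, 0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = churned

weights, bias = train_logistic_regression(X, y)
print(predict_churn([0.85, 1], weights, bias))
print(predict_churn([0.25, 0], weights, bias))
```

Categorical columns such as geographical location would need to be encoded as numbers before they could be used as features in a model like this.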