This repository contains all the projects completed by me, Diptorshi Tripathi (BATCH A39), during my internship at CodSoft.
Task 1 - Movie Genre Classification - The provided Kaggle dataset is analysed using Naive Bayes and Logistic Regression to build two separate models that take a sentence and predict which movie genre it sounds like. The dataset is visualized using a simple barplot and a WordCloud, after which the models are trained and tested. A helper function then predicts the genre of an entered sample movie description.
Kaggle Link - https://www.kaggle.com/datasets/hijest/genre-classification-dataset-imdb
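
As a rough illustration of the Naive Bayes idea behind Task 1, here is a minimal pure-Python sketch. It is not the code in this repository: the function names (`train_naive_bayes`, `predict_genre`), the whitespace tokenizer, the add-one smoothing, and the four toy descriptions are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per genre and words per genre.

    samples: list of (description, genre) pairs.
    """
    genre_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, genre in samples:
        genre_counts[genre] += 1
        for word in tokenize(text):
            word_counts[genre][word] += 1
            vocabulary.add(word)
    return genre_counts, word_counts, vocabulary


def predict_genre(description, model):
    """Return the genre with the highest log-probability for the description."""
    genre_counts, word_counts, vocabulary = model
    total_docs = sum(genre_counts.values())
    best_genre, best_score = None, -math.inf
    for genre, n_docs in genre_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[genre].values())
        for word in tokenize(description):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[genre][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_genre, best_score = genre, score
    return best_genre


# Toy data, invented for illustration only (not taken from the Kaggle dataset).
toy_samples = [
    ("a detective hunts a killer in the city", "thriller"),
    ("a killer stalks a detective at night", "thriller"),
    ("two friends fall in love in paris", "romance"),
    ("a wedding brings old love back", "romance"),
]
model = train_naive_bayes(toy_samples)
print(predict_genre("the detective chases the killer", model))
```

The real dataset is far larger and would normally go through a library vectorizer and classifier; the sketch only shows the counting and scoring steps.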
Task 2 - Credit Card Fraud Detection - Two large datasets, fraudTrain and fraudTest, are available from the Kaggle link below but are too large to upload to this repository. They have been analysed using Naive Bayes and Logistic Regression, which gave accuracies of 91.8% and 99.4% respectively, and visualized using several plotting techniques, including a confusion matrix. At the end, the program also returns a 0/1 output for a given set of credit card details.
Kaggle Link - https://www.kaggle.com/datasets/kartik2112/fraud-detection
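
To show how a confusion matrix relates to an accuracy figure like the ones quoted for Task 2, here is a small pure-Python sketch. It assumes that label 1 means "fraud" and 0 means "not fraud"; the helper names and the ten toy labels are hypothetical and are not taken from the project or the dataset.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for 0/1 labels, assuming 1 marks fraud."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of all predictions that were correct."""
    return (tn + tp) / (tn + fp + fn + tp)


def recall(tn, fp, fn, tp):
    """Fraction of actual fraud cases that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, invented for illustration: 8 legitimate and 2 fraudulent cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_matrix(y_true, y_pred)
print("tn, fp, fn, tp =", counts)
print("accuracy =", accuracy(*counts))
print("recall   =", recall(*counts))
```

In this toy case the accuracy is 0.8 while only half of the fraud cases are caught, which is the kind of detail a confusion matrix makes visible next to a single accuracy number.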
Task 3 - Spam SMS Detection - This program takes a dataset of 5574 rows with two columns: one saying whether each message is spam or not, and the other containing the messages themselves. The dataset is visualized using a simple barplot showing how many spam messages there are in total, and further visualized using a WordCloud. A Naive Bayes model trained on the dataset gives 97.8% accuracy and predicts whether an entered message is spam or not.
Kaggle Link - https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
Task 4 - Customer Churn Prediction - Here, a Kaggle dataset of 10000 rows contains several columns of customer details (geographical location, credit score, etc.) along with predetermined information saying whether each customer has churned or not. This information is visualized using barplots, histograms, and a correlation matrix shown as a heatmap. A Logistic Regression model trained on the data returns an accuracy of 81.5%, and a function has been written to predict from a customer's details whether that customer will churn.
Kaggle Link - https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction
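
For readers curious how a Logistic Regression churn predictor works in principle, here is a minimal from-scratch sketch using plain gradient descent. It is an illustration under assumptions, not the repository's implementation: the two features (a credit score scaled to the 0-1 range and a hypothetical `is_active` flag), the six toy rows, the hyperparameters, and the function names are all invented for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of log loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Average the gradients over the batch before stepping.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict_churn(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted churn probability reaches the threshold."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= threshold else 0


# Toy rows, invented for illustration: [scaled_credit_score, is_active].
rows = [[0.80, 1], [0.75, 1], [0.70, 1], [0.40, 0], [0.35, 0], [0.45, 0]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = churned, 0 = stayed (assumed encoding)

weights, bias = train_logistic_regression(rows, labels)
print(predict_churn([0.78, 1], weights, bias))
print(predict_churn([0.38, 0], weights, bias))
```

With the real dataset, categorical columns such as geographical location would first need to be encoded as numbers, and a library implementation would typically replace the hand-written training loop.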