mrc03
Data Scientist | Kaggle Master | Published Author | Topmate: https://topmate.io/raj_mehrotra
UnitedHealth Group | Hyderabad, Telangana
Pinned Repositories
Amazon-Fine-Food-Reviews-Analysis
The well-known Amazon Fine Food Reviews dataset on Kaggle, used for text classification. I have performed sentiment analysis on the dataset using several techniques. Please see the README for details.
Appdichat
Appdichat is a mobile chat application with many features. You can register and log in, send friend requests to your friends, and accept or decline the requests you receive. You can also chat with your friends in real time and see a list of all users of the application. Users can build their profile by setting a display picture and a status, view the profiles of all their friends, and see how many mutual friends they have. The application uses Java, XML and the Firebase Realtime Database.
Cats-vs-Dogs-CNN-Keras
The well-known Cats-vs-Dogs dataset. I built a custom ConvNet to classify each image into one of 2 classes: Dog or Cat. Each image is 100 × 100 pixels. The images are first read from the archive with Python's zipfile module (ZipFile) and converted to NumPy arrays of pixel values. They are then divided into training, cross-validation and test sets containing 20,000, 5,000 and 12,500 images respectively. I also used data augmentation to reduce the risk of overfitting. The model reaches an accuracy of about 88% on the validation set.
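A minimal sketch of the read-and-split step, using only the standard library. It assumes the Kaggle file-naming convention `cat.<n>.jpg` / `dog.<n>.jpg` and a labelled archive of 25,000 images (which is what a 20,000 / 5,000 split implies); the 12,500 test images would come from a separate archive. The function names and the 0 = cat / 1 = dog encoding are hypothetical, and decoding each entry into a 100 × 100 NumPy array would additionally need an image library such as Pillow, which is omitted here.

```python
import io
import random
import zipfile


def list_labelled_images(zip_bytes):
    """Return (filename, label) pairs for every .jpg entry in the archive.

    Assumes (hypothetically) names like "cat.123.jpg" / "dog.45.jpg";
    label 0 = cat, 1 = dog.
    """
    pairs = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            base = name.rsplit("/", 1)[-1]  # drop any folder prefix
            if not base.endswith(".jpg"):
                continue  # skip non-image entries
            if base.startswith("cat."):
                pairs.append((name, 0))
            elif base.startswith("dog."):
                pairs.append((name, 1))
    return pairs


def train_val_split(pairs, n_val, seed=0):
    """Shuffle deterministically, then hold out n_val examples for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]
```

With 25,000 labelled entries, `train_val_split(pairs, n_val=5000)` yields the 20,000 / 5,000 split described above; fixing the seed keeps the split reproducible between runs.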
CPUSchedulerApp
A small Android application for an Operating Systems lab project. The application implements a short-term scheduler: the user enters the details of the processes and then chooses one of several CPU scheduling algorithms. The application then displays the order in which the processes will be executed, along with quantities such as the waiting time and turnaround time of each process.
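To illustrate how the displayed quantities are derived, here is a sketch of one classic algorithm, First-Come-First-Served (FCFS), in Python rather than the app's Java. It is not the app's code; the `Process` type and `fcfs` function are hypothetical, and the other algorithms the app offers would only change the order in which processes are picked.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: str
    arrival: int  # time the process becomes ready
    burst: int    # CPU time it needs


def fcfs(processes):
    """First-Come-First-Served: run processes in order of arrival.

    Returns (pid, start, completion, turnaround, waiting) tuples, where
    turnaround = completion - arrival and waiting = turnaround - burst.
    """
    clock = 0
    schedule = []
    # sorted() is stable, so processes arriving together keep input order.
    for p in sorted(processes, key=lambda p: p.arrival):
        start = max(clock, p.arrival)  # CPU idles until p arrives
        completion = start + p.burst
        turnaround = completion - p.arrival
        waiting = turnaround - p.burst
        schedule.append((p.pid, start, completion, turnaround, waiting))
        clock = completion
    return schedule
```

For P1 (arrival 0, burst 5), P2 (1, 3) and P3 (2, 8), the execution order is P1, P2, P3 with waiting times 0, 4 and 6, i.e. an average waiting time of 10 / 3 ≈ 3.33.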
Flower-Recognition-Kaggle-CNN-Keras
The dataset is Flower Recognition on Kaggle. It consists of 4,232 images of varying sizes. Each image belongs to one of 5 classes, such as 'Daisy' and 'Rose'. I trained a convolutional neural network written in Keras to predict the flower class on the validation set, and used ImageDataGenerator to augment the training set and reduce overfitting.
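The idea behind the augmentation step can be sketched without Keras: each training image is randomly transformed before it is fed to the network, so the model rarely sees exactly the same pixels twice. The sketch below is an assumption-laden stand-in for two common `ImageDataGenerator` options (`horizontal_flip` and `width_shift_range`), operating on an image stored as a list of pixel rows; the function names are hypothetical, and it pads shifted pixels with a constant, whereas Keras's default `fill_mode` is `'nearest'`.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def width_shift(image, shift, fill=0):
    """Shift every row by `shift` pixels (positive = right), padding with `fill`."""
    width = len(image[0])
    shift = max(-width, min(width, shift))  # clamp to the image width
    if shift >= 0:
        return [[fill] * shift + row[: width - shift] for row in image]
    return [row[-shift:] + [fill] * (-shift) for row in image]


def augment(image, max_shift, rng):
    """Flip with probability 0.5, then shift by a random amount in [-max_shift, max_shift]."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return width_shift(image, rng.randint(-max_shift, max_shift))
```

Because the transforms are drawn fresh for every batch, augmentation enlarges the effective training set without storing extra images, which is why it helps against overfitting on a dataset of only a few thousand pictures.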
Housing-Prices-EDA-and-Regression-Models
The well-known House Prices: Advanced Regression Techniques competition on Kaggle. The data consists of training and test sets, each with about 1.46K rows and 81 features describing a house. I first performed an exhaustive EDA to identify the underlying trends in the data, and removed outliers to make the regression models more robust. Missing values were handled carefully, with imputation wherever needed. Lastly, I applied various regression models, such as Lasso and Ridge, from scikit-learn and tuned their hyperparameters with GridSearchCV. The final RMSE is a little above 0.12, which is reasonably good.
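The competition scores submissions by the RMSE between the logarithm of the predicted price and the logarithm of the observed sale price; assuming the 0.12 above refers to that metric, it can be computed as in this sketch (the function name is hypothetical).

```python
import math


def rmse_log(y_true, y_pred):
    """RMSE between log(predicted) and log(observed) prices.

    Working in log space makes errors relative: a 10% miss on a cheap
    house costs the same as a 10% miss on an expensive one.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

On this scale, a score of about 0.12 corresponds roughly to a typical relative error of around 12%, since log(1 + x) ≈ x for small x.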
IBM-HR-Analytics-Employee-Attrition-Performance
The IBM HR Analytics Employee Attrition & Performance dataset from Kaggle. I first performed exploratory data analysis using libraries such as pandas, seaborn and matplotlib. Then I used feature selection techniques such as RFE (recursive feature elimination) to select the features. The data is then oversampled with SMOTE to deal with the class imbalance, and scaled for better performance. Lastly, I trained many ML models from scikit-learn for predictive modelling and compared their performance using precision, recall and other metrics.
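SMOTE creates new minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours, rather than duplicating existing rows. Notebooks usually call imbalanced-learn's `SMOTE` class for this; the hand-rolled sketch below only illustrates the core idea and is not the repository's code (the `smote` function and its parameters are hypothetical).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples with basic SMOTE.

    For each synthetic point: pick a random minority sample x, pick one of
    its k nearest minority neighbours nb, and return x + u * (nb - x)
    with u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest neighbours of x among the *other* minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        nb = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Oversampling is normally applied to the training split only, so that synthetic points derived from evaluation rows cannot inflate the reported precision and recall.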
Movie-Reviews-NLTK-Sentiment-Analysis-
The Movie Reviews dataset, imported from the NLTK library. It has 1000 positive and 1000 negative reviews. I first loaded the dataset into a pandas DataFrame, which makes the processing easier. The next step is to analyze the positive and negative reviews. I also preprocessed the dataset using lemmatization and other standard NLP techniques. To extract features from the text, I used the TfidfVectorizer from scikit-learn. Lastly, I trained various modelling algorithms from scikit-learn on this data.
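To show what the TF-IDF features contain, here is a dependency-free sketch that follows scikit-learn's documented `TfidfVectorizer` defaults (`smooth_idf=True`, `norm="l2"`): idf(t) = ln((1 + n) / (1 + df(t))) + 1, with each document vector then scaled to unit length. One simplification is an assumption: it splits on whitespace, whereas scikit-learn's default token pattern also drops punctuation and single-character tokens. The `tfidf` function is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of L2-normalised TF-IDF vectors.

    The vocabulary is sorted alphabetically; rows[i][j] is the weight of
    vocabulary[j] in docs[i].
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)  # raw term counts
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm else row)
    return vocab, rows
```

For `["good movie", "bad movie"]`, the shared word "movie" receives a lower weight than "good" or "bad", which is exactly the property that makes TF-IDF useful for separating positive from negative reviews.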
Red-Wine-Quality-Accuracy-0.9175-
The Red Wine Quality dataset from Kaggle. The data describes the chemical composition of each wine. I used pandas to manipulate the data and seaborn to visualize it. Finally, I predicted wine quality using various models from scikit-learn.
SAD_PROJECT
A blood bank mobile application where users can register and log in. Blood donors can register with the application and earn points. Receivers can search for donors and either call a donor or locate them on Google Maps. The application uses Java, XML and Firebase as the backend, and the Google Maps API to locate donors on the map.
mrc03's Repositories
mrc03/IBM-HR-Analytics-Employee-Attrition-Performance
mrc03/Flower-Recognition-Kaggle-CNN-Keras
mrc03/Spooky-Author-Identification
A notebook for the well-known Kaggle competition Spooky Author Identification. The task is to identify the author of each text excerpt. I first cleaned and preprocessed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also created some meta features (hand-crafted features) based on each author's writing patterns. Then I used the traditional bag-of-words (BOW) approach with TfidfVectorizer and CountVectorizer, and applied ML algorithms such as LogisticRegression and Naive Bayes, which are well suited for text data. So far, TF-IDF on top of the count-vectorized features has given the best results; my submission scored a multi-class log loss of 0.46 on the Kaggle private leaderboard, which is reasonably good.
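Multi-class log loss is the average negative log-probability a model assigns to the true author. A minimal sketch follows; the rescaling of each row to sum to 1 and the clipping constant `1e-15` are common Kaggle conventions assumed here, and the function name is hypothetical.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: class indices; probs: per-class probability rows.
    Each row is rescaled to sum to 1, and the true-class probability is
    clipped to [eps, 1 - eps] so a confident wrong answer gives a large
    but finite penalty.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = row[label] / sum(row)
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

With three candidate authors, always predicting (1/3, 1/3, 1/3) scores ln 3 ≈ 1.10, which gives a baseline against which a score of 0.46 can be judged.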
mrc03/Topic-Modelling-using-LDA-and-LSA-in-Sklearn
I performed topic modelling on the Kaggle dataset "A Million News Headlines". I first preprocessed and cleaned the data, then used the LDA and LSA implementations in scikit-learn. The distribution of words in each topic is also shown.
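A fitted scikit-learn LDA model exposes a topic-by-word weight matrix as `components_`, and normalising each row turns it into a word distribution for that topic. The sketch below does this in plain Python on a matrix passed in as a list of rows; the function name and toy data are hypothetical. For LSA (truncated SVD) the weights can be negative, so they should be ranked directly rather than normalised into probabilities.

```python
def top_words_per_topic(components, vocabulary, n_top=5):
    """Normalise each topic row and return its n_top (word, probability) pairs.

    `components` has one row per topic and one column per vocabulary term,
    e.g. the rows of a fitted LDA model's `components_` matrix.
    """
    topics = []
    for row in components:
        total = sum(row)
        probs = [w / total for w in row]  # row now sums to 1
        ranked = sorted(zip(vocabulary, probs), key=lambda wp: wp[1], reverse=True)
        topics.append(ranked[:n_top])
    return topics
```

Printing the top pairs for each topic is a quick way to label topics by hand, e.g. a topic dominated by "police" and "court" reads as crime news.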
mrc03/Word-Embeddings-in-Gensim-and-Keras
A simple implementation of word embeddings with the Gensim and Keras libraries. I implemented the well-known Word2Vec model in Gensim. As an alternative, I also used a Keras Embedding layer to generate word embeddings.
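Word2Vec learns embeddings from (center word, context word) pairs drawn from a sliding window. The sketch below generates those pairs for the skip-gram variant (`sg=1` in Gensim; Gensim's default is CBOW). It is illustrative only: the function name is hypothetical, and the real implementation also randomly shrinks the window per word and trains with negative sampling or hierarchical softmax.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for skip-gram training.

    Every word within `window` positions of the center word is a context word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

For `["the", "cat", "sat"]` with `window=1`, this yields ("the", "cat"), ("cat", "the"), ("cat", "sat") and ("sat", "cat"); words that repeatedly share contexts end up with similar vectors.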
mrc03/Amazon-Fine-Food-Reviews-Analysis
mrc03/Appdichat
mrc03/awesome-embedding-models
A curated list of awesome embedding models tutorials, projects and communities.
mrc03/awesome-project-ideas
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
mrc03/Basic-Guide-to-Natural-Language-Processing-with-NLTK-and-Spacy
A basic guide to implementing fundamental NLP techniques, such as text normalization and text similarity, with the NLTK and spaCy libraries.
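As a dependency-free illustration of the two techniques named above, this sketch normalizes text (lowercasing, removing punctuation, collapsing whitespace) and then measures similarity as the Jaccard overlap of word sets. It is a stand-in, not the guide's NLTK/spaCy code; the function names are hypothetical, and spaCy's `Doc.similarity` instead compares word vectors with cosine similarity.

```python
import re
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation and collapse runs of whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of the two word sets."""
    sa, sb = set(normalize(a).split()), set(normalize(b).split())
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Normalizing first matters: without it, "Cat" and "cat." would count as different words and the similarity would be understated.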
mrc03/Project
The Project is an Android application that displays the levels of various gases in the atmosphere. The gas volumes are stored in an Excel file, which is updated periodically with data fetched from the sensors. The application reads the contents of the file and displays the results.
mrc03/awesome-text-summarization
A guide to tackling text summarization.
mrc03/data-science-complete-tutorial
Notebooks to learn data science - Videos https://www.edyoda.com/course/1416
mrc03/Word-Level-Eng-Mar-NMT
Translating English sentences to Marathi using Neural Machine Translation
mrc03/awesome-nlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
mrc03/EEG-Emotion-classification
mrc03/Internship-Assignment
mrc03/leetcode_company_wise_questions
A repository containing the list of company-wise questions available on LeetCode Premium.
mrc03/really-awesome-gan
A list of papers on Generative Adversarial (Neural) Networks
mrc03/Seizure-Detection-Tutorials
A series of tutorials teaching the use of Python for epileptic seizure detection on open-source datasets
mrc03/UltimateAndroidReference
:rocket: Ultimate Android Reference - Your Road to Become a Better Android Developer
mrc03/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
mrc03/CNNs-on-CHB-MIT
The project is about applying CNNs to EEG data from CHB-MIT to predict seizures.
mrc03/coding-interview-university
A complete computer science study plan to become a software engineer.
mrc03/conversational-datasets
Large datasets for conversational AI
mrc03/darknet
Convolutional Neural Networks
mrc03/dialogflow-android-client
Android SDK for Dialogflow
mrc03/Interview-Prepartion-Data-Science
mrc03/mne-python
MNE : Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
mrc03/neurodsp
Digital signal processing for neural time series.