patankaraditya1's Repositories
patankaraditya1/Comparison-of-Multi-item-capacitated-lot-sizing-heuristics-using-Python
The goal of the project was to minimize the combined inventory holding and setup costs over the planning horizon. Used the Pandas and NumPy libraries to compare the outcomes of the Python code with the heuristic outcomes reported in the research paper, and made animated plots for visualization.
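As a rough illustration of the objective described above, the sketch below scores one item's production plan by its setup and holding costs. The function name, the cost parameters and the sample numbers are hypothetical, and the capacity constraint and the multi-item coupling are deliberately left out.

```python
def plan_cost(demand, production, setup_cost, holding_cost):
    """Total setup + holding cost of a single item's production plan.

    demand, production: per-period quantities of equal length.
    setup_cost: charged in every period with production > 0 (assumed constant).
    holding_cost: charged per unit left in stock at the end of a period.
    """
    inventory = 0
    total = 0.0
    for d, p in zip(demand, production):
        if p > 0:
            total += setup_cost
        inventory += p - d
        if inventory < 0:
            # Backlogging is assumed not to be allowed in this sketch.
            raise ValueError("plan does not cover demand")
        total += holding_cost * inventory
    return total


# Hypothetical data: four periods, two candidate plans.
demand = [20, 30, 10, 40]
lot_for_lot = [20, 30, 10, 40]   # produce exactly the demand every period
batched = [50, 0, 50, 0]         # fewer setups, more stock carried

cost_lot_for_lot = plan_cost(demand, lot_for_lot, setup_cost=100, holding_cost=1)
cost_batched = plan_cost(demand, batched, setup_cost=100, holding_cost=1)
```

A heuristic would search over such plans; comparing two heuristics then comes down to comparing the costs they reach on the same instance.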
patankaraditya1/Handwritten-Digits-Classification-using-MNIST-dataset
Developed a 5-layer Sequential Convolutional Neural Network using Keras with a TensorFlow backend for digit recognition, trained on the MNIST dataset. Tuned parameters such as kernel size, activation function and optimizer properties to find the best fit, obtaining an accuracy of 97.12%. Applied data augmentation (image scaling, image flips and image rotation) to reduce overfitting, which increased the accuracy to 98.02%. For comparison, KNN, Logistic Regression and Random Forest reached accuracies of 96.37%, 91.22% and 96.19% respectively.
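The idea behind the augmentation step can be sketched without any deep learning library. The helpers below are hypothetical and work on an image stored as a list of pixel rows; the 90-degree rotation is a stand-in for the small-angle rotations an image pipeline would normally apply, and scaling is omitted.

```python
def flip_horizontal(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]


# Tiny 2x2 "image" used only to show the transformations.
image = [[1, 2],
         [3, 4]]
variants = augment(image)
```

Each transformed copy keeps the label of the original, so the training set grows without new labelled data.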
patankaraditya1/Lending-Club-Loan-Data-Analysis
Lending Club is a US lending company; the data set consists of 1 million observations and 74 variables for loans issued over a span of 8 years. Performed exploratory data analysis, data imputation and cleaning to understand the data and provide a concise input for building the models. Explored machine learning algorithms such as Logistic Regression, LDA, QDA, Classification Tree, Random Forest, SVM and XGBoost to predict the loan status as default or fully paid. Compared the performance of these algorithms through metrics such as accuracy, error, ROC curve and confusion matrix, and identified Random Forest as the best model. Identified the most important variables as factors to consider before loan allocation.
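For reference, two of the metrics named above can be computed by hand. This is a minimal sketch on made-up labels; the label strings and function names are assumptions, not taken from the project.

```python
def confusion_matrix(actual, predicted, positive="default"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Share of predictions that match the actual label."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Made-up loan outcomes and model predictions.
actual = ["default", "paid", "paid", "default", "paid"]
predicted = ["default", "paid", "default", "paid", "paid"]

cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```

With imbalanced classes such as loan defaults, the confusion matrix shows which kind of error a model makes, something accuracy alone hides.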
patankaraditya1/Traffic-Violation-Data-Analysis-using-Python
Wrote Python code for exploratory data analysis, using the NumPy and Pandas modules to extract live data on traffic violations in Montgomery.
patankaraditya1/101-pandas
Jupyter notebooks of the Pandas exercises found on machinelearningplus.com
patankaraditya1/Exploratory-Data-Analysis-of-Legally-Operating-Business-in-NYC
Set up a Cloudera Hadoop ecosystem, created tables in HDFS and ran Hive queries over partitioned tables to perform exploratory data analysis.
patankaraditya1/Netflix-recommendation
Analyzed the Netflix data to implement a recommendation system for users, using collaborative filtering based on Pearson's r correlation.
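The similarity measure mentioned above can be sketched as follows: Pearson's r is computed between two users over the titles both have rated. The user names, ratings and the fallback value of 0.0 for undefined cases are assumptions made for this example.

```python
from math import sqrt


def pearson_r(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated titles.

    ratings_a, ratings_b: dicts mapping title -> rating.
    Returns 0.0 when the correlation is undefined (fewer than two
    shared titles, or one user rated all shared titles the same).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[t] for t in common]
    ys = [ratings_b[t] for t in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up ratings on a 0-5 scale.
alice = {"A": 5, "B": 3, "C": 1}
bob = {"A": 4, "B": 2, "C": 0, "D": 5}
carol = {"A": 1, "B": 3, "C": 5}

r_alice_bob = pearson_r(alice, bob)
r_alice_carol = pearson_r(alice, carol)
```

In user-based collaborative filtering, the users with the highest r relative to the target user are the ones whose ratings drive the recommendations.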
patankaraditya1/Predicting-Duplicate-Question-Pairs-from-Quora
Performed exploratory text analysis to understand the data and created a function to identify shared words between question pairs using the NLTK corpus. Assigned weights to words based on how commonly they occur to improve the function. Rebalanced the data and used XGBoost to get an accuracy of 0.65.
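One way the weighted shared-word feature could look is sketched below. The tiny stopword set stands in for the NLTK stopword corpus, and the inverse-count weighting with a smoothing constant is an assumption, not necessarily the weighting used in the project.

```python
from collections import Counter

# Small hypothetical stand-in for the NLTK English stopword list.
STOPWORDS = {"how", "do", "i", "what", "is", "the", "to", "a"}


def tokens(question):
    """Lowercase, drop question marks and stopwords, split on whitespace."""
    words = question.lower().replace("?", " ").split()
    return [w for w in words if w not in STOPWORDS]


def word_weights(questions, smoothing=10.0):
    """Give rarer words a larger weight: 1 / (count + smoothing)."""
    counts = Counter(w for q in questions for w in tokens(q))
    return {w: 1.0 / (c + smoothing) for w, c in counts.items()}


def weighted_share(q1, q2, weights):
    """Weight of shared words divided by weight of all words in the pair."""
    w1, w2 = set(tokens(q1)), set(tokens(q2))
    total = sum(weights.get(w, 0.0) for w in w1 | w2)
    if total == 0:
        return 0.0
    return sum(weights.get(w, 0.0) for w in w1 & w2) / total


# Made-up questions.
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How do I cook rice?",
]
weights = word_weights(questions)
similar = weighted_share(questions[0], questions[1], weights)
unrelated = weighted_share(questions[0], questions[2], weights)
```

The resulting score can then be fed to a classifier as one feature per question pair.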
patankaraditya1/Predicting-User-Sentiment-for-Fine-Foods-products-using-Text-Summary-of-Amazon-Reviews
Gathered data for analysis using SQL-based queries against a SQLite dataset with the sqlite3 package in Python. Partitioned the reviews into positive and negative sentiments and cleaned the text data by stemming, tokenizing and pruning using the NLTK library. Applied Logistic Regression and Naïve Bayes classifiers to obtain accuracies of 92.53% and 90.74% respectively.
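The query-and-partition step might look roughly like this. The table name, column names, sample rows and the rule of treating a score of 3 as neutral and dropping it are all assumptions for the sake of a self-contained example; an in-memory database replaces the real dataset file.

```python
import sqlite3

# In-memory database with a hypothetical Reviews table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews (Id INTEGER PRIMARY KEY, Score INTEGER, Summary TEXT)"
)
conn.executemany(
    "INSERT INTO Reviews (Score, Summary) VALUES (?, ?)",
    [
        (5, "Great taste"),
        (1, "Stale and bland"),
        (3, "It is okay"),
        (4, "Love it"),
    ],
)

# Label each review by its score; neutral reviews (score 3) are skipped.
rows = conn.execute(
    """
    SELECT Summary,
           CASE WHEN Score > 3 THEN 'positive' ELSE 'negative' END AS Sentiment
    FROM Reviews
    WHERE Score != 3
    ORDER BY Id
    """
).fetchall()
conn.close()
```

Doing the labelling in SQL keeps the Python side limited to text cleaning and model fitting.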
patankaraditya1/Python-programming-exercises
100+ challenging Python programming exercises
patankaraditya1/Simulated-Annealing-to-solve-the-Travelling-Salesman-Problem
patankaraditya1/Twitter-Sentiment-Analysis
Used the Tweepy library in Python to extract tweets through the Twitter API based on a keyword set by the user. Cleaned the tweets to remove special characters, links and hashtags. Used the TextBlob library to analyze the sentiment for the term based on the tweets.
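The cleaning step can be sketched with the standard library alone. The function name and the exact patterns are assumptions; removing @mentions along with hashtags is an extra choice made here, and the tweet is invented.

```python
import re


def clean_tweet(text):
    """Strip links, hashtags, mentions and special characters from a tweet."""
    text = re.sub(r"https?://\S+", " ", text)     # links
    text = re.sub(r"[@#]\w+", " ", text)          # hashtags and mentions
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # remaining special characters
    return " ".join(text.split())                 # collapse whitespace


raw = "Loving the new #Python release! https://example.com @guido :)"
cleaned = clean_tweet(raw)
```

The cleaned string is what would be passed on to the sentiment analyzer.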