Pinned Repositories
Breast-Cancer-detection
We use 30 attributes extracted from images. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. We predict the diagnosis: B (benign) or M (malignant).
Compute-AUC-ROC-from-scratch-python
Computing AUC-ROC from scratch in Python without using any libraries.
EDA-Scratch-MIT-dataset-
Exploratory data analysis of the MIT Scratch dataset using R Shiny and Tableau.
House-price-index-Advanced-regression
Advanced regression techniques on the house prices dataset, using ensemble methods such as stacking and averaging over Gradient Boosting, XGBoost, LightGBM, and ElasticNet models.
ImdbReviews
Web-scraped IMDb reviews of Game of Thrones and performed sentiment analysis on them using NLP.
Malware-Detection
Malware detection using machine learning.
NLPClothingReview
I analyzed reviews from an e-commerce clothing website, performing EDA and sentiment analysis. To prepare the text, I removed punctuation and stop words, tokenized the reviews, and dropped short, uninformative words (those with fewer than 3 characters). I then found the most common words used in reviews (dress, size, love, like, top). Next, I used TextBlob to score the sentiment of each review and built lists of the most common words in positive and negative reviews. Finally, I trained a Naive Bayes classifier on count-vectorized text to predict a review's rating and tested it on new data. Results: 1) Reviews with 3- and 4-star ratings were the longest. 2) Users shopped for tops 60 percent more than for bottoms. 3) The Naive Bayes model reached 85 percent accuracy.
Recommendation-engine-Netflix
Designing a recommendation engine using Netflix movie data.
Time-series-Honeywell-Stock-price-prediction
We use time series methods to forecast future values of Honeywell stock. We apply exponential smoothing to Honeywell stock prices with varying parameter values and select the best fit by comparing the SSE and MSE of each run. We also forecast with linear regression, compare its results with the exponential smoothing forecasts, and compute the coefficients of correlation and determination. We examine the residuals and their patterns in scatterplots. Finally, we compare all our forecasts with the actual Honeywell stock price. More details are in the Word file.
Time-series-modeling-basics
Basics of time series modeling in Python using pandas.
akshaykapoor347's Repositories
akshaykapoor347/Compute-AUC-ROC-from-scratch-python
Computing AUC-ROC from scratch in Python without using any libraries.
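A from-scratch AUC-ROC computation can be sketched as follows. This is a minimal illustration, not the repository's code: the function names `roc_curve` and `auc` are hypothetical, labels are assumed to be 0/1, and tied scores are handled by moving the threshold past each group of equal scores at once before integrating with the trapezoidal rule.

```python
def roc_curve(y_true, y_score):
    """Return (fpr, tpr) lists, one point per distinct score threshold."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by score, highest first, so lowering the threshold admits samples in order
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Admit every sample tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(x, y):
    """Trapezoidal-rule area under the curve through points (x[i], y[i])."""
    area = 0.0
    for i in range(1, len(x)):
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2
    return area


fpr, tpr = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc(fpr, tpr))  # 0.75
```

Grouping tied scores matters: without it, the curve would depend on the arbitrary order of tied samples.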
akshaykapoor347/Time-series-modeling-basics
Basics of time series modeling in Python using pandas.
akshaykapoor347/NLPClothingReview
I analyzed reviews from an e-commerce clothing website, performing EDA and sentiment analysis. To prepare the text, I removed punctuation and stop words, tokenized the reviews, and dropped short, uninformative words (those with fewer than 3 characters). I then found the most common words used in reviews (dress, size, love, like, top). Next, I used TextBlob to score the sentiment of each review and built lists of the most common words in positive and negative reviews. Finally, I trained a Naive Bayes classifier on count-vectorized text to predict a review's rating and tested it on new data. Results: 1) Reviews with 3- and 4-star ratings were the longest. 2) Users shopped for tops 60 percent more than for bottoms. 3) The Naive Bayes model reached 85 percent accuracy.
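The cleaning and word-count steps described above (remove punctuation and stop words, tokenize, drop words shorter than 3 characters, count the most common words) can be sketched with the standard library alone. This is an assumption-laden illustration, not the project's code: the tiny `STOP_WORDS` set is a hypothetical stand-in for a full list such as NLTK's, tokenization is plain whitespace splitting, and the sample reviews are made up.

```python
import string
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real pipeline would use a full list
STOP_WORDS = {"the", "and", "this", "was", "for", "but", "with", "are", "its",
              "very", "too", "in", "it", "is", "a", "i"}


def clean_and_tokenize(review):
    # Lowercase and strip all ASCII punctuation characters
    text = review.lower().translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization, then drop stop words and words shorter than 3 characters
    return [w for w in text.split() if w not in STOP_WORDS and len(w) >= 3]


def most_common_words(reviews, n=5):
    counts = Counter()
    for review in reviews:
        counts.update(clean_and_tokenize(review))
    return counts.most_common(n)


reviews = [  # made-up sample reviews
    "I love this dress, the size is perfect!",
    "The dress runs small; order a size up.",
    "Love the top but the fabric is thin.",
]
print(most_common_words(reviews, 3))
```

The resulting token lists are the kind of input a count vectorizer turns into word-count features for a Naive Bayes classifier.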
akshaykapoor347/Time-series-Honeywell-Stock-price-prediction
We use time series methods to forecast future values of Honeywell stock. We apply exponential smoothing to Honeywell stock prices with varying parameter values and select the best fit by comparing the SSE and MSE of each run. We also forecast with linear regression, compare its results with the exponential smoothing forecasts, and compute the coefficients of correlation and determination. We examine the residuals and their patterns in scatterplots. Finally, we compare all our forecasts with the actual Honeywell stock price. More details are in the Word file.
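Selecting the smoothing parameter by comparing SSE and MSE can be sketched as below. This assumes simple (single-parameter) exponential smoothing, F(t+1) = alpha * y(t) + (1 - alpha) * F(t), initialized with F(1) = y(1); the project may have used a different variant or initialization, and the function names are hypothetical.

```python
def exp_smoothing_forecasts(series, alpha):
    """One-step-ahead forecasts: forecasts[t] predicts series[t]; F(1) = y(1)."""
    forecasts = [series[0]]
    for t in range(1, len(series)):
        forecasts.append(alpha * series[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts


def sse_mse(series, forecasts):
    # Skip t = 0, where the forecast equals the observation by construction
    errors = [y - f for y, f in zip(series[1:], forecasts[1:])]
    sse = sum(e * e for e in errors)
    return sse, sse / len(errors)


def best_alpha(series, alphas):
    """Return (alpha, SSE, MSE) for the alpha with the lowest SSE.

    MSE here is SSE divided by a fixed count, so both criteria pick the same alpha.
    """
    results = []
    for a in alphas:
        sse, mse = sse_mse(series, exp_smoothing_forecasts(series, a))
        results.append((a, sse, mse))
    return min(results, key=lambda r: r[1])


prices = [10, 12, 11, 13, 14, 13, 15]  # hypothetical closing prices
print(best_alpha(prices, [0.1, 0.3, 0.5, 0.7, 0.9]))
```

A small alpha smooths heavily and lags behind trends; an alpha near 1 tracks the last observation closely, which is why a grid search over alpha is a natural way to find the best fit.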
akshaykapoor347/Malware-Detection
Malware detection using machine learning.
akshaykapoor347/13-reasons-why-suicide-Timeseries-analysis
We extract Google Trends data for the term "suicide" and related terms from 2017-01-15 to 2017-03-30. We then fit an ARIMA model to forecast the values for the next 19 days and compare the forecast with the actual search values to measure its accuracy. We check the series for stationarity and use the ACF and PACF to choose the p, d, and q values of the ARIMA model. We also apply auto ARIMA and compare its forecasts with the real results. Essentially, we recreate the analysis performed by John W. Ayers to see whether there was a significant increase in searches related to suicide.
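The differencing and ACF inspection used when choosing ARIMA orders can be sketched without libraries. This is a simplified illustration under stated assumptions, not the project's code: `difference`, `acf`, and `significant_lags` are hypothetical names, the significance band uses the common white-noise approximation of plus or minus 1.96 / sqrt(n), and the PACF (used to pick p) is not shown.

```python
import math


def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' part of ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(e * e for e in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def significant_lags(series, max_lag):
    # Approximate 95% band for white noise: +/- 1.96 / sqrt(n)
    band = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag)) if k > 0 and abs(r) > band]
```

In a typical workflow, one differences until the series looks stationary (that count suggests d), then reads a cutoff in the ACF of the differenced series as a hint for q.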
akshaykapoor347/BeautifulSoupScrapping
Using Beautiful Soup to extract data from a web page, following a tutorial from Analytics Vidhya.
akshaykapoor347/Regression-using-PCA
Regression using PCA in Python on the wine dataset.
akshaykapoor347/Breast-Cancer-detection
We use 30 attributes extracted from images. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. We predict the diagnosis: B (benign) or M (malignant).
akshaykapoor347/House-price-index-Advanced-regression
Advanced regression techniques on the house prices dataset, using ensemble methods such as stacking and averaging over Gradient Boosting, XGBoost, LightGBM, and ElasticNet models.
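The averaging ensemble mentioned above can be sketched as a weighted blend of per-model predictions. This assumes each base model (for example Gradient Boosting, XGBoost, LightGBM, ElasticNet) has already produced predictions for the same rows; `average_predictions` is a hypothetical helper and the numbers are made up. Stacking differs in that a meta-model learns how to combine the base predictions instead of using fixed weights.

```python
def average_predictions(predictions, weights=None):
    """Blend per-model prediction lists row by row; equal weights if none are given."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    n_rows = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(n_rows)]


# Made-up predictions from two hypothetical models for two houses
gbm_preds = [100.0, 200.0]
enet_preds = [300.0, 400.0]
print(average_predictions([gbm_preds, enet_preds]))
print(average_predictions([gbm_preds, enet_preds], weights=[3, 1]))
```

Weights are normalized by their sum, so they need not add up to 1.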
akshaykapoor347/ImdbReviews
Web-scraped IMDb reviews of Game of Thrones and performed sentiment analysis on them using NLP.
akshaykapoor347/Recommendation-engine-Netflix
Designing a recommendation engine using Netflix movie data.
akshaykapoor347/akshaykapoor347.github.io
Portfolio Website
akshaykapoor347/Building-a-Simple-Chatbot-in-Python-using-NLTK
Building a Simple Chatbot from Scratch in Python (using NLTK)
akshaykapoor347/Credit-card-fraud-detection
Predicting whether a transaction is fraudulent or not using machine learning.
akshaykapoor347/data-science-question-answer
A repo for data science related questions and answers
akshaykapoor347/House-prices-king-county
Regression analysis of house prices.
akshaykapoor347/Image-recognition-using-convolution-neural-network-
This project implements image recognition using convolutional neural networks. We use the TensorFlow, Theano, and Keras deep learning libraries to build the convolutional neural network and perform image recognition.
akshaykapoor347/KenoAnalysis
The analysis focuses on Caveman Keno, a variation of the traditional Keno game that is widely played online and in casinos. This variation adds a draw of 3 numbers by the computer, which become dinosaur eggs. If two or three of the eggs are among the 20 numbers drawn, the payouts are multiplied. The project applies the hypergeometric distribution to a real-world game.
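The egg-matching probabilities can be sketched with the hypergeometric formula P(X = k) = C(K, k) C(N - K, n - k) / C(N, n). This sketch assumes the 3 eggs sit on 3 distinct numbers out of 80, chosen independently of the 20-number draw, so the number of eggs drawn is hypergeometric with N = 80, K = 20 drawn numbers, and n = 3 eggs. The actual game rules (for example, whether eggs can fall on the player's own picks) may change this, and `hypergeom_pmf` is a hypothetical helper.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k): k 'successes' when drawing `draws` items without replacement
    from `population` items, of which `successes` count as hits."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


# Assumption: 3 eggs on distinct numbers out of 80, independent of the 20 numbers drawn
egg_probs = {k: hypergeom_pmf(k, 80, 20, 3) for k in range(4)}
p_multiplier = egg_probs[2] + egg_probs[3]  # two or three eggs among the drawn numbers

for k, p in egg_probs.items():
    print(k, round(p, 4))
print("P(multiplier applies) =", round(p_multiplier, 4))
```

Under these assumptions the multiplier applies with probability 12540 / 82160, roughly 0.153, i.e. in about one game out of seven.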
akshaykapoor347/LifeStory-Rshiny-application
Interactive R Shiny application for analyzing LifeStory data on genetics and women's health. Live application: https://akshaykapoor347.shinyapps.io/week_1/
akshaykapoor347/Loan-prediction
Predicting whether a loan would be granted based on credit history, income, and other factors.
akshaykapoor347/markov_clustering
Markov clustering in Python.
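The core Markov clustering (MCL) loop can be sketched in pure Python: add self-loops, make the matrix column-stochastic, then alternate expansion (matrix squaring) and inflation (element-wise power followed by column normalization) until the matrix stops changing. This is a minimal illustration, not the repository's implementation: the names are hypothetical, expansion is fixed at power 2, inflation defaults to 2.0, and the dense list-of-lists matrices are only practical for small graphs.

```python
def normalize_columns(m):
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Return a list of clusters (sets of node indices) for a square 0/1 adjacency matrix."""
    n = len(adjacency)
    # Self-loops keep every column sum positive; then make the matrix column-stochastic
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: random-walk flow spreads out
        # Inflation: strengthen strong flows, weaken weak ones, renormalize
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    # Rows that keep mass act as attractors; their non-zero columns form a cluster
    clusters = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > tol}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

A higher `inflation` value generally produces more, smaller clusters, which is the main knob to tune.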
akshaykapoor347/pymc3
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
akshaykapoor347/Quora-question-similarity-word2Vec
Implementation of Word2Vec to find similarities between words
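Once Word2Vec has produced a vector per word, similarity between words is usually measured with cosine similarity. The sketch below shows that final step only; the 3-dimensional `toy_vectors` are made up for illustration (real Word2Vec embeddings are learned and typically have 100 or more dimensions), and `most_similar` is a hypothetical helper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]


toy_vectors = {  # made-up embeddings
    "king": [0.9, 0.1, 0.3],
    "queen": [0.85, 0.15, 0.35],
    "apple": [0.1, 0.9, 0.2],
}
print(most_similar("king", toy_vectors, topn=2))
```

Cosine similarity ignores vector length, so it compares words by direction in the embedding space rather than by how frequent they are.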
akshaykapoor347/safe-water
We are trying to predict health-based drinking water violations in the United States.
akshaykapoor347/Sqlprac
akshaykapoor347/Titanic-Prediction
Solving the classification problem on the classic Titanic dataset.
akshaykapoor347/Twitter-Word-cloud-using-R
Natural language processing in R: displaying a word cloud based on recent tweets about a topic.
akshaykapoor347/WeatherApp
A Node.js learning project: an app that fetches weather data through API calls.
akshaykapoor347/Yelp-reviews
EDA and sentiment analysis on the Yelp dataset.