Atharva1309
UT Dallas graduate student. Business Analytics, Data Analytics, SQL, R, Python, Tableau and ML on my mind.
Dallas, Texas
Pinned Repositories
Amazon-Shipping-Analytics
Amazon Shipping is a company that ships a variety of FMCG (Fast Moving Consumer Goods) all over the world. The Shipping Manager has little visibility into how many orders are processed and shipped each month. He would like a dashboard where he can select a month and see how many orders are outstanding each day and where they should be shipped.
Atharva1309.github.io
My Personal Website
Bank-Marketing-Analysis
This project is based on the UCI "Bank Marketing" dataset (description at: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing). The data is enriched with five additional social and economic attributes (nationwide indicators for a country of roughly 10 million people), published by the Banco de Portugal and publicly available at: https://www.bportugal.pt/estatisticasweb. The link to the dataset is given in the classification file. Performed data cleaning and imputation on 10% of the data using NumPy and Pandas in a Jupyter Notebook. Classified whether a client agreed to place a deposit, reaching a test accuracy of 91%, to help improve the marketing campaign. Implemented classification models including KNN, Logistic Regression, Decision Tree and Support Vector Machines.
bitcoin-price-prediction
In this project, RNN variations are tested on a dataset comprising not only historical Bitcoin prices but also other influencing factors such as macroeconomic indices and sentiment.
Bitcoin_Price_Prediction_Using_NeuralProphet
Car-Sale-Prediction
This is part of my college project on standard machine learning regression concepts. The data was scraped from several websites in the Czech Republic and Germany over a period of more than a year. The dataset is available at https://www.kaggle.com/mirosval/personal-cars-classifieds. I pre-processed roughly 3.5 million rows and performed exploratory data analysis to check the distribution of the data in each column. I then implemented various regression algorithms to find out how the factors affect the price and resale value of a used car. Finally, I selected the algorithm best suited to this data, using the highest R^2 value as the deciding criterion.
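The selection step described above, picking the model with the highest R^2, can be sketched as follows. This is a minimal illustration and not the project's code: the prices, the predictions and the names `model_a` and `model_b` are made up, and in practice R^2 would be computed on held-out data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical car prices and the predictions of two hypothetical models.
actual_prices = [5000.0, 7000.0, 9000.0, 11000.0]
predictions = {
    "model_a": [5200.0, 6900.0, 9100.0, 10800.0],
    "model_b": [6000.0, 6000.0, 10000.0, 10000.0],
}

# Score every model, then keep the one with the highest R^2.
scores = {name: r_squared(actual_prices, preds) for name, preds in predictions.items()}
best_model = max(scores, key=scores.get)
```

Here `model_a` wins because its residuals are much smaller relative to the spread of the actual prices.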
Instacart-Market-Basket-Analysis
This is the repository for Instacart Market Basket Analysis. It contains the data cleaning, exploratory data analysis, data visualization and machine learning algorithms developed for the project. Market Basket Analysis is a modeling technique based on the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. In this analysis, a forecasting model is developed using machine learning algorithms to improve the accuracy of product sales forecasts.
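The "more (or less) likely" idea behind Market Basket Analysis is usually quantified with support, confidence and lift. The sketch below is an illustration only, with made-up baskets and a hypothetical helper `association_metrics`; it is not the forecasting model from the repository.

```python
def association_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = has_both / n              # share of baskets containing both items
    confidence = has_both / has_a       # P(consequent | antecedent)
    lift = confidence / (has_c / n)     # > 1: more likely, < 1: less likely
    return support, confidence, lift


# Hypothetical baskets, one set of items per order.
baskets = [
    {"bananas", "milk"},
    {"bananas", "milk", "bread"},
    {"bananas", "bread"},
    {"milk"},
]
support, confidence, lift = association_metrics(baskets, "bananas", "milk")
```

In this toy data the lift is below 1, so buying bananas makes milk slightly less likely than its baseline rate.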
neural_prophet
NeuralProphet - a Neural Network based Time-Series model
OCR_template
The two Python files are specific to the PDF file I have uploaded. The first file, dskew.py, converts the image obtained from the PDF into grayscale and checks its orientation. The second file, segment.py, manipulates the image: I created a separate coordinate variable for each field I wanted to extract, cropped the image to each field, converted each cropped image into text using Pytesseract, and appended each extracted field to a dictionary. At the end, I dumped this dictionary and its key-value pairs into a JSON variable.
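The crop-then-recognize flow described above might look roughly like the sketch below. Everything here is an assumption for illustration: the "image" is a toy grid of characters, `fake_recognize` stands in for the Pytesseract call so the example runs without Tesseract installed, and the field names and coordinates are invented.

```python
import json


def crop(image, box):
    """Crop a row-major 2-D image to box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def extract_fields(image, field_boxes, recognize):
    """Crop each named region and pass it to the recognizer."""
    return {name: recognize(crop(image, box)) for name, box in field_boxes.items()}


def fake_recognize(region):
    """Stand-in for the OCR step: joins the characters of the toy image."""
    return "".join("".join(row) for row in region).strip()


# Toy page whose "pixels" are characters; one coordinate box per field.
page = [
    list("NAME: ADA "),
    list("DATE: 2020"),
]
field_boxes = {"name": (6, 0, 10, 1), "date": (6, 1, 10, 2)}

fields = extract_fields(page, field_boxes, fake_recognize)
as_json = json.dumps(fields)
```

Keeping the recognizer as a parameter means the same cropping logic can be checked with a stub and later pointed at a real OCR function.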
Stock-Market-Sentiment-Analysis
This is a dataset of news headlines about publicly held organizations. The headlines are preprocessed and converted into TF-IDF numeric vectors using NLTK. TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents; it was invented for document search and information retrieval. A word's score increases in proportion to the number of times it appears in a document, but is offset by the number of documents that contain the word. So words that are common in every document, such as "this", "what" and "if", rank low even though they may appear many times, since they don’t mean much to that document in particular. Implemented a bag-of-words approach for vectorization and predicted the stock market sentiment for an organization using a RandomForest classifier with an accuracy of 82%, to help customers invest in the right stock for a high ROI.
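The weighting described above can be illustrated with a small pure-Python sketch. It uses the plain textbook form tf * log(N / df) on made-up headlines; the function name `tf_idf` is hypothetical, and library implementations often use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines; "this" appears in every one of them.
headlines = [
    "this stock rallies after strong earnings",
    "this stock slides after weak earnings",
    "this bank cuts its dividend",
]
scores = tf_idf(headlines)
```

Because "this" occurs in all three headlines its weight is zero, while "rallies", which occurs in only one, gets the highest weight in the first headline.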
Atharva1309's Repositories
Atharva1309/Bitcoin_Price_Prediction_Using_NeuralProphet
Atharva1309/Stock-Market-Sentiment-Analysis
Atharva1309/Amazon-Shipping-Analytics
Atharva1309/Atharva1309.github.io
Atharva1309/Bank-Marketing-Analysis
Atharva1309/bitcoin-price-prediction
Atharva1309/Car-Sale-Prediction
Atharva1309/Instacart-Market-Basket-Analysis
Atharva1309/neural_prophet
Atharva1309/OCR_template
Atharva1309/CS229_ML
🍟 Stanford CS229: Machine Learning
Atharva1309/Prosper-Loan-Data-Complete-analysis
Prosper is a peer-to-peer lending platform that aims to connect people who need money with people who have money to invest. In this data analysis project, I explored the Prosper dataset and used Tableau to create my visualizations.
Atharva1309/Time-Series-Analysis
Time Series Analysis Concepts Explained with examples