tanaymukherjee
Data Science Enthusiast. Digital Marketing Expert. Past exp in Analytics, Research & Strategy. Academics: Comp. Science Engg & MS in Statistics and Data Science
IBM | Ogilvy | Maersk | CUNY | Tesla
New York
Pinned Repositories
Case-Study-Predicting-Bankruptcy
Using the available bank data and parameters, identify the most influential variables and predict bankruptcy under the given financial model
Complex-SQL-Exercise
SQL queries of all kinds, collected in a single repository
Deep-Learning-with-PyTorch
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab. It is free and open-source software released under the Modified BSD license.
Dimensionality-Reduction
In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Approaches can be divided into feature selection and feature extraction.
Dissecting-Yelp-Dataset
This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the dataset you'll find information about businesses across 11 metropolitan areas in four countries.
Exploring-SQL-with-R
The idea is to apply SQL skills in R: convert data from text files into a relational database, then run SQL queries to filter it
Google-Analytics-with-R
How to automate a reporting suite from GA in R, so that one can pull data at will without interacting with the Google Analytics interface. There are various things one can do, and we will cover each of them.
Investigating-NYC-Parking-Violations
For this project, we will analyze millions of NYC Parking violations since January 2016
Natural-Language-Processing
Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.
Time-Series-Modeling
A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.
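The time series description above turns on two properties: the points are in time order, and they are usually equally spaced. A minimal sketch of checking both, using only the standard library; the helper name is_equally_spaced and the sample dates are illustrative assumptions, not taken from the repository:

```python
from datetime import datetime, timedelta


def is_equally_spaced(timestamps):
    """Return True if timestamps are in time order with one constant, positive step."""
    if len(timestamps) < 2:
        return True
    # Differences between each pair of consecutive timestamps.
    steps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return steps[0] > timedelta(0) and all(step == steps[0] for step in steps)


# Hypothetical sample data: five daily points, and a series with a gap.
start = datetime(2020, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
irregular = daily[:3] + [start + timedelta(days=10)]
```

Here daily passes the check and irregular does not, because its last step is eight days instead of one.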
tanaymukherjee's Repositories
tanaymukherjee/Dissecting-Yelp-Dataset
This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the dataset you'll find information about businesses across 11 metropolitan areas in four countries.
tanaymukherjee/A-B-Testing-in-R
A/B testing (or split-testing) is a randomized experiment with two variants A and B. It includes application of statistical hypothesis testing (or two-sample hypothesis testing), as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.
tanaymukherjee/Deep-Learning-with-PyTorch
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab. It is free and open-source software released under the Modified BSD license.
tanaymukherjee/Investigating-NYC-Parking-Violations
For this project, we will analyze millions of NYC Parking violations since January 2016
tanaymukherjee/Shapley-Value
tanaymukherjee/Spoken-Language-Processing-in-Python
tanaymukherjee/Natural-Language-Processing
Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.
tanaymukherjee/CIS_9440_Project_YouTube-and-Netflix-Viewership-Analysis
This is a repository to put together all the work for the final project from CIS 9440 - Data Warehousing and Analytics
tanaymukherjee/Data-Science-Hacks-in-Python-Part-2
Simple hacks to speed up your Data Analysis
tanaymukherjee/Debugging-NY-Times-library
This is a web scraping project in which I am trying to gather information from the NY Times using its APIs
tanaymukherjee/Epileptic-Seizure-Recognition
tanaymukherjee/Flow-in-R
tanaymukherjee/HackerRank-Challenges
tanaymukherjee/Humana-Mays-Healthcare-Analytics-Case-Competition-2020
Mays Business School, in partnership with Humana, presents the fourth annual Humana-Mays Healthcare Analytics Case Competition. The competition will be held virtually and offers an opportunity for U.S. master's students to showcase their analytical skills and solve real-world business problems for Humana using real data.
tanaymukherjee/Kaggle-Competition-Santander-Customer-Transaction-Prediction
https://www.kaggle.com/c/santander-customer-transaction-prediction
tanaymukherjee/Learning-Kafka
tanaymukherjee/Linear-Regression-in-SQL
In this exercise we will learn how to implement linear regression using only SQL.
tanaymukherjee/Machine-Learning-Fall-2020
This repo includes all the work/assignments I did as part of my coursework in Fall 2020 under the subject code STA 9891 with Prof. Rad.
tanaymukherjee/ML-in-Bioinformatics
Bioinformatics is a subdiscipline of biology and computer science concerned with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences.
tanaymukherjee/Network-Analysis
The promise of network analysis is the placement of significance on the relationships between actors, rather than seeing actors as isolated entities. The emphasis on complexity, along with the creation of a variety of algorithms to measure various aspects of networks, makes network analysis a central tool for digital humanities.
tanaymukherjee/NLP-Class-Fall-2020
tanaymukherjee/No-SQL-in-Python
tanaymukherjee/OOP-in-Python
Demystifying the world of object-oriented programming in Python
tanaymukherjee/PB_Challenge_2021
In this exercise we try to predict, from the given information, whether a device will fail in the next 7 days.
tanaymukherjee/Real-and-Fake-News-Analysis
tanaymukherjee/SQL-Exercise-2
In this exercise we will try to answer a specific data requirement.
tanaymukherjee/Tableau-Dashboards
This repository is a showcase of all the Tableau dashboards I have built so far.
tanaymukherjee/tanaymukherjee
tanaymukherjee/Useful-Python-libraries-for-Data-Science
In this repository, I am trying to compile some useful Python libraries for data science tasks other than the commonly used ones like pandas, scikit-learn, matplotlib, etc. My idea is to regularly update the kernel to include some awesome Python libraries which can really come in handy for data analysis and machine learning tasks.
tanaymukherjee/Working-With-Python-Functions
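The A-B-Testing-in-R description above refers to two-sample hypothesis testing for comparing variants A and B. As a rough illustration only, and in Python, not the R used in that repository, here is a minimal sketch of a pooled two-proportion z-test. The function name two_proportion_z_test and the conversion counts are hypothetical, and this is one common choice of test, not necessarily the one the repository applies:

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis that A and B do not differ.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 conversions for A, 130 of 1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone. The normal approximation assumes reasonably large samples in both groups.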