Primary language: Jupyter Notebook. License: MIT.

Data-Science Portfolio

This repository contains a portfolio of data-science projects that I completed for academic and self-learning purposes.

Note: The data used in the projects (provided in the respective folders) is for demonstration purposes only.

Contents

Custom Implementation of ML-algos

Data Analysis and Visualization

Tools: Seaborn, Plotly, Matplotlib etc.

Kaggle Datasets

Machine Learning

  • Flight Price Prediction (END-to-END) : An end-to-end model to predict the price of a flight ticket using various statistical analysis tools & regression techniques.

  • Advance-House Price Prediction : A model to predict the value of a given house in the real estate market using various statistical analysis tools & regression techniques.

  • Breast-Cancer Prediction : Testing out several different supervised learning algorithms to build a model that accurately classifies a tumor as malignant or benign.

  • Credit Card Customer Segmentation : Identifying different segments in the existing customers based on their spending patterns as well as past interactions with the bank.

  • Data-Scientist Salary Prediction : Creating a machine-learning model to predict the salary of a data-scientist using various ensemble techniques like Random-Forest & Gradient-Boosting.

  • Diabetes Classification : A binary-classification problem: predicting whether or not a patient has diabetes on the basis of the features available in the dataset.

  • Diamonds : Performing EDA and predicting the price of diamonds with the help of statistical analysis tools, regression techniques & hyperparameter tuning.

  • Heart Disease Prediction : Finding trends in heart data to predict certain cardiovascular events or any clear indications of heart health.

  • IPL-First-Innings Score Prediction : A model to predict the first innings score in the IPL using various statistical analysis tools & regression techniques.

  • Income-Gender Classification : Creating a machine-learning model to predict whether a person makes <=50k or >50k annually on the basis of available information in the dataset.

  • Laptop Price Prediction : Preparing a machine learning model to predict the price of a laptop given its configurations using various regression techniques.

  • Mall-Customer Segmentation : Analyzing a dataset of customer data to gain customer insight and work out strategies for increasing sales to these customers.

  • Messy-vs-Clean Room : An image-classification problem: classifying a room as messy or clean.

  • Real-vs-Fake Jobs : Creating a classification model that uses text-data features and meta-features to predict which job descriptions are fraudulent.

  • Real-vs-Fake News : Developing a machine learning model to detect opinion spam and fake news using text classification.

  • Spam Email Detection : A binary-classification problem: classifying a given email as spam or not using various NLP techniques.

  • Student Performance in Exams : Performing EDA & predicting students' marks to understand the influence of parents' background, test preparation etc. on students' performance.

  • Wine Quality Prediction : A model to predict the quality of wine (from 1 to 10) by analyzing the amount of various chemicals present in the wine and their effect on its quality.

Tools: scikit-learn, NumPy, Pandas, Seaborn, Matplotlib etc.

A/B Testing

  • A/B testing on Advertisement Data

    • Involves A/B testing on an advertisement dataset, examining the impact of a new distribution strategy on ad success rates.
    • Statistical hypothesis testing is employed to compare control and exposed groups, aiming to determine the effectiveness of the new design strategy.
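The comparison of control and exposed groups described above can be sketched as a two-proportion z-test. This is a minimal illustration, not necessarily the test used in the project notebook: the function name `two_proportion_z_test` and the success counts are hypothetical, and only the Python standard library is assumed.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis that both groups share one rate.
    p_pool = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, NOT taken from the project's dataset:
# group A = control, group B = exposed.
z, p = two_proportion_z_test(success_a=120, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

A small p-value suggests that the difference in success rates between the two groups is unlikely to be due to chance alone; the significance threshold (commonly 0.05) is a choice made before running the test.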

Natural Language Processing

Tools: NLTK, scikit-learn etc.

ML-algorithms

Basic Implementation

Micro projects (Algos)

  • Decision Tree + Random Forest : Using Decision Tree and Random Forest to predict whether a borrower will pay their loan back.

  • KNN : Using KNN to classify instances from a fake dataset into two target classes, while choosing the best value for K using the elbow method.

  • Linear Regression :

    1. Using Linear Regression to help a company decide whether to focus its efforts on its mobile app experience or its website, based on which of the two has the greater impact.
    2. Using Linear Regression to predict the salary of a person based on their years of experience.  
  • Logistic Regression : Using Logistic Regression to predict whether an internet user clicked an ad or not.

  • SVM : Using Support Vector Machine to work on classification of the Iris dataset into different categories.

Tools: scikit-learn, NumPy, Pandas, Seaborn, Matplotlib etc.
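The elbow method mentioned in the KNN entry can be sketched as follows: compute the test error rate for a range of K values and pick the K after which the error stops improving noticeably. This is a minimal sketch under stated assumptions, not the notebook's code: it uses a plain-Python KNN instead of scikit-learn, and the two-cluster data and the helper names (`knn_predict`, `error_rate`, `make_points`) are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], point),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, k):
    """Fraction of test points that KNN misclassifies for a given k."""
    wrong = sum(
        knn_predict(train_X, train_y, x, k) != y for x, y in zip(test_X, test_y)
    )
    return wrong / len(test_X)


# Hypothetical synthetic data: two Gaussian clusters, one per class.
rng = random.Random(0)


def make_points(center, label, n):
    return [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label) for _ in range(n)]


data = make_points(0.0, 0, 100) + make_points(2.0, 1, 100)
rng.shuffle(data)
train, test = data[:140], data[140:]
train_X, train_y = zip(*train)
test_X, test_y = zip(*test)

# Odd k values avoid tied votes in a two-class problem.
errors = {k: error_rate(train_X, train_y, test_X, test_y, k) for k in range(1, 22, 2)}
for k, err in errors.items():
    print(f"k={k:2d}  error rate={err:.3f}")
```

Plotting `errors` against K (for example with Matplotlib) makes the "elbow" easier to spot than reading the printed values.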

Support my work

Please ⭐ the repository if it inspired you, gave you ideas for your own portfolio, or helped you in any way.