Data Science Portfolio

Repository containing a portfolio of data science projects I completed for self-learning, presented as Jupyter Notebooks.

Note: The data used in the projects (found in the data directory) is for demonstration purposes only.

Data-Cleaning-Airbnb(Notebook)

Data scientists are often said to spend around 80% of their time cleaning and preparing data. This Airbnb booking data is messy and uncleaned, so it is cleaned and prepared in a way that makes it ready for any further analysis.

Tools: pandas, numpy, matplotlib
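A minimal pandas sketch of typical cleaning steps (parsing prices stored as text, normalising inconsistent categories, filling gaps, dropping duplicates). The columns `price`, `room_type` and `reviews_per_month` and the sample rows are hypothetical; the notebook's actual columns and choices may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of messy listing data; column names are assumptions,
# not the columns of the notebook's actual Airbnb file.
raw = pd.DataFrame({
    "price": ["$1,200.00", "$85.00", None, "$85.00"],
    "room_type": [" Entire home/apt", "private room", "Private room ", "private room"],
    "reviews_per_month": [2.5, np.nan, 0.4, np.nan],
})

def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Strip "$" and thousands separators, then convert price to a float.
    df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    # Normalise inconsistent capitalisation and stray whitespace.
    df["room_type"] = df["room_type"].str.strip().str.lower()
    # Treat a missing review rate as "no reviews" (an assumption).
    df["reviews_per_month"] = df["reviews_per_month"].fillna(0)
    # Drop rows still missing a price, then exact duplicate rows.
    df = df.dropna(subset=["price"]).drop_duplicates().reset_index(drop=True)
    return df

clean = clean_listings(raw)
print(clean)
```

Each step is explicit so that the choice made for every column (for example, filling versus dropping) stays easy to review.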

Loan-Prediction(Notebook)

This project uses data about loan applicants as features and their loan status as the target variable. A random forest classifier is used, achieving 80% accuracy. Steps:

Getting libraries and data
Exploratory data analysis
Filling the missing values
Preprocessing: one-hot encoding
Random forest model building
Tuning the parameters

Tools: seaborn, matplotlib, missingno, numpy, warnings, LabelEncoder, RandomForestClassifier
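The missing-value and encoding steps above can be sketched in pandas. The column names, the sample rows and the fill strategy (mode for categorical columns, median for numeric ones) are assumptions, and `pd.get_dummies` stands in for the notebook's encoding step.

```python
import numpy as np
import pandas as pd

# Hypothetical loan-applicant rows; column names are assumptions modelled on
# typical loan-prediction data, not the notebook's exact file.
loans = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Married": ["Yes", "No", None, "Yes"],
    "LoanAmount": [120.0, np.nan, 66.0, 141.0],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

def prepare_features(df: pd.DataFrame):
    df = df.copy()
    # Categorical gaps are filled with the most frequent value (the mode).
    for col in ["Gender", "Married"]:
        df[col] = df[col].fillna(df[col].mode()[0])
    # Numeric gaps are filled with the median, which is robust to outliers.
    df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
    # Binary target: "Y" -> 1, "N" -> 0.
    y = (df.pop("Loan_Status") == "Y").astype(int)
    # One-hot encode the remaining categorical columns as 0/1 integers.
    X = pd.get_dummies(df, columns=["Gender", "Married", "Property_Area"], dtype=int)
    return X, y

X, y = prepare_features(loans)
print(X.columns.tolist())
```

The resulting `X` and `y` are the kind of inputs scikit-learn's `RandomForestClassifier.fit(X, y)` expects; parameter tuning (for example with `GridSearchCV`) would follow.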

Movie-Recommandation-system (Notebook part1-Notebook part2)

This project uses an IMDb dataset of 5000 movies that gives each movie's actors, director name, genre, plot and related information. The movie recommendation system is built with cosine similarity. The project is broken down into two parts: 1. Data cleaning, 2. Recommendation system.

Tools: cosine_similarity, CountVectorizer, pandas, numpy, missingno, seaborn
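The idea behind the recommender can be shown without scikit-learn: turn each movie's tags into word counts (what `CountVectorizer` produces) and rank the other movies by cosine similarity to the chosen one. The movie names, the tag strings and the `recommend` helper are made-up examples, and the from-scratch functions stand in for the library calls.

```python
import math
from collections import Counter

# Hypothetical movie "tags": director, actor and genre words joined per film.
movies = {
    "Avatar": "james_cameron sam_worthington action adventure fantasy scifi",
    "Titanic": "james_cameron leonardo_dicaprio drama romance",
    "Aliens": "james_cameron sigourney_weaver action scifi horror",
    "The Notebook": "nick_cassavetes ryan_gosling drama romance",
}

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(title: str, k: int = 2) -> list:
    # Bag-of-words counts per movie, analogous to a CountVectorizer matrix.
    vectors = {name: Counter(tags.split()) for name, tags in movies.items()}
    target = vectors[title]
    scores = [(cosine(target, vec), name) for name, vec in vectors.items() if name != title]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scores[:k]]

print(recommend("Avatar"))   # ['Aliens', 'Titanic']
print(recommend("Titanic"))  # ['The Notebook', 'Aliens']
```

Cosine similarity compares the direction of the count vectors rather than their length, so a movie with many tags is not favoured just for having more words.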

Time-series-Analysis(Notebook)

Forecasting is the process of estimating future values of a series, in this case sales. Time series analysis is applied to Superstore's sales data. Steps:

Importing useful libraries
Getting the data and dropping the unnecessary columns
Visualizing the series
Testing for stationarity
Decomposing
Plotting ACF/PACF and finding optimal parameters
Building the ARIMA model

Tools: statistics, statsmodels, pandas, numpy, matplotlib
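statsmodels provides the usual tools for the stationarity and modelling steps (`adfuller` in `statsmodels.tsa.stattools`, `ARIMA` in `statsmodels.tsa.arima.model`). The numpy sketch below only illustrates the idea behind them: a trending series has an autocorrelation near 1 at lag 1, and first-order differencing (the "d" in ARIMA(p, d, q)) removes the trend. The sales series is synthetic and the `acf` helper is hypothetical.

```python
import numpy as np

def acf(x, lag: int) -> float:
    """Sample autocorrelation at one lag (the quantity an ACF plot shows)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return float(np.sum(dev[lag:] * dev[: len(x) - lag]) / np.sum(dev ** 2))

# Synthetic monthly sales: upward drift of 2 units per month plus noise.
rng = np.random.default_rng(0)
sales = 100 + np.cumsum(2 + rng.normal(0, 1, size=120))

# First-order differencing: y[t] = sales[t] - sales[t-1].
diffed = np.diff(sales)

for lag in (1, 2, 3):
    # Lag-1 ACF is close to 1 for the raw series and near 0 after differencing.
    print(lag, round(acf(sales, lag), 3), round(acf(diffed, lag), 3))
```

A slowly decaying ACF like the raw series shows is the usual visual sign that differencing is needed before choosing p and q from the ACF/PACF plots.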

Web-Sracaping-of-internshala(Notebook)

This project scrapes all the data science internship posts from Internshala into a neat tabular form.

Tools: requests, BeautifulSoup, pandas
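A sketch of the parsing step with BeautifulSoup, run on an inline HTML string so it needs no network access. The `div.internship` cards, the field class names and the `parse_listings` helper are hypothetical; the real page's markup has to be inspected first and may change over time.

```python
from bs4 import BeautifulSoup

# Static HTML standing in for a downloaded listings page. Tag and class
# names are hypothetical; Internshala's real markup may differ.
html = """
<div class="internship">
  <h3 class="title">Data Science Intern</h3>
  <a class="company">Acme Analytics</a>
  <span class="location">Bangalore</span>
  <span class="stipend">10,000 /month</span>
</div>
<div class="internship">
  <h3 class="title">Machine Learning Intern</h3>
  <a class="company">Example Labs</a>
  <span class="location">Work From Home</span>
  <span class="stipend">Unpaid</span>
</div>
"""

def parse_listings(page: str) -> list:
    soup = BeautifulSoup(page, "html.parser")
    rows = []
    for card in soup.select("div.internship"):
        # One dict per posting; a missing field becomes None instead of raising.
        row = {}
        for field in ("title", "company", "location", "stipend"):
            tag = card.select_one(f".{field}")
            row[field] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return rows

for row in parse_listings(html):
    print(row)
```

On the live site, `page` would come from `requests.get(url).text`, and `pd.DataFrame(parse_listings(page))` gives the tabular form.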

Demonetisation_sentiment_analysis(Notebook)

The Government of India announced the demonetisation of all ₹500 and ₹1000 banknotes. A Kaggle dataset of tweets posted during demonetisation is used to analyse sentiment.

Steps:

Importing libraries and data.
Cleaning the data.
Labelling each tweet as either positive or negative.
Plotting a sentiment graph.
Creating a word cloud for each sentiment.

Tools: BeautifulSoup, plotly, nltk, wordcloud, matplotlib
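The cleaning and labelling steps can be sketched with the standard library. The word lists, sample tweets and helper names are hypothetical; a real analysis would more likely use a full lexicon such as NLTK's VADER (`SentimentIntensityAnalyzer`), which needs its lexicon downloaded first. Treating a score of 0 as positive is an arbitrary choice made here to keep the labels binary.

```python
import re
from collections import Counter

# Tiny hypothetical word lists standing in for a real sentiment lexicon.
POSITIVE = {"good", "great", "support", "welcome", "bold"}
NEGATIVE = {"bad", "chaos", "queue", "queues", "pain", "worst"}

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"^RT\s+", " ", text)         # drop the retweet marker
    text = re.sub(r"[@#]\w+", " ", text)        # drop mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep lowercase letters only
    return " ".join(text.split())

def label(text: str) -> str:
    words = clean_tweet(text).split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Binary label; a tie (score 0) counts as positive here.
    return "positive" if score >= 0 else "negative"

tweets = [
    "RT @user: Great move, I support #demonetisation https://t.co/xyz",
    "Hours in the ATM queue, total chaos and pain",
]
counts = Counter(label(t) for t in tweets)
print(counts)
```

The `counts` tally is what a sentiment bar chart would plot, and the cleaned words of each class could feed a word cloud.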

CNN-Facial-recognition-keras(Notebook part1-Notebook part2)

Indian Actors Face Recognition is a Kaggle dataset that includes images of 5 famous Indian film stars. A reduced version of the dataset is used here. The first .zip file contains 5 directories, each holding 30 images of one actor. The second .zip file contains validation images so that the model can be evaluated.

Steps: This project is divided into two parts.

Part 1:

Getting libraries
Renaming the files and saving them in one place
Saving each filename with its corresponding label in pandas

Part 2:

Importing libraries and the label file
Creating functions for image manipulation
Preparing the data to feed into the model
Building the model
Training the model
Testing the model on new images

Tools: numpy, pandas, tqdm, cv2, dlib, matplotlib, keras, ImageDataGenerator
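Part 1 (gathering the images in one place with their labels) can be sketched with `pathlib`, `shutil` and pandas. The folder layout (one sub-folder per actor containing `.jpg` files), the `<actor>_<index>.jpg` naming scheme, the `collect_images` helper and the demo folder names are all assumptions; the demo uses empty placeholder files instead of real photos.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

def collect_images(src: Path, dest: Path) -> pd.DataFrame:
    """Copy every image from per-actor folders into one folder, renaming each
    file to <actor>_<index><ext>, and return a filename/label table."""
    dest.mkdir(parents=True, exist_ok=True)
    records = []
    for actor_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        for i, img in enumerate(sorted(actor_dir.glob("*.jpg"))):
            new_name = f"{actor_dir.name}_{i}{img.suffix}"
            shutil.copy(img, dest / new_name)
            # The folder name doubles as the class label.
            records.append({"filename": new_name, "label": actor_dir.name})
    return pd.DataFrame(records)

# Demo on a throwaway directory tree with hypothetical actor folder names.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "train"
    for actor, n in {"actor_a": 2, "actor_b": 3}.items():
        (src / actor).mkdir(parents=True)
        for k in range(n):
            (src / actor / f"img{k}.jpg").touch()
    labels = collect_images(src, Path(tmp) / "all_images")
    print(labels)
```

`labels.to_csv("labels.csv", index=False)` would then save the label file that Part 2 loads.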

If you liked what you saw and want to chat with me about the portfolio, work opportunities, or collaboration, shoot me an email at aamir.k306@yahoo.com

