/Data-Science-Portfolio

Portfolio of data science projects I completed for self-learning. Looking for job opportunities.

Primary language: Jupyter Notebook

Data Science Portfolio

Repository containing a portfolio of data science projects I completed for self-learning, presented as Jupyter notebooks.

Note: Data used in the projects (found under the data directory) is for demonstration purposes only.

Contents

It is often said that data scientists spend around 80% of their time cleaning and preparing data. This Airbnb booking data is quite messy, so it is cleaned and prepared in such a way that it can be passed on to any further analysis.

Tools: pandas, numpy, matplotlib

This project uses data about loan applicants as features and their loan status as the target variable. A random forest classifier is used and achieves 80% accuracy. Steps:

  1. Getting libraries and data
  2. Exploratory data analysis
  3. Filling the missing values
  4. Preprocessing - one-hot encoding
  5. Building the random forest model
  6. Tuning the parameters

Tools: seaborn, matplotlib, missingno, numpy, warnings, LabelEncoder, RandomForestClassifier
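The missing-value and encoding steps listed above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the notebook's own code: the column names and rows are hypothetical, and filling with the mode (categorical) or mean (numeric) is an assumed strategy. The classifier itself is not shown.

```python
from collections import Counter

# Hypothetical loan-applicant rows; None marks a missing value.
rows = [
    {"gender": "Male", "married": "Yes", "income": 5000},
    {"gender": None, "married": "No", "income": 3000},
    {"gender": "Female", "married": "Yes", "income": None},
    {"gender": "Male", "married": None, "income": 4000},
]

def fill_missing(rows, column, numeric=False):
    """Fill None in `column` with the mean (numeric) or the mode (categorical)."""
    present = [r[column] for r in rows if r[column] is not None]
    if numeric:
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(r[column] == cat)
        encoded.append(new_row)
    return encoded

cleaned = rows
for col, numeric in [("gender", False), ("married", False), ("income", True)]:
    cleaned = fill_missing(cleaned, col, numeric)
for col in ("gender", "married"):
    cleaned = one_hot(cleaned, col)
```

After this runs, `cleaned` holds only numeric columns (`income`, `gender_Female`, `gender_Male`, `married_No`, `married_Yes`), which is the shape a tree-based classifier expects.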

This project uses an IMDb dataset of 5000 movies, which gives each movie's actors, director name, genre, plot, and related information. A movie recommendation system is built using cosine similarity. The project is broken down into two parts: 1. Data cleaning 2. Recommendation system

Tools: cosine_similarity, CountVectorizer, pandas, numpy, missingno, seaborn
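To show the idea behind a cosine-similarity recommender, here is a small sketch in plain Python. It is not the notebook's code: the movie titles and metadata strings are hypothetical, and the hand-written `count_vector` and `cosine_similarity` helpers stand in for the library tools listed above.

```python
import math
from collections import Counter

# Hypothetical movies; each string joins the kind of metadata the project
# describes (actors, director, genre, plot keywords).
movies = {
    "Movie A": "action space hero director_x actor_one",
    "Movie B": "action space villain director_x actor_two",
    "Movie C": "romance paris love director_y actor_three",
}

def count_vector(text):
    """Bag-of-words counts, the same idea as a count vectorizer."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(title, top_n=2):
    """Rank the other movies by similarity to `title`, most similar first."""
    target = count_vector(movies[title])
    scores = [
        (other, cosine_similarity(target, count_vector(text)))
        for other, text in movies.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

recommendations = recommend("Movie A")
```

Here "Movie B" ranks first for "Movie A" because the two share three of their five tokens, while "Movie C" shares none and scores zero.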

Forecasting is the process of estimating future sales. Time series analysis is applied to Superstore's sales data. Steps:

  1. Importing useful libraries
  2. Getting the data and dropping the unnecessary columns
  3. Visualizing the series
  4. Testing for stationarity
  5. Decomposing the series
  6. Plotting ACF/PACF and finding optimal parameters
  7. Building the ARIMA model

Tools: statistics, statsmodels, pandas, numpy, matplotlib
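Two of the ideas in the steps above, differencing a trending series and reading its autocorrelation (ACF), can be sketched in plain Python. This is an illustration only, not the notebook's code: the sales figures are hypothetical, and in practice a library such as statsmodels would be used for these calculations.

```python
def difference(series, lag=1):
    """First-order differencing, a common way to remove a trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        acf.append(cov / variance)
    return acf

# Hypothetical monthly sales with an upward trend.
sales = [100, 112, 118, 133, 139, 151, 160, 172]

acf_raw = autocorrelation(sales, max_lag=3)   # trend keeps lag-1 correlation high
diffed = difference(sales)                    # month-over-month changes
acf_diff = autocorrelation(diffed, max_lag=3)
```

A slowly decaying ACF on the raw series is a typical sign of non-stationarity. After differencing, the lag-1 autocorrelation drops, which is the kind of evidence used when choosing the differencing order of an ARIMA model.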

  • Web-Sracaping-of-internshala(Notebook)

This project scrapes all the data science internship posts from Internshala and presents them in a tidy tabular form.

Tools: requests, BeautifulSoup, pandas
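The general pattern of turning HTML postings into table rows can be sketched as below. Everything specific here is an assumption: the markup, class names, and sample postings are hypothetical (the real site's structure is not described in this README), and the standard-library `html.parser` stands in for the tools the notebook lists.

```python
from html.parser import HTMLParser

# Hypothetical markup, used only to illustrate the parsing pattern.
SAMPLE_HTML = """
<div class="posting"><h3>Data Science Intern</h3><span class="company">Acme</span></div>
<div class="posting"><h3>ML Intern</h3><span class="company">Globex</span></div>
"""

class PostingParser(HTMLParser):
    """Collect one dict per posting, with 'title' and 'company' keys."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "posting":
            self.rows.append({})
        elif tag == "h3" and self.rows:
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "company" and self.rows:
            self._field = "company"

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("h3", "span"):
            self._field = None

parser = PostingParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

Each entry in `rows` is a flat dict, so the list can be handed directly to a table library to produce the tabular output described above.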

  • Demonetisation_sentiment_analysis(Notebook)

The Government of India announced the demonetisation of all ₹500 and ₹1000 banknotes. A Kaggle dataset of tweets posted during demonetisation is used to analyse sentiment.

Steps:

  1. Importing libraries and data.
  2. Cleaning the data.
  3. Labelling each tweet as either positive or negative.
  4. Plotting the sentiment graph.
  5. Building a word cloud of the sentiment.

Tools: BeautifulSoup, plotly, nltk, wordcloud, matplotlib
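The cleaning and labelling steps can be illustrated with a tiny lexicon-based sketch in plain Python. This README does not say how the notebook assigns labels, so the word lists, the sample tweets, and the rule of counting positive against negative words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real analysis would use a much larger one.
POSITIVE = {"good", "great", "support", "bold", "welcome"}
NEGATIVE = {"bad", "chaos", "queue", "fail", "suffer"}

def clean(tweet):
    """Lowercase, drop URLs and @mentions, keep only letters, split into words."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.sub(r"[^a-z\s]", " ", tweet).split()

def label(tweet):
    """Label a tweet by comparing positive and negative word counts.

    Ties are labelled positive here; that choice is arbitrary.
    """
    words = clean(tweet)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

# Hypothetical tweets.
tweets = [
    "Great bold move, I support it! https://example.com",
    "@user long queue at the ATM, total chaos",
]

labels = [label(t) for t in tweets]
counts = Counter(labels)  # label totals, ready to plot as a sentiment graph
```

The `counts` mapping gives the number of tweets per label, which is the input a bar chart of sentiment would need.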

Indian Actors Face Recognition is a Kaggle dataset. It includes images of 5 famous Indian film stars, and the dataset has been made smaller. The .zip file contains 5 directories, each holding 30 images of an actor. A second zip file contains validation images so that the model can be evaluated.

Steps: This project is divided into two parts.

Part 1:

  1. Getting libraries
  2. Renaming the files and saving them in one place
  3. Saving each filename with its corresponding label using pandas

Part 2:

  1. Importing libraries and the label file
  2. Creating functions for image manipulation
  3. Preparing the data to feed into the model
  4. Building the model
  5. Training the model
  6. Testing the model on new images

Tools: numpy, pandas, tqdm, cv2, dlib, matplotlib, keras, ImageDataGenerator

If you liked what you saw and want to chat with me about the portfolio, work opportunities, or collaboration, shoot me an email at aamir.k306@yahoo.com