vaitybharati
Certified Data Scientist with 16+ years of cumulative experience; eager to leverage machine learning, artificial intelligence and data science skills.
Thane
Pinned Repositories
Assignment-04-Simple-Linear-Regression-2
Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike -> Build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, applying the necessary transformations and selecting the best model using R or Python. EDA and Data Visualization. Correlation Analysis. Model Building. Model Testing. Model Predictions.
Assignment-05-Multiple-Linear-Regression-2
Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for the profit of the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model. Variables: R&D Spend -- research and development spend in the past few years; Administration -- spend on administration in the past few years; Marketing Spend -- spend on marketing in the past few years; State -- states from which data is collected; Profit -- profit of each state in the past few years.
Assignment-1-Q24-Basic-Statistics-Level-1-
Q 24) A Government company claims that an average light bulb lasts 270 days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an average of 260 days, with a standard deviation of 90 days. If the CEO's claim were true, what is the probability that 18 randomly selected bulbs would have an average life of no more than 260 days?
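One way to approach the question above is a one-sample t-score with n - 1 = 17 degrees of freedom. The sketch below assumes that approach and uses only the standard library; the helper names `t_pdf` and `t_cdf` are illustrative, and the CDF is approximated numerically with Simpson's rule rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=10_000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Values stated in the question
n, claimed_mean, sample_mean, sample_sd = 18, 270, 260, 90

t_score = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
probability = t_cdf(t_score, df=n - 1)
print(f"t = {t_score:.4f}, P(sample mean <= 260) = {probability:.4f}")
```

The same probability can be obtained from a t-distribution table or a statistics library; the numerical integration here only keeps the example dependency-free.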
Assignment-11-Text-Mining-01-Elon-Musk
Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv). Text Preprocessing: remove both the leading and the trailing characters, remove empty strings (Python treats them as False), join the list into one string/text, remove Twitter username handles (@usernames) from the tweet text, join the list into one string/text again, remove punctuation, remove https links or URLs within the text, convert the text into tokens (Tokenization), remove stopwords, normalize the data, Stemming (optional), Lemmatization. Feature Extraction: BoW CountVectorizer, CountVectorizer with N-grams (bigrams and trigrams), TF-IDF Vectorizer. Generate Word Cloud, Named Entity Recognition (NER), Emotion Mining - Sentiment Analysis.
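The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the function name `clean_tweet` and the sample tweet are hypothetical, and URLs are removed before punctuation here because stripping punctuation first would break the URL pattern.

```python
import re
import string

def clean_tweet(text):
    """Apply basic tweet cleaning: handles, URLs, punctuation, whitespace, case."""
    text = text.strip()                          # leading/trailing whitespace
    text = re.sub(r"@\w+", "", text)             # remove @username handles
    text = re.sub(r"https?://\S+", "", text)     # remove http/https URLs
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split()).lower()        # collapse spaces, lowercase

# Hypothetical tweet used only for illustration
sample = "  @SpaceX Launch today! Details: https://example.com/launch  "
cleaned = clean_tweet(sample)
tokens = cleaned.split()                         # naive whitespace tokenization
print(cleaned, tokens)
```

Stopword removal, stemming and lemmatization would follow on `tokens`; those steps usually rely on an NLP library and are left out of this sketch.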
Bagging-boosting-stacking
Bagging-boosting-stacking
Decision-Tree
Decision-Tree
Forecasting_Model_based_methods
Splitting data; model-based forecasting with Linear, Exponential, Quadratic, Additive Seasonality, Additive Seasonality Quadratic, Multiplicative Seasonality and Multiplicative Additive Seasonality models. Prediction for a new time period.
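As a rough illustration of two of the model-based methods named above, the sketch below fits a linear trend and an exponential trend (a linear fit on the log of the series) by ordinary least squares, compares them by RMSE and forecasts one new period. The series `sales` is made up for the example, and the helper names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical series, not taken from the repository's dataset
sales = [100, 110, 121, 133, 146, 161]
t = list(range(1, len(sales) + 1))

# Linear trend: sales ~ t
a_lin, b_lin = fit_line(t, sales)
linear_fit = [a_lin + b_lin * x for x in t]

# Exponential trend: log(sales) ~ t, transformed back with exp
a_exp, b_exp = fit_line(t, [math.log(y) for y in sales])
exponential_fit = [math.exp(a_exp + b_exp * x) for x in t]

rmse_linear = rmse(sales, linear_fit)
rmse_exponential = rmse(sales, exponential_fit)

# Prediction for a new time period with the exponential model
forecast = math.exp(a_exp + b_exp * (len(sales) + 1))
print(f"RMSE linear={rmse_linear:.3f}, exponential={rmse_exponential:.3f}, forecast={forecast:.1f}")
```

In practice the RMSE would be compared on a held-out test split rather than on the training data, and the seasonal variants add month indicator variables to the same regression.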
Logistic-Regression
Logistic-Regression
Multi-Linear-Reg
Multi-Linear-Reg
NN_Hyperparameter-Tuning
Tuning of hyperparameters: batch size and epochs. Tuning of hyperparameters: learning rate and dropout rate. Tuning of hyperparameters: activation function and kernel initializer. Tuning of hyperparameter: number of neurons in the activation layer. Training the model with the optimum values of the hyperparameters.
vaitybharati's Repositories
vaitybharati/Assignment-11-Text-Mining-01-Elon-Musk
Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv). Text Preprocessing: remove both the leading and the trailing characters, remove empty strings (Python treats them as False), join the list into one string/text, remove Twitter username handles (@usernames) from the tweet text, join the list into one string/text again, remove punctuation, remove https links or URLs within the text, convert the text into tokens (Tokenization), remove stopwords, normalize the data, Stemming (optional), Lemmatization. Feature Extraction: BoW CountVectorizer, CountVectorizer with N-grams (bigrams and trigrams), TF-IDF Vectorizer. Generate Word Cloud, Named Entity Recognition (NER), Emotion Mining - Sentiment Analysis.
vaitybharati/Assignment-07-Clustering-Hierarchical-Airlines-
Assignment-07-Clustering-Hierarchical-Airlines. Perform hierarchical clustering on the airlines data to obtain the optimum number of clusters. Draw inferences from the clusters obtained. Data Description: The file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
vaitybharati/Assignment-1-Q20-Basic-Statistics-Level-1-
Data set: Cars.csv. Calculate the probability of MPG of Cars for the cases below. MPG <- Cars$MPG. a. P(MPG > 38) b. P(MPG < 40) c. P(20 < MPG < 50)
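If MPG is assumed to be approximately normally distributed, the three probabilities follow from the normal CDF. The sketch below uses `statistics.NormalDist` from the standard library; the mean and standard deviation are hypothetical placeholders and should be replaced with the values computed from the MPG column of Cars.csv.

```python
from statistics import NormalDist

# Hypothetical placeholders, NOT the actual statistics of Cars.csv
mpg_mean, mpg_sd = 34.0, 9.0

# Assumption: MPG is approximately normal with the parameters above
mpg = NormalDist(mu=mpg_mean, sigma=mpg_sd)

p_gt_38 = 1 - mpg.cdf(38)             # a. P(MPG > 38)
p_lt_40 = mpg.cdf(40)                 # b. P(MPG < 40)
p_20_50 = mpg.cdf(50) - mpg.cdf(20)   # c. P(20 < MPG < 50)

print(f"P(MPG>38)={p_gt_38:.4f}  P(MPG<40)={p_lt_40:.4f}  P(20<MPG<50)={p_20_50:.4f}")
```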
vaitybharati/P27.-Supervised-ML---Multiple-Linear-Regression---Toyoto-Cars
Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, Correlation Analysis, Model Building, Model Testing, Model Validation Techniques, Collinearity Problem Check, Residual Analysis, Model Deletion Diagnostics (checking Outliers or Influencers) Two Techniques : 1. Cook's Distance & 2. Leverage value, Improving the Model, Model - Re-build, Re-check and Re-improve - 2, Model - Re-build, Re-check and Re-improve - 3, Final Model, Model Predictions.
vaitybharati/P36.-Supervised-ML---Decision-Tree---C5.0-Entropy-Iris-Flower-
Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower-Using Entropy Criteria - Classification Model. Import Libraries and data set, EDA, Apply Label Encoding, Model Building - Building/Training Decision Tree Classifier (C5.0) using Entropy Criteria. Validation and Testing Decision Tree Classifier (C5.0) Model
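The entropy criterion mentioned above scores a candidate split by how much it reduces label entropy (the information gain). The following is a small standard-library sketch of that calculation; the label lists are illustrative, not taken from the Iris data file, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

# Illustrative labels: a split that separates the two classes perfectly
parent = ["setosa"] * 3 + ["versicolor"] * 3
left, right = parent[:3], parent[3:]

gain = information_gain(parent, left, right)
print(f"parent entropy={entropy(parent):.3f}, information gain={gain:.3f}")
```

A tree learner using this criterion evaluates many candidate splits and keeps the one with the highest gain at each node.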
vaitybharati/Assignment-06-Logistic-Regression
Assignment-06-Logistic-Regression. Output variable -> y. y -> Whether the client has subscribed a term deposit or not. Binomial ("yes" or "no").
Attribute information for bank dataset.
Input variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
# related with the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
# other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: "yes","no")
Missing Attribute Values: None
vaitybharati/Assignment-08-PCA-Data-Mining-Wine-
Assignment-08-PCA-Data-Mining-Wine data. Perform principal component analysis, then perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether this matches the number of clusters in the original data (the class column, which was ignored at the beginning, shows 3 clusters).
vaitybharati/P34.-Unsupervised-ML---t-SNE-Data-Mining-Cancer-
Unsupervised-ML-t-SNE-Data-Mining-Cancer. Import Libraries, Import Dataset, Convert data to array format, Separate array into input and output components, TSNE implementation, Cluster Visualization
vaitybharati/vaitybharati
Config files for my GitHub profile.
vaitybharati/Assignment-07-DBSCAN-Clustering-Crimes-
Assignment-07-DBSCAN-Clustering-Crimes. Perform Clustering for the crime data and identify the number of clusters formed and draw inferences.
vaitybharati/Assignment-07-K-Means-Clustering-Airlines-
Assignment-07-K-Means-Clustering-Airlines. Perform K-means clustering on the airlines data to obtain the optimum number of clusters. Draw inferences from the clusters obtained. The file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
vaitybharati/Assignment-09-Association-Rules-Data-Mining-Books-
Association-Rules-Data-Mining-Books. Apriori Algorithm, Association rules with 10% Support and 70% confidence, Association rules with 20% Support and 60% confidence, Association rules with 5% Support and 80% confidence, visualization of obtained rule.
vaitybharati/Assignment-09-Association-Rules-Data-Mining-Groceries-
Association Rules Data Mining (Groceries). Converting the data frame into a list of lists, using TransactionEncoder to transform this dataset into a logical data frame, building the data frame (rows are logical and columns are the items that have been purchased), printing column names, dropping the NaN column from the data frame, most popular items, top 10 popular items, barplot visualization of popular items. Apriori Algorithm: association rules with 5% support and 70% confidence, association rules with 1% support and 80% confidence, visualization of the obtained rules.
vaitybharati/Assignment-09-Association-Rules-Data-Mining-my_movies-
Assignment-09-Association-Rules-Data-Mining-my_movies. Apriori Algorithm. Association rules with 10% Support and 70% confidence. Association rules with 5% Support and 90% confidence. Lift Ratio > 1 is a good influential rule in selecting the associated transactions. Visualization of obtained rule.
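Support, confidence and lift for a single rule can be computed directly from their definitions. The sketch below does this for a rule {item_a} -> {item_b} on a tiny hypothetical transaction list; it does not implement the Apriori search itself, and the item names are placeholders.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    supp_both = support(antecedent | consequent, transactions)
    confidence = supp_both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return supp_both, confidence, lift

# Hypothetical transactions used only for illustration
transactions = [
    {"item_a", "item_b"},
    {"item_a", "item_b"},
    {"item_a", "item_b", "item_c"},
    {"item_c"},
    {"item_c"},
]

supp, conf, lift = rule_metrics({"item_a"}, {"item_b"}, transactions)
print(f"support={supp:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent, which is why rules with lift > 1 are the ones usually kept.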
vaitybharati/Assignment-10-Recommendation-System-Data-Mining-books-
Assignment-10-Recommendation-System-Data-Mining-books. Recommend the best book based on the ratings: sort by user IDs, number of unique users in the dataset, number of unique books in the dataset, converting long data into wide data using a pivot table, replacing the index values with unique user IDs, imputing NaNs with 0 values, calculating cosine similarity between users on array data, storing the results in a dataframe, setting the index and column names to user IDs, nullifying diagonal values, most similar users, extracting the books which userId 162107 & 276726 have read, extracting the books which userId 276729 & 276726 have read.
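The user-to-user similarity step described above can be illustrated without any libraries. In this sketch the ratings matrix is hypothetical (rows are users, columns are books, 0 stands for an imputed missing rating), the diagonal is set to 0 so a user is never matched with themselves, and the most similar user is then read off each row.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical wide-format ratings: user -> rating per book, 0 = not rated
ratings = {
    "user_1": [5, 3, 0, 0],
    "user_2": [4, 0, 0, 1],
    "user_3": [5, 3, 0, 1],
}
users = list(ratings)

# Pairwise similarity with the diagonal nullified
similarity = {
    u: {v: 0.0 if u == v else cosine_similarity(ratings[u], ratings[v]) for v in users}
    for u in users
}

# Most similar other user for each user
most_similar = {u: max(similarity[u], key=similarity[u].get) for u in users}
print(most_similar)
```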
vaitybharati/Assignment-11-Text-Mining-02-Amazon-Product-Reviews
NLP: Sentiment Analysis or Emotion Mining on Amazon Product Reviews - Part-1. Let’s learn the NLP techniques to perform Sentiment Analysis or Emotion Mining on extracted Product Reviews from Amazon. Part-1 covers Text preprocessing and Feature extraction, the next part covers Sentiment Analysis or Emotion Mining on text corpus. https://medium.com/@vaitybharati/nlp-sentiment-analysis-or-emotion-mining-on-amazon-product-reviews-part-1-428d43112027
vaitybharati/Assignment-11-Text-Mining-Amazon-Reviews-using-Scrapy
Text-Mining-Amazon-Reviews-using-Scrapy. Ever wondered whether life would be easier if there were ways to know how well your product performs and how people feel about it? The solution: text mining techniques. https://medium.com/@vaitybharati/text-mining-how-to-extract-amazon-reviews-using-scrapy-5bd709cb826c
vaitybharati/P26.-Supervised-ML---Multiple-Linear-Regression---Cars-dataset
Supervised-ML---Multiple-Linear-Regression---Cars-dataset. Model MPG of a car based on other variables. EDA, Correlation Analysis, Model Building, Model Testing, Model Validation Techniques, Collinearity Problem Check, Residual Analysis, Model Deletion Diagnostics (checking Outliers or Influencers) Two Techniques : 1. Cook's Distance & 2. Leverage value, Improving the Model, Model - Re-build, Re-check and Re-improve - 2, Model - Re-build, Re-check and Re-improve - 3, Final Model, Model Predictions.
vaitybharati/P28.-Supervised-ML---Logistic-Regression---Appointing-Attorney-or-not
Supervised-ML---Logistic-Regression---Appointing-Attorney-or-not. EDA, Model Building, Model Predictions, Testing Model Accuracy, ROC Curve plotting and finding AUC value.
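The AUC value mentioned above has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it by counting such pairs; the labels and scores are a small made-up example, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.2f}")
```

The pairwise count is quadratic in the number of samples, so it suits small illustrations; libraries compute the same quantity from the sorted scores.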
vaitybharati/P29.-Unsupervised-ML---Hierarchical-Clustering-Univ.-
Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.
vaitybharati/P30.-Unsupervised-ML---K-Means-Clustering-Non-Hierarchical-Clustering-Univ.-
Unsupervised-ML---K-Means-Clustering-Non-Hierarchical-Clustering-Univ. Use Elbow Graph to find optimum number of clusters (K value) from K values range. The K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion WCSS. Plot K values range vs WCSS to get Elbow graph for choosing K (no. of clusters)
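To make the WCSS and elbow idea above concrete, here is a deliberately simple K-means sketch in plain Python. The points are hypothetical, the initialisation just takes the first k points (libraries typically use smarter schemes such as k-means++), and the function returns the WCSS so it can be tabulated against K.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with naive initialisation; returns (centroids, wcss)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # Within-cluster sum of squares (inertia)
    wcss = sum(
        math.dist(p, centroids[i]) ** 2
        for i, cluster in enumerate(clusters)
        for p in cluster
    )
    return centroids, wcss

# Hypothetical 2-D data with two visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

wcss_by_k = {k: kmeans(points, k)[1] for k in range(1, 4)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

Plotting `wcss_by_k` against K gives the elbow graph: WCSS drops sharply up to the natural number of clusters and flattens afterwards.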
vaitybharati/P31.-Unsupervised-ML---DBSCAN-Clustering-Wholesale-Customers-
Unsupervised-ML---DBSCAN-Clustering-Wholesale-Customers. Import Libraries, Import Dataset, Normalize heterogeneous numerical data by applying StandardScaler fit_transform to the dataset, DBSCAN Clustering, Noisy samples are given the label -1, Adding clusters to dataset.
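The normalisation step above rescales each numerical column to zero mean and unit variance so that columns on different scales contribute comparably to the distance calculation in DBSCAN. A minimal standard-library sketch of that transformation follows; the column values are hypothetical, and the population standard deviation is used on the assumption that it matches the usual standard-scaling convention.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical column of spending values
column = [2, 4, 4, 4, 5, 5, 7, 9]

z_scores = standardize(column)
print(z_scores)
```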
vaitybharati/P32.-Unsupervised-ML---Association-Rules-Data-Mining-Titanic-
Unsupervised-ML---Association-Rules-Data-Mining-Titanic. Data Preprocessing: As the data is in categorical format, One Hot Encoding is used to convert it into numerical format. Apriori Algorithm: frequent item sets & association rules. A leverage value of 0 indicates independence; its range is [-1, 1]. A high conviction value means that the consequent is highly dependent on the antecedent; its range is [0, inf]. Lift Ratio > 1 is a good influential rule in selecting the associated transactions.
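Leverage and conviction can both be written in terms of supports. The sketch below uses the commonly cited definitions, leverage = supp(A and C) - supp(A) * supp(C) and conviction = (1 - supp(C)) / (1 - confidence); the support values passed in are hypothetical and the function names are illustrative.

```python
import math

def leverage(supp_a, supp_c, supp_both):
    """Difference between observed co-occurrence and that expected under independence."""
    return supp_both - supp_a * supp_c

def conviction(supp_a, supp_c, supp_both):
    """(1 - supp(C)) / (1 - confidence); infinite when the rule never fails."""
    confidence = supp_both / supp_a
    if confidence >= 1.0:
        return math.inf
    return (1 - supp_c) / (1 - confidence)

# Hypothetical supports for antecedent A, consequent C, and both together
lev = leverage(0.4, 0.5, 0.3)
conv = conviction(0.4, 0.5, 0.3)

# Under independence supp(A and C) = supp(A) * supp(C)
lev_independent = leverage(0.4, 0.5, 0.2)
conv_independent = conviction(0.4, 0.5, 0.2)

print(f"leverage={lev:.2f}, conviction={conv:.2f}")
print(f"independent case: leverage={lev_independent:.2f}, conviction={conv_independent:.2f}")
```

The independent case shows the reference points: leverage 0 and conviction 1 both indicate that the antecedent tells nothing about the consequent.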
vaitybharati/P33.-Unsupervised-ML---PCA-Data-Mining-Univ-
Unsupervised-ML---PCA-Data-Mining-Univ. Import Dataset, Converting data to numpy array, Normalizing the numerical data, Applying PCA Fit Transform to dataset, PCA Components matrix or covariance Matrix, Variance of each PCA, Final Dataframe, Visualization of PCAs, Eigenvectors and eigenvalues for a given matrix.
vaitybharati/P35.-Unsupervised-ML---Recommendation-System-Data-Mining-Movies-
Unsupervised-ML-Recommendation-System-Data-Mining-Movies. Recommend movies based on the ratings: Sort by User IDs, number of unique users in the dataset, number of unique movies in the dataset, Impute those NaNs with 0 values, Calculating Cosine Similarity between Users on array data, Store the results in a dataframe format, Set the index and column names to user ids, Slicing first 5 rows and first 5 columns, Nullifying diagonal values, Most Similar Users, extract the movies which userId 6 & 168 have watched.
vaitybharati/Assignment-12-Naives-Bayes-Classifier-Salary-
vaitybharati/Assignment-13-KNN-K-Nearest-Neighbors-Glass-
vaitybharati/Assignment-13-KNN-K-Nearest-Neighbors-Zoo-
vaitybharati/Assignment-18-Time-Series-Analysis-Forecasting-Airlines-Passengers-
vaitybharati/Assignment-18-Time-Series-Analysis-Forecasting-CocaCola-Prices-