Tech-with-Vidhya
Hello, and welcome to my GitHub portfolio page. It includes my Data Science, Machine Learning Engineering, NLP, GenAI, LLM and Big Data Engineering projects.
AI/ML/Data Engineer & Solutions Architect, Mars | Queen Mary University of London | UK
Pinned Repositories
Automated_ETL_Finance_Data_Pipeline_with_AWS_Lambda_Spark_Transformation_Job_Python
This project implements an automated ETL data pipeline for financial stock trade transactions using Python and AWS services, with a Spark transformation job. The pipeline is automated with an AWS Lambda function and a trigger: whenever a new file is ingested into the AWS S3 bucket, the Lambda function fires and executes the AWS Glue crawler and the ETL Spark transformation job. The Spark job, implemented in PySpark, transforms the trade transaction data stored in S3 and filters the subset of trades in which the total number of shares transacted is less than or equal to 100. Tools & Technologies: Python, Boto3 SDK, PySpark, AWS CLI, AWS Virtual Private Cloud (VPC), AWS VPC Endpoint, AWS S3, AWS Glue, AWS Glue Crawler, AWS Glue Jobs, AWS Athena, AWS Lambda, Spark
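The Glue job's filtering rule can be illustrated with a minimal pure-Python sketch of the same logic (the real job runs in PySpark against S3 data; the field name "shares" and the sample records below are assumptions, not the project's schema):

```python
# Toy stand-in for the trade transaction records read from S3.
trades = [
    {"trade_id": 1, "symbol": "AAPL", "shares": 50},
    {"trade_id": 2, "symbol": "MSFT", "shares": 250},
    {"trade_id": 3, "symbol": "AMZN", "shares": 100},
]

def filter_small_trades(records, max_shares=100):
    """Keep only trades whose transacted share count is <= max_shares."""
    return [r for r in records if r["shares"] <= max_shares]

small = filter_small_trades(trades)
print([r["trade_id"] for r in small])  # -> [1, 3]
```

In the actual PySpark job the same rule would be a single DataFrame filter on the share-count column.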
AWS_SageMaker_TensorFlow_Keras_CNN_Model_Fashion_MNIST
This is an AWS SageMaker TensorFlow Keras CNN Machine Learning Project.
bank_customers_churn_prediction_exploring_7_different_classification_algorithms
This project classifies bank customers on whether a customer will leave the bank (i.e., churn) or not, applying the following steps of a data science project life-cycle:
1. Data Exploration, Analysis and Visualisations
2. Data Pre-processing
3. Data Preparation for the Modelling
4. Model Training
5. Model Validation
6. Optimized Model Selection based on Various Performance Metrics
7. Deploying the Best Optimized Model on Unseen Test Data
8. Evaluating the Optimized Model's Performance Metrics
The business case of determining the churn status of bank customers is explored, trained and validated on 7 different classification algorithms/models, listed below, and the best model is selected based on accuracy metrics:
1. Decision Tree Classifier: CART (Classification and Regression Tree) Algorithm
2. Decision Tree Classifier: ID3 (Iterative Dichotomiser 3) Algorithm
3. Ensemble Random Forest Classifier Algorithm
4. Ensemble Adaptive Boosting Classifier Algorithm
5. Ensemble Hist Gradient Boosting Classifier Algorithm
6. Ensemble Extreme Gradient Boosting (XGBoost) Classifier Algorithm
7. Support Vector Machine (SVM) Classifier Algorithm
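The model-selection step (picking the best classifier by a performance metric) can be sketched in pure Python with two hypothetical stand-in models; the actual project compares seven scikit-learn classifiers, and the feature name "balance", its threshold and the toy validation data below are all made up:

```python
def accuracy(y_true, y_pred):
    """Fraction of validation labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_model(X):
    """Baseline: predict 'no churn' (0) for every customer."""
    return [0 for _ in X]

def balance_stump(X):
    """One-feature stump: low balance -> churn (threshold is an assumption)."""
    return [1 if x["balance"] < 1000 else 0 for x in X]

X_val = [{"balance": 500}, {"balance": 5000}, {"balance": 200}, {"balance": 8000}]
y_val = [1, 0, 1, 0]   # 1 = churned, 0 = stayed

models = {"majority": majority_model, "stump": balance_stump}
scores = {name: accuracy(y_val, m(X_val)) for name, m in models.items()}
best = max(scores, key=scores.get)   # highest validation accuracy wins
print(best, scores[best])  # -> stump 1.0
```

The project applies the same pattern, with cross-validated metrics, across its seven candidate algorithms.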
Bitcoin_Network_Analytics_using_Python_NetworkX_and_Gephi
This group project of 4 members was delivered as part of my Masters in Big Data Science (MSc BDS) program module "Digital Media and Social Network" at Queen Mary University of London (QMUL), London, United Kingdom. It covers network analysis across 4 different problem statements and use cases, using the Python NetworkX package, the Gephi network analysis tool and Microsoft Excel.
Dataset: Bitcoin trade transactions for the period 2011 to 2016.
Dataset representation: each trade transaction carries the attributes Rater, Ratee, Rating and Timestamp.
Network formation: for every trade between 2 users in the Bitcoin network, a rating is recorded with its timestamp, yielding a directed network.
Size of the dataset and network: 5,881 users/nodes and 35,592 transactions/edges, with ratings in the range -10 to +10 (where -10 is the lowest rating and +10 the highest).
The analysis covers basic network statistics along with the project's use cases and objectives.
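Basic statistics on a directed rating network of this shape can be sketched in pure Python over a (rater, ratee, rating, timestamp) edge list; the project itself uses NetworkX and Gephi on the full dataset, and the three edges below are illustrative, not real transactions:

```python
from collections import defaultdict

# (rater, ratee, rating, timestamp) - made-up sample edges.
edges = [
    ("A", "B", 5, 1300000000),
    ("A", "C", -2, 1300000500),
    ("B", "C", 8, 1300001000),
]

out_degree = defaultdict(int)   # ratings given by each user
in_degree = defaultdict(int)    # ratings received by each user
for rater, ratee, rating, ts in edges:
    out_degree[rater] += 1
    in_degree[ratee] += 1

# Average rating received per rated user (a simple trust signal).
avg_received = {
    node: sum(r for _, v, r, _ in edges if v == node) / in_degree[node]
    for node in in_degree
}
print(out_degree["A"], in_degree["C"], avg_received["C"])  # -> 2 2 3.0
```

NetworkX computes the same quantities via `DiGraph.out_degree`, `DiGraph.in_degree` and edge-attribute aggregation.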
credit-risk-assessment-fintech-framework-using-deep-learning-and-transfer-learning
This project presents a dual credit risk assessment framework that predicts consumer credit scores and forecasts credit default risk for financial institutions such as commercial banks and lending firms. The implementation mimics the real-world FICO scoring model, with custom enhancements to include a lender's internal credit risk factors through a newly proposed Domain-Tech feature selection approach, alongside deep learning and transfer learning techniques. This is my masters final project, delivered as part of the MSc Big Data Science program at Queen Mary University of London (QMUL), United Kingdom.
ETL_Finance_Data_Pipeline_Python_AWS_CLI_S3_Glue_Athena
This project covers the implementation of an ETL data pipeline for financial stock trade transactions using Python and AWS services. Tools & Technologies: Python, Boto3 SDK, AWS CLI, AWS Virtual Private Cloud (VPC), AWS VPC Endpoint, AWS S3, AWS Glue, AWS Glue Crawler, AWS Athena, AWS Redshift
MLOps_AWS_Kubernetes_LoadBalancing_Docker_Flask_Banking_Customers_Digital_Transformation_Classifier
This is an AWS MLE and MLOps Bank Customers Digital Transformation Project.
NLP_Multi-Class_Text_Classification_using_BERT_Model
NLP_Text_Classification_with_Transformers_RoBERTa_and_XLNet_Models
productionized_docker_ML_model_application_into_kubernetes_cluster_using_AWS_EKS_CloudFormation_EMR
This project covers the end-to-end implementation of deploying and productionizing a dockerized/containerized machine learning Python Flask application into a Kubernetes cluster using AWS Elastic Kubernetes Service (EKS), AWS serverless Fargate instances, AWS CloudFormation cloud stacks and the AWS Elastic Container Registry (ECR) service. The machine learning business case is a bank note authentication binary classifier built with a Random Forest Classifier, which predicts whether a bank note is fake (label 0) or genuine (label 1).
Implementation Steps:
1. Created an end-to-end machine learning solution covering all the ML life-cycle steps of data exploration, feature selection, model training, model validation and model testing on unseen production data.
2. Saved the finalised model as a pickle file.
3. Created a Python Flask based API to render the ML model's inferences to end-users.
4. Verified and tested the Flask API on localhost.
5. Created a Dockerfile (containing the instructions to build a docker image) for the Flask based bank note authentication application embedding the Random Forest classifier model.
6. Created IAM service roles with appropriate policies to access the AWS ECR, AWS EKS and AWS CloudFormation services.
7. Created a new EC2 Linux server instance in AWS and copied the web application project's directories and files onto the server using SFTP commands.
8. Installed Docker and the supporting Python libraries on the EC2 instance, as per the "requirements.txt" file.
9. Built the Dockerfile into a docker image and container representing the application, using docker build and run commands.
10. Created a docker repository within the AWS ECR service and pushed the application image to it using AWS CLI commands.
11. Created a cloud stack with private and public subnets using AWS CloudFormation, with appropriate IAM roles and policies.
12. Created the Kubernetes cluster using AWS EKS with appropriate IAM roles and policies, linked to the CloudFormation cloud stack.
13. Created the AWS serverless Fargate profile and Fargate instances/nodes.
14. Created and configured the "Deployment.yaml" and "Service.yaml" files for use with kubectl.
15. Applied the "Deployment.yaml", with its pod replica configuration, to the EKS cluster's Fargate nodes using kubectl.
16. Applied the "Service.yaml" using kubectl to expose the application to end-users for public access, creating the production endpoint.
17. Verified and tested the inferences of the productionized ML application via the Fargate endpoint created in the EKS cluster.
Tools & Technologies: Python, Flask, AWS, AWS EC2, Linux Server, Linux Commands, Command Line Interface (CLI), Docker, Docker Commands, AWS ECR, AWS IAM, AWS CloudFormation, AWS EKS, Kubernetes, Kubernetes kubectl Commands.
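Steps 2 and 3 hinge on persisting the trained classifier and reloading it for inference behind the Flask API. A minimal sketch using Python's pickle module, with a stub class standing in for the actual Random Forest model and an in-memory buffer standing in for the pickle file on disk:

```python
import io
import pickle

class BankNoteClassifier:
    """Stand-in for the trained Random Forest; the rule below is an assumption."""
    def predict(self, variance):
        # 1 = genuine bank note, 0 = fake bank note
        return 1 if variance > 0 else 0

model = BankNoteClassifier()

buf = io.BytesIO()           # in place of "classifier.pkl" on disk
pickle.dump(model, buf)      # step 2: persist the finalised model
buf.seek(0)
loaded = pickle.load(buf)    # at API start-up: reload for inference

print(loaded.predict(2.3), loaded.predict(-1.1))  # -> 1 0
```

In the Flask app the reload happens once at start-up, and each API request calls `loaded.predict(...)` on the submitted bank note features.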
Tech-with-Vidhya's Repositories
Tech-with-Vidhya/bank_customers_churn_prediction_exploring_7_different_classification_algorithms
This project classifies bank customers on whether a customer will leave the bank (i.e., churn) or not, exploring 7 different classification algorithms (CART and ID3 decision trees, Random Forest, Adaptive Boosting, Hist Gradient Boosting, XGBoost and SVM) across the full data science life-cycle, with the best model selected based on accuracy metrics.
Tech-with-Vidhya/capital_markets_stocks_trade_transactions_tableau_dashboard
This project analyses capital-markets stock trade transaction data using Tableau Desktop; the results are visualised in a dynamic Tableau dashboard.
Tech-with-Vidhya/financial_consumer_complaints_tableau_dashboard
This project includes the data analysis related to financial consumer complaints using Tableau Desktop and results are visualized in the form of a Tableau Dashboard.
Tech-with-Vidhya/spark_fund_investment_EDA_and_data_cleaning_with_profiling_report
This project covers the exploratory data analysis and data cleaning of company data for Spark Funds investment, along with a profiling report.
Tech-with-Vidhya/audio-digits-classification-using-MFCC-and-convolutional-neural-network
This project was delivered as part of my Masters in Big Data Science (MSc BDS) program module "Machine Learning" at Queen Mary University of London (QMUL), London, United Kingdom. The project covers a basic solution and an advanced solution, both based on the audio feature extraction method "Mel-frequency cepstral coefficients (MFCC)" and a deep learning convolutional neural network (CNN). Basic solution: designing, building, training, validating and testing a model that recognises the numerals 0 to 9 in audio files. Advanced solution: predicting the numeral in a new audio test file. This model could be applied in a banking product, for example to verify a 4-digit passcode spoken by an authorised customer during on-call verification when logging in to an internet banking account. NOTE: Due to the data privacy and data protection policy to be adhered to by students, the datasets and solution code are not published on this public GitHub profile, in order to comply with Queen Mary University of London (QMUL) policies.
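The final stage of MFCC extraction applies a DCT-II to the log mel-filterbank energies; a minimal pure-Python sketch of just that step (real pipelines use a library such as librosa, and the energy values below are made up since the dataset is private):

```python
import math

def dct2(x):
    """Unnormalised DCT-II of a sequence, the transform behind MFCC coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

log_mel_energies = [1.0, 0.5, 0.25, 0.125]   # illustrative values only
mfcc = dct2(log_mel_energies)
print(round(mfcc[0], 3))  # -> 1.875 (coefficient 0 is the sum of the energies)
```

The resulting coefficient vectors per audio frame are what the project's CNN consumes as input features.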
Tech-with-Vidhya/bank-customers-churn-prediction-using-decision-tree-classifier-cart-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Decision Tree CART (Classification and Regression Tree) Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-decision-tree-classifier-ide-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Decision Tree ID3 (Iterative Dichotomiser 3) Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-ensemble-adaptive-boosting-classifier-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Ensemble Adaptive Boosting Classifier Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-ensemble-extreme-gradient-boosting-classifier-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Ensemble Extreme Gradient Boosting (XGBoost) Classifier Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-ensemble-hist-gradient-boosting-classifier-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Ensemble Hist Gradient Boosting Classifier Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-ensemble-random-forest-classifier-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Ensemble Random Forest Classifier Algorithm.
Tech-with-Vidhya/bank-customers-churn-prediction-using-support-vector-machine-classifier-algorithm
This project classifies bank customers on whether they will leave the bank (i.e., churn) or not, using the Support Vector Machine (SVM) Classifier Algorithm.
Tech-with-Vidhya/Colored-Text-in-Python
This project displays text and strings in coloured format with various foreground colours, background colours and styles.
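Coloured terminal output of this kind is commonly produced with ANSI SGR escape codes; a minimal sketch (the helper name `colour` and the chosen codes are mine, not necessarily this repo's implementation):

```python
# Standard ANSI SGR codes: 31 = red foreground, 43 = yellow background, 1 = bold.
RED_FG = "\033[31m"
YELLOW_BG = "\033[43m"
BOLD = "\033[1m"
RESET = "\033[0m"

def colour(text, *styles):
    """Wrap text in the given ANSI style codes and reset afterwards."""
    return "".join(styles) + text + RESET

print(colour("warning", RED_FG, YELLOW_BG, BOLD))
```

On terminals without ANSI support (notably older Windows consoles) the raw codes are printed instead, which is why libraries like colorama exist.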
Tech-with-Vidhya/Display-Right-Angled-Triangle
This project displays a right-angled triangle as output, drawn with various symbols, namely "*", "|" and ".".
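The triangle output can be sketched as a one-line string builder; `right_triangle` is an illustrative helper name, not necessarily the repo's:

```python
def right_triangle(height, symbol="*"):
    """Build a right-angled triangle, one more symbol per row."""
    return "\n".join(symbol * row for row in range(1, height + 1))

print(right_triangle(3, "|"))
# |
# ||
# |||
```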
Tech-with-Vidhya/image-super-resolution-using-deep-learning-CNN-and-PSNR
This project was delivered as part of my Masters in Big Data Science (MSc BDS) program internal training for the module "Deep Learning and Computer Vision" at Queen Mary University of London (QMUL), London, United Kingdom. It aims to build practical, hands-on understanding of image super-resolution, deep learning with convolutional neural networks (CNN) and the peak signal-to-noise ratio (PSNR). The project addresses 3 different problem statements and use cases, with solutions implemented in Python using PyTorch.
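PSNR, the quality metric used here, follows directly from the mean squared error between a reference image and its reconstruction; a minimal sketch with made-up 8-bit pixel values (real code would operate on PyTorch tensors or NumPy arrays):

```python
import math

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * math.log10(max_val ** 2 / mse)

ref = [52, 55, 61, 59]   # illustrative pixel values
rec = [50, 55, 60, 59]
print(round(psnr(ref, rec), 2))  # -> 47.16
```

Super-resolution models are typically compared by averaging this score over a test set of image pairs.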
Tech-with-Vidhya/ML-Predicting-Company-Profit-Linear-Regression-Model-without-scikitlearn
This repo includes a simple linear regression machine learning model, implemented without scikit-learn, that predicts a company's profit from its "R&D Spend" amount, using a labelled dataset of 1,000 companies.
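Without scikit-learn, simple linear regression reduces to the closed-form least-squares slope and intercept; a sketch with made-up (R&D spend, profit) pairs that lie exactly on y = 2x + 1 (the repo's actual dataset and units will differ):

```python
def fit_simple_linear(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

rd_spend = [1.0, 2.0, 3.0, 4.0]   # illustrative values
profit = [3.0, 5.0, 7.0, 9.0]     # exactly y = 2x + 1
slope, intercept = fit_simple_linear(rd_spend, profit)
print(slope, intercept)  # -> 2.0 1.0
```

Prediction is then `slope * x + intercept` for a new R&D-spend value.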
Tech-with-Vidhya/ML-Salary-Prediction-Linear-Regression-Model-Project
This repo includes a linear regression machine learning model that predicts salary from years of experience, using a labelled dataset.
Tech-with-Vidhya/NLP-corpus-analysis-distributional-semantics-ngram-CBOW-skipgram-word2vec
This project was delivered as part of my Masters in Big Data Science (MSc BDS) program module "Natural Language Processing" at Queen Mary University of London (QMUL), London, United Kingdom. It uses distributional semantics to investigate how some words in the English language have changed over time, comparing the decades 2000 and 2010 of a private Corpus of Historical American English (COHA) dataset. The project explores and implements the methods listed below:
1. N-Gram
2. CBOW
3. SkipGram
4. Word2Vec
**NOTE:** Due to the data privacy and data protection policy to be adhered to by students, the datasets and solution code are not published on this public GitHub profile, in order to comply with Queen Mary University of London (QMUL) policies.
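As one example of the listed methods, skip-gram training data is just (target, context) pairs drawn from a sliding window over the text; a minimal sketch on a toy sentence (the COHA corpus itself is private and not reproduced here):

```python
def skipgram_pairs(tokens, window=1):
    """Yield (target, context) pairs within +/- window tokens of each position."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"]))
# -> [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Word2Vec's skip-gram variant trains embeddings so that targets predict their context words from exactly such pairs; CBOW inverts the direction, predicting the target from its context.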
Tech-with-Vidhya/NLP-deception-detection-classification-tfidf-amazon-reviews
This project was delivered as part of my Masters in Big Data Science (MSc BDS) program module "Natural Language Processing" at Queen Mary University of London (QMUL), London, United Kingdom. It implements classification for deception detection on a private dataset of Amazon text reviews, using TF-IDF (Term Frequency-Inverse Document Frequency) features. **NOTE:** Due to the data privacy and data protection policy to be adhered to by students, the datasets and solution code are not published on this public GitHub profile, in order to comply with Queen Mary University of London (QMUL) policies.
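TF-IDF weighting itself is straightforward to sketch in pure Python on a toy corpus (the Amazon reviews dataset is private, so the two-document corpus below is made up):

```python
import math

def tfidf(docs):
    """Per-document TF-IDF weights; docs are lists of tokens."""
    n = len(docs)
    df = {}                              # document frequency per term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)       # term frequency
            idf = math.log(n / df[term])          # inverse document frequency
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["great", "product"], ["great", "scam"]]
w = tfidf(docs)
print(round(w[0]["product"], 3), w[0]["great"])  # -> 0.347 0.0
```

A term appearing in every document ("great") gets weight 0, which is why TF-IDF vectors emphasise the distinctive vocabulary a deception classifier needs.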
Tech-with-Vidhya/Python-Data-Analysis-Pokemon-Pandas
This project includes data analysis of the Pokémon dataset using the Python library pandas.
Tech-with-Vidhya/Python-Data-Visualisations-Matplotlib
This project includes data visualisations using the Python library Matplotlib with pyplot.
Tech-with-Vidhya/R-programming-statistical-data-analysis-and-visualisations
This project is delivered as part of my Masters in Big Data Science (MSc BDS) Program for the module named “Applied Statistics” in Queen Mary University of London (QMUL), London, United Kingdom. The project covered the descriptive statistical analysis, data analysis and visualisations in R programming for the “njgolf” dataset. **NOTE:** Due to the data privacy and the data protection policy to be adhered by the students; the datasets and the solution related code are not exposed and updated in the GitHub public profile; in order to be compliant with the Queen Mary University of London (QMUL) policies.
Tech-with-Vidhya/R-programming-statistical-modelling-of-linear-regression
This project is delivered as part of my Masters in Big Data Science (MSc BDS) Program for the module named “Applied Statistics” in Queen Mary University of London (QMUL), London, United Kingdom. The project covered the statistical modelling of linear regression machine learning algorithm implemented in R programming for the “njgolf” dataset. **NOTE:** Due to the data privacy and the data protection policy to be adhered by the students; the datasets and the solution related code are not exposed and updated in the GitHub public profile; in order to be compliant with the Queen Mary University of London (QMUL) policies.
Tech-with-Vidhya/stock_market_status_summary_tableau_dashboard
This project includes the data analysis related to stock market data using Tableau Desktop and results are visualized in the form of a Tableau Dashboard.
Tech-with-Vidhya/Sudoku-Puzzle-Game-Solver
This project solves a given Sudoku puzzle by itself, displaying the unsolved puzzle first, followed by the solved puzzle, as output in the console.
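The standard approach to such a solver is recursive backtracking; a minimal sketch on a 4x4 grid for brevity (the repo targets standard 9x9 puzzles, which use the same idea with box size 3; 0 marks an empty cell, and the puzzle below is made up):

```python
def valid(grid, r, c, v, box):
    """Check whether value v may be placed at (r, c) without a conflict."""
    n = box * box
    if any(grid[r][j] == v for j in range(n)): return False      # row
    if any(grid[i][c] == v for i in range(n)): return False      # column
    br, bc = (r // box) * box, (c // box) * box                  # box corner
    return all(grid[br + i][bc + j] != v for i in range(box) for j in range(box))

def solve(grid, box=2):
    """Fill empty cells depth-first, undoing placements that lead to dead ends."""
    n = box * box
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                for v in range(1, n + 1):
                    if valid(grid, r, c, v, box):
                        grid[r][c] = v
                        if solve(grid, box):
                            return True
                        grid[r][c] = 0   # backtrack
                return False             # no value fits here
    return True                          # no empty cells left

puzzle = [
    [1, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 2],
]
solve(puzzle)
print(puzzle[0])
```

The same `solve` works for 9x9 by passing `box=3`, at the cost of a deeper search.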
Tech-with-Vidhya/twitter-time-series-data-linear-and-quadratic-regression-in-R-programming
This project is delivered as part of my Masters in Big Data Science (MSc BDS) Program for the module named “Applied Statistics” in Queen Mary University of London (QMUL), London, United Kingdom. The project covered the statistical modelling of the linear regression and quadratic regression machine learning algorithms implemented in R programming for the “twitter time series” dataset. **NOTE:** Due to the data privacy and the data protection policy to be adhered by the students; the datasets and the solution related code are not exposed and updated in the GitHub public profile; in order to be compliant with the Queen Mary University of London (QMUL) policies.
Tech-with-Vidhya/twitter-time-series-data-various-statistical-distributions-analysis-R-programming
This project is delivered as part of my Masters in Big Data Science (MSc BDS) Program for the module named “Applied Statistics” in Queen Mary University of London (QMUL), London, United Kingdom. The project covered the statistical analysis of various distributions, namely normal, logistic, Poisson, Weibull and gamma, in R programming for the “twitter time series” dataset. **NOTE:** Due to the data privacy and the data protection policy to be adhered by the students; the datasets and the solution related code are not exposed and updated in the GitHub public profile; in order to be compliant with the Queen Mary University of London (QMUL) policies.