mirugwe1
A Data Scientist with research interests in Machine Learning, Deep Learning, Computer Vision, and NLP.
Makerere University School of Public Health, Kampala, Uganda
Pinned Repositories
Accurate-occupancy-detection-of-an-office-room-from-light-temperature-humidity-and-CO2-measurement
This project develops, validates, and tests several statistical classification models that predict whether an office room is occupied from five features: temperature (°C), light (lx), humidity (%), CO2 (ppm), and humidity ratio. The data is modeled using logistic regression, a classification tree, bagging, random forest, and gradient boosted trees. The models were trained and then evaluated against validation and test sets, using confusion matrices to obtain classification and misclassification rates. The logistic model was trained with the glmnet R package, the classification tree with the tree package, the bagging and random forest models with randomForest, and the gradient boosted model with gbm. The random forest model achieved the best accuracy, with a classification rate of 93.21% on the test set. Light was the most significant variable for predicting occupancy in all five models.
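As a minimal sketch of how classification and misclassification rates follow from a confusion matrix, the Python snippet below uses made-up counts, not the project's results (the project itself was done in R). The function name `classification_rates` is hypothetical.

```python
def classification_rates(confusion):
    """Return (classification rate, misclassification rate).

    confusion[i][j] counts observations of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Correct predictions sit on the diagonal of the matrix.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    rate = correct / total
    return rate, 1.0 - rate


# Illustrative counts only: rows = actual (unoccupied, occupied),
# columns = predicted (unoccupied, occupied).
example = [[90, 10],
           [5, 95]]
rate, error = classification_rates(example)
print(f"classification rate: {rate:.2%}, misclassification rate: {error:.2%}")
```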
BentoML_MLOps
HIV Viral Load Predictive Model with BentoML for MLOps
bird_detection
This repository hosts the scripts used to implement bird detection models. We use the CNN-based Faster R-CNN, Single Shot Detector (SSD), and YOLOv3 meta-architectures with ResNet-101, MobileNet, Inception ResNet v2, and VGG-16 feature extraction (backbone) networks.
Cervical_Cancer_Screening
This repository contains implementations of deep learning Convolutional Neural Network (CNN) algorithms for cervical cancer screening. The algorithms aim to assist in the early detection and classification of cervical cancer from digital cervical images.
COVID19
Covid19_Data_Analysis
Animated graphs and maps of COVID-19 data.
Custering-Analysis-in-R
This assignment investigates whether there are regional patterns in the spread of COVID-19 by applying cluster analysis to country-level COVID-19 data collected from the Our World in Data website. The dataset has 30 variables related to COVID-19 cases for 208 countries, covering the period from the start of the pandemic to 2 September 2020. The analysis was done in R using hierarchical clustering, k-means, and Partitioning Around Medoids (PAM). A six-cluster model was built, and silhouette plots were used to assess the quality of the clustering. The hierarchical clustering model produced the highest average silhouette width, **0.85**. Because countries on the same continent have been affected differently by the virus, the clustering models did not group countries regionally; instead, countries were clustered by how severely the pandemic hit them.
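The average silhouette width used above to compare clusterings can be sketched in a few lines of Python. This is an illustration of the standard definition on made-up 2-D points with Euclidean distance, not the project's R code or data; the function name is hypothetical, and it assumes at least two clusters.

```python
import math


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Two well-separated toy clusters; values close to 1 indicate tight, distinct clusters.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [0, 0, 1, 1]
print(round(average_silhouette_width(toy_points, toy_labels), 3))
```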
Data_Protection
This repository contains code for the AES and RSA data encryption algorithms.
Deep-Learning-Using-R-Keras
This assignment uses the TensorFlow-based Keras R package to build multiple models on regression and classification data.
Recommendation-Systems
The goal of this project was to build recommender systems that predict the rating a user will give to a book and recommend books that users might enjoy, based on their past book ratings. Three approaches were used: item-based collaborative filtering, user-based collaborative filtering, and matrix factorization. The collaborative filtering systems recommend books based on the cosine similarity between books or between users. In User-Based Collaborative Filtering (UBCF), books are recommended on the assumption that users with similar preferences will rate books similarly. In Item-Based Collaborative Filtering (IBCF), the assumption is that users will prefer books that are similar to other books they like. Information about users and books was stored in a matrix that was modeled and used to make predictions (the recommendations). The accuracy of the matrix factorization recommender system was assessed using cross-validation, and the system was also assessed to determine the effect of adding L2 regularization and bias terms. L2 regularization did not improve the performance of the model, while adding the bias greatly improved it, giving the lowest RMSE of **0.033**. Finally, a model that ensembles the predictions from UBCF, IBCF, and matrix factorization was created and evaluated using the RMSE.
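The two quantities this description relies on, cosine similarity between rating vectors and RMSE between predicted and actual ratings, can be sketched as follows. The rating vectors are invented for illustration and the function names are hypothetical; treating unrated books as 0 is a simplifying assumption, not necessarily what the project did.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no ratings at all: treat as dissimilar
    return dot / (norm_u * norm_v)


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up ratings of four books by three users (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
carol = [1, 1, 0, 5]

# In UBCF, Alice's recommendations would lean on whichever user is more similar.
print("alice~bob:  ", round(cosine_similarity(alice, bob), 3))
print("alice~carol:", round(cosine_similarity(alice, carol), 3))
```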
mirugwe1's Repositories
mirugwe1/Cervical_Cancer_Screening
mirugwe1/Recommendation-Systems
mirugwe1/Accurate-occupancy-detection-of-an-office-room-from-light-temperature-humidity-and-CO2-measurement
mirugwe1/BentoML_MLOps
mirugwe1/bird_detection
mirugwe1/COVID19
mirugwe1/Covid19_Data_Analysis
mirugwe1/Custering-Analysis-in-R
mirugwe1/Data_Protection
mirugwe1/Deep-Learning-Using-R-Keras
mirugwe1/Dimensional-Reduction-PCA-Isomap-Multi-dimensional-Scaling-and-KNN-modelling.
The goal of this project is to apply different dimensionality reduction methods, namely Principal Component Analysis (PCA), metric Multidimensional Scaling (MDS), and IsoMap, to two data sets to determine which methods and parameters work best on different types of data. The first is the MNIST handwritten digits data set, consisting of greyscale images of the digits 5 and 8, each represented as a one-dimensional vector of 785 columns. The second is the Wisconsin Diagnostic Breast Cancer data set (WDBC; source: UCI Machine Learning), which consists of 569 data points classified as either malignant or benign. We used the KNN algorithm to evaluate the reduction methods: KNN models were built on both the original and the dimensionally reduced data to classify digits in the MNIST data or a patient's cancer status in the WDBC data, and the difference in results was used to evaluate the impact of dimensionality reduction on accuracy. For MNIST, IsoMap slightly improved the classification rate, by only **0.4** percentage points, from **98.5%** to **98.9%**. PCA and metric MDS did not improve performance; both lowered the rate from **98.5%** to **96.75%**. For the breast cancer data set, performance improved only with PCA-reduced data, for which the model classified patients' breast cancer status with **100%** accuracy. The other reduction methods left the classification accuracy unchanged at **92.04%**, the value obtained with the original data.
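The evaluation step described above, scoring a feature representation by the classification rate of a KNN model, can be sketched in plain Python. The toy points and labels are made up, the function names are hypothetical, and the dimensionality reduction itself is not shown; the same `classification_rate` call would simply be repeated on the original and on the reduced features.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classification_rate(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points whose KNN prediction matches the true label."""
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y
        for x, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


# Illustrative 2-D data standing in for either the original or reduced features.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["a", "b"]
print(classification_rate(train_x, train_y, test_x, test_y, k=3))
```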
mirugwe1/Kaggle-s--World-Happiness-Report-Analysis
This project uses the World Happiness Report datasets for 2015-2019 from Kaggle. The datasets give the happiness rank and happiness score of 155 countries based on seven factors: family, life expectancy, economy, generosity, trust in government, freedom, and dystopia residual. The sum of these seven factors gives the happiness score, and the higher the happiness score, the lower (better) the happiness rank. A higher value for any of these factors therefore means a higher level of happiness, so each factor can be read as the extent to which it contributes to happiness. Dystopia is the opposite of utopia and has the lowest happiness level. It serves as a reference point that shows how far each country is from being the least happy country.
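The relationship between the seven factors, the happiness score, and the rank can be illustrated with a short Python sketch. The countries and factor values below are invented, and the key names are assumptions rather than the actual column names in the Kaggle files.

```python
# Assumed key names for the seven factors (not the dataset's real column names).
FACTORS = (
    "family",
    "life_expectancy",
    "economy",
    "generosity",
    "trust_in_government",
    "freedom",
    "dystopia_residual",
)


def happiness_score(country):
    """Happiness score as the sum of the seven factor values."""
    return sum(country[factor] for factor in FACTORS)


def happiness_ranks(countries):
    """Rank 1 goes to the highest score, so higher scores mean lower ranks."""
    ordered = sorted(
        countries,
        key=lambda name: happiness_score(countries[name]),
        reverse=True,
    )
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Made-up values for two hypothetical countries.
countries = {
    "Country A": dict(zip(FACTORS, (1.0, 0.8, 1.2, 0.2, 0.1, 0.5, 2.0))),
    "Country B": dict(zip(FACTORS, (0.6, 0.5, 0.7, 0.1, 0.1, 0.3, 1.9))),
}
print(happiness_ranks(countries))
```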
mirugwe1/LLM_Chatbot
Health-based chatbot powered by an LLM
mirugwe1/Machine-Learning-WebApp
AutoML Web App for predicting tips using Python, Pandas Profiling, Streamlit, and PyCaret
mirugwe1/mirugwe
mirugwe1/mirugwe1
mirugwe1/Restaurant-Tipping-Linear-Regression-Model
The goal of this project is to build a linear model that predicts the tip, in dollars, that a waiter can expect to earn given the following predictors: total bill, day, sex of the customer, time of the party, smoker status, and party size. This was done using linear regression. The dataset of 200 observations and 7 variables was split into training and testing sets in an 8:2 ratio. The model was fitted on the training set using R's lm() function and evaluated on the testing set using the predict() function. The model fit was then analyzed in detail to understand how well it fits the data. Lasso regularization was used to improve the model and to identify the most important predictors of the tip amount. An interaction between size and smoker was also included in the final model, which greatly improved its fit.
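The fit-then-predict workflow with an 8:2 split can be sketched in Python for the simplest case of a single predictor. This is a deliberate simplification of the project's multi-predictor R model: the (bill, tip) pairs are made up, the helper names are hypothetical, and the split here is sequential rather than random.

```python
def split_train_test(rows, train_fraction=0.8):
    """Split rows in order; a real analysis would usually shuffle first."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up (total bill, tip) pairs that follow tip = 1 + 0.1 * bill exactly.
rows = [(bill, 1 + 0.1 * bill) for bill in range(10, 110, 10)]
train, test = split_train_test(rows)

intercept, slope = fit_simple_linear(
    [bill for bill, _ in train],
    [tip for _, tip in train],
)

# Predict tips for the held-out bills and compare with the actual tips.
for bill, tip in test:
    predicted = intercept + slope * bill
    print(f"bill={bill} actual={tip:.2f} predicted={predicted:.2f}")
```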
mirugwe1/Sentiment_Analysis
This repository hosts a project that aims to extract sentiments from social media data about the Ugandan Ebola outbreak using advanced deep learning techniques.
mirugwe1/TB_AI_screening
TB Detection using Convolutional Neural Networks (CNN)
mirugwe1/ugandaemr-usermanual
User Manual for UgandaEMR
mirugwe1/ugandaemr_chatbot