instabaines
A data science and machine learning researcher. I am interested in scientific computing, data modeling, and artificial intelligence.
University of Arkansas at Little Rock, Little Rock, Arkansas
Pinned Repositories
BART_for_text_summarisation
Use of Facebook BART for text summarization
COVID_bayesian
Modeling the spread of COVID-19 in Senegal using PyMC3
GoEmotions
Multilabel Text Sequence Classification
Hypothesis_testing
A/B Hypothesis Testing: Ad campaign performance
Novel-Corona
Context
From the World Health Organization: On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.
Daily-level information on the affected people can therefore give some interesting insights when it is made available to the broader data science community.
Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.
Edited: The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and for getting insights from the broader DS community.

Content
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC
This dataset has daily-level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number. The data is available from 22 Jan, 2020.

Column Description
The main file in this dataset is covid_19_data.csv; detailed descriptions are below.

covid_19_data.csv
Sno - Serial number
ObservationDate - Date of the observation in MM/DD/YYYY
Province/State - Province or state of the observation (could be empty when missing)
Country/Region - Country of observation
Last Update - Time in UTC at which the row was updated for the given province or country (not standardised, so please clean before using it)
Confirmed - Cumulative number of confirmed cases up to that date
Deaths - Cumulative number of deaths up to that date
Recovered - Cumulative number of recovered cases up to that date

2019_ncov_data.csv
This is an older file and is no longer being updated. Please use the covid_19_data.csv file.

Two new files with individual-level information have been added:
COVID_open_line_list_data.csv - This file is obtained from this link
COVID19_line_list_data.csv - This file is obtained from this link

Country level datasets
If you are interested in country-level data, please refer to the following Kaggle datasets:
India - https://www.kaggle.com/sudalairajkumar/covid19-in-india
South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset
Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy
Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil
USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa
Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland
Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements
Johns Hopkins University for making the data available for educational and academic research purposes
MoBS lab - https://www.mobs-lab.org/2019ncov.html
World Health Organization (WHO): https://www.who.int/
DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
Picture courtesy: Johns Hopkins University dashboard

Inspiration
Some insights could be:
Changes in number of affected cases over time
Change in cases over time at country level
Latest number of affected cases
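
Because the Confirmed, Deaths, and Recovered columns are cumulative, per-day counts have to be recovered by differencing. The following is a minimal sketch of that step using only the Python standard library. The sample rows are made up for illustration (they are not real figures), the header names are assumed to match the column description above, and the helper names country_totals and daily_new are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up rows in the described covid_19_data.csv layout (NOT real figures).
# Note the inconsistent "Last Update" formats, which the description warns about.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,A,Exampleland,1/22/2020 17:00,1,0,0
2,01/22/2020,B,Exampleland,1/22/2020 17:00,2,0,0
3,01/23/2020,A,Exampleland,1/23/20 17:00,4,0,0
4,01/23/2020,B,Exampleland,1/23/20 17:00,2,0,1
5,01/24/2020,A,Exampleland,2020-01-24T17:00:00,9,1,1
6,01/24/2020,B,Exampleland,2020-01-24T17:00:00,5,0,1
"""


def country_totals(rows):
    """Sum cumulative Confirmed counts over provinces, per country and date."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # ObservationDate is documented as MM/DD/YYYY.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        # int(float(...)) also tolerates counts stored as "12.0".
        totals[row["Country/Region"]][day] += int(float(row["Confirmed"]))
    return {country: sorted(by_day.items()) for country, by_day in totals.items()}


def daily_new(series):
    """Turn a date-sorted cumulative series into day-over-day differences."""
    return [
        (day, total - prev_total)
        for (_, prev_total), (day, total) in zip(series, series[1:])
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = country_totals(rows)
new_cases = daily_new(totals["Exampleland"])
```

The sketch groups on ObservationDate rather than Last Update, since the latter is described as not standardised. A negative difference in real data would suggest a correction in the source counts and is left visible here rather than clamped to zero.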
PHI-Extraction
NER approaches for extracting Protected Health Information (PHI) from medical records
Plant-Disease-Classifier
A computer-vision-based approach to detecting disease in plants. The model is deployed on an Android device for use in the field.
rossaman-store-end-to-end-ml-prediction
An end-to-end machine learning prediction for the Rossmann store problem
Sexual_predator
Code for the PAN12 Deception Detection: Sexual Predator Identification task
Text-Classfication
Applying Natural Language Processing Techniques to classify text from various sources
instabaines's Repositories
instabaines/BART_for_text_summarisation
Use of Facebook BART for text summarization
instabaines/Novel-Corona
instabaines/COVID_bayesian
Modeling the spread of COVID-19 in Senegal using PyMC3
instabaines/Hypothesis_testing
A/B Hypothesis Testing: Ad campaign performance
instabaines/Seismic_inversion_with_segyio
A demonstration of Python tools for seismic data processing and inversion
instabaines/GoEmotions
Multilabel Text Sequence Classification
instabaines/MLOps
Collection of code for MLOps training
instabaines/PHI-Extraction
NER approaches for extracting Protected Health Information (PHI) from medical records
instabaines/Plant-Disease-Classifier
A computer-vision-based approach to detecting disease in plants. The model is deployed on an Android device for use in the field.
instabaines/rossaman-store-end-to-end-ml-prediction
An end-to-end machine learning prediction for the Rossmann store problem
instabaines/Sentiment-analysis
Collection of notebooks and code for sentiment analysis of Twitter data
instabaines/Sexual_predator
Code for the PAN12 Deception Detection: Sexual Predator Identification task
instabaines/Text-Classfication
Applying Natural Language Processing Techniques to classify text from various sources
instabaines/African_Influencers
instabaines/Anime-Recommendation
Developing a recommendation system for anime fans using PySpark
instabaines/chat-with-document
RAG service for extracting information from documents in an interactive session
instabaines/Clinical-Longformer
instabaines/Collection_Of_R_Projects
Collection of various projects I did using R
instabaines/Data-Engineering
A collection of tutorials and projects in data engineering
instabaines/Fashion-items-price-prediction
instabaines/instabaines
instabaines/instabaines.github.io
Data Science Portfolio
instabaines/markdown-badges
Badges for your personal developer branding, profile, and projects.
instabaines/MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
instabaines/node2vec
Implementation of the node2vec algorithm.
instabaines/Point-SAM
Point-SAM: This is the official repository of "Point-SAM: Promptable 3D Segmentation Model for Point Clouds". We provide codes for running our demo and links to download checkpoints.
instabaines/seismic_computer_vision
Application of computer vision technology to seismic facies classification
instabaines/University_Database
Designing a Database for a School System
instabaines/User_Analytics_Telecommunication_Industry
Analysis of user data in the telecommunication industry for business decision making
instabaines/Visualizing-activation-layers
CNN for Plant disease classification with methods of visualizing convnet layers