/BDACovid19

Primary Language: Jupyter Notebook

Covid Data Analysis and Real-Time Monitoring

Abstract

As the world struggles with the pandemic, it is of utmost importance to keep people well informed about its status and to find patterns in the data that yield new insights and point toward solutions. This project aims to do both: real-time monitoring of the pandemic and analysis of the data collected so far from various popular datasets. We will also build models using advanced ML algorithms for use cases such as time series forecasting and predicting the number of confirmed cases. Apache Spark, along with its libraries, and Kafka will be the key technologies used.


Data Analysis and ML Modelling
For the data analysis and ML modelling, we'll use the Postman API for data collection and use Kafka to read from these APIs, then process, divide, and publish the data to multiple topics. We'll use Spark SQL to structure the datasets and then the pandas library to create data frames for modelling. The scikit-learn (sklearn) library will be used to train our linear ML prediction model. We'll visualize the predictions using the Matplotlib and seaborn libraries. Finally, the whole application will be created as a Flask app using the Docker image of the model, which will then be deployed on a Kubernetes cluster.
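The "divide and publish to multiple topics" step followed by a linear prediction can be sketched roughly as below. This is a minimal, dependency-free illustration, not the project's actual code: the sample records, the `covid.<country>` topic naming, and the helper names (`topic_for`, `publish_by_topic`, `fit_line`) are all assumptions. An in-memory dictionary stands in for Kafka topics, and a hand-written ordinary least squares fit stands in for scikit-learn's linear model.

```python
from collections import defaultdict

# Hypothetical sample records standing in for the API responses;
# the field names and values are invented for illustration only.
RECORDS = [
    {"country": "India", "day": 0, "confirmed": 100},
    {"country": "India", "day": 1, "confirmed": 150},
    {"country": "India", "day": 2, "confirmed": 200},
    {"country": "Italy", "day": 0, "confirmed": 80},
    {"country": "Italy", "day": 1, "confirmed": 90},
    {"country": "Italy", "day": 2, "confirmed": 100},
]


def topic_for(record):
    """Derive a topic name from a record (assumed naming scheme)."""
    return "covid." + record["country"].lower()


def publish_by_topic(records):
    """Group records per topic; a dict stands in for Kafka topics here."""
    topics = defaultdict(list)
    for record in records:
        topics[topic_for(record)].append(record)
    return dict(topics)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Divide the records into topics, then model one topic's series.
topics = publish_by_topic(RECORDS)
india = topics["covid.india"]
slope, intercept = fit_line(
    [r["day"] for r in india],
    [r["confirmed"] for r in india],
)
forecast_day3 = predict(slope, intercept, 3)
print(f"slope={slope}, intercept={intercept}, day 3 forecast={forecast_day3}")
```

In a real deployment the grouping would presumably be done by a Kafka producer writing to separate topics, and the fit by a scikit-learn estimator trained on a pandas data frame; the arithmetic of the linear fit is the same.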


Data Analysis and ML Modelling System Architecture

[Figure: system architecture diagram for data analysis and ML modelling]

Web Crawling, Indexing and Queries using Apache Nutch and Apache Solr

https://drive.google.com/drive/folders/18ivGLij26JitfxB05EcVqExK-QmhnqvO?usp=sharing