In this project, I used SQL to explore global data on confirmed COVID-19 cases and deaths, including total cases, new cases, total deaths and new deaths. It's worth mentioning that the count of confirmed deaths may not precisely reflect the actual number of deaths caused by COVID-19. I was particularly interested in the data for the United Kingdom, my country of residence, as it offered insights into mortality rates there.
Source: https://www.worldometers.info/coronavirus/country/uk/
- I first compared total cases with total deaths for the United Kingdom and calculated the death percentage. As of 16 March 2023, there were 24,423,396 total cases and 208,458 deaths, a death percentage of about 0.85%: roughly the likelihood of dying for someone who contracted COVID-19 in the UK, based on confirmed figures up to that date.
- Next, I compared total COVID-19 cases with the population of the United Kingdom, which shows the percentage of the UK population reported to have contracted COVID-19. As of 16 March 2023, this was around 36%, out of a population of 67,508,936 (about 67.51 million).
- I then looked at the countries with the highest infection rates relative to population. As of the date of my data exploration, Cyprus had the highest reported infection rate, at 72.8% of its population.
Source: https://www.worldometers.info/coronavirus/country/cyprus/
- Next, I looked at the countries with the highest total death counts. As of 16 March 2023, the United States had the highest death count at 1,113,229, followed by Brazil (699,310) and India (530,789). The United Kingdom ranked seventh, with 208,458 deaths.
- I then explored the figures by continent to see which continents had the highest death counts. When I checked the results against the data, I noticed that the figure for North America was incorrect: it only included the United States. This is likely because taking the highest death count per continent returns the single country with the most deaths, not the continent's total. To correct this, I re-ran the query by location, using the rows where the continent is null (in this dataset, those rows hold continent-level totals as well as income groups) and excluding the income categories. This gave more accurate death counts by continent, and Europe had the highest total death count of all the continents.
- I then calculated the global figures for total cases, total deaths and the overall death percentage, using only rows where the continent is not null so that the aggregate rows are not counted twice. These figures were intended for the data visualisations I created after the data exploration.
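The queries behind the steps above are not included in this write-up, so the following is a minimal sketch of how they could look, run through Python's built-in `sqlite3` module so it is self-contained. The table name `CovidDeaths`, its column names (modelled on the Our World in Data export) and every row (hypothetical Country A, B and C plus aggregate rows) are assumptions for illustration, not the project's data. The death columns are stored as text to show why `CAST` is needed, and the two continent queries reproduce the North America problem and the by-location fix.

```python
import sqlite3

# Hypothetical mini CovidDeaths table. Column names are assumed to follow the
# Our World in Data export; Country A/B/C and all values are made up.
# Death columns are TEXT to mimic a numeric column imported as text.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CovidDeaths (
    continent TEXT, location TEXT, date TEXT, population INTEGER,
    total_cases INTEGER, total_deaths TEXT, new_cases INTEGER, new_deaths TEXT)""")
conn.executemany("INSERT INTO CovidDeaths VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    # Country rows: continent is filled in.
    ("North America", "Country A", "2021-06-01", 1000, 10, "1", 10, "1"),
    ("North America", "Country A", "2021-06-02", 1000, 40, "2", 30, "1"),
    ("North America", "Country B", "2021-06-01", 500, 20, "0", 20, "0"),
    ("North America", "Country B", "2021-06-02", 500, 25, "1", 5, "1"),
    ("Europe", "Country C", "2021-06-01", 2000, 100, "2", 100, "2"),
    ("Europe", "Country C", "2021-06-02", 2000, 150, "5", 50, "3"),
    # Aggregate rows: continent is NULL and location names the group.
    (None, "North America", "2021-06-02", 1500, 65, "3", 35, "2"),
    (None, "Europe", "2021-06-02", 2000, 150, "5", 50, "3"),
    (None, "High income", "2021-06-02", 2500, 190, "7", 80, "4"),
])

# Death percentage per day (CAST turns the text column into a number).
death_pct = conn.execute("""
    SELECT location, date, 100.0 * CAST(total_deaths AS INTEGER) / total_cases
    FROM CovidDeaths WHERE location = 'Country A' ORDER BY date""").fetchall()

# Highest infection rate relative to population, country rows only.
infection = conn.execute("""
    SELECT location, 100.0 * MAX(total_cases) / population AS pct_infected
    FROM CovidDeaths WHERE continent IS NOT NULL
    GROUP BY location, population ORDER BY pct_infected DESC""").fetchall()

# Naive continent query: MAX picks the single worst country per continent.
naive = conn.execute("""
    SELECT continent, MAX(CAST(total_deaths AS INTEGER)) AS deaths
    FROM CovidDeaths WHERE continent IS NOT NULL
    GROUP BY continent ORDER BY deaths DESC""").fetchall()

# Corrected: read the continent aggregate rows by location, excluding income groups.
by_continent = conn.execute("""
    SELECT location, MAX(CAST(total_deaths AS INTEGER)) AS deaths
    FROM CovidDeaths WHERE continent IS NULL AND location NOT LIKE '%income%'
    GROUP BY location ORDER BY deaths DESC""").fetchall()

# Global figures from country rows only, so aggregate rows are not double-counted.
global_row = conn.execute("""
    SELECT SUM(new_cases), SUM(CAST(new_deaths AS INTEGER)),
           ROUND(100.0 * SUM(CAST(new_deaths AS INTEGER)) / SUM(new_cases), 2)
    FROM CovidDeaths WHERE continent IS NOT NULL""").fetchone()

print(naive)         # North America shows 2: Country A alone
print(by_continent)  # North America shows 3: the continent total
print(global_row)
```

With text-typed death columns, `MAX(total_deaths)` would compare strings (so '9' would outrank '10'); casting to `INTEGER` before aggregating avoids that.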
I then joined the two tables (COVID deaths and vaccinations) on location and date to compare total population with vaccinations, which showed the share of each country's population that had received a COVID-19 vaccination. I partitioned the running count of vaccinations by location so that it starts over for each country, and ordered the results by location and date. As of 16 March 2023, 148,570,849 COVID-19 vaccinations had been recorded in the United Kingdom, and the data records the first vaccinations on 11 January 2021.
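A minimal sketch of the join and the window function described above, again in SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later). The `CovidDeaths` and `CovidVaccinations` tables, their columns and the rows for hypothetical Country C and D are assumptions. Since the UK total above (about 148.6 million) is more than twice the population, the vaccinations column appears to count doses, so a running percentage built on it can exceed 100%.

```python
import sqlite3

# Hypothetical toy tables; names, columns and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Country C', '2021-01-01', 2000), ('Europe', 'Country C', '2021-01-02', 2000),
  ('Europe', 'Country D', '2021-01-01', 800),  ('Europe', 'Country D', '2021-01-02', 800);
INSERT INTO CovidVaccinations VALUES
  ('Country C', '2021-01-01', '100'), ('Country C', '2021-01-02', '300'),
  ('Country D', '2021-01-01', NULL),  ('Country D', '2021-01-02', '50');
""")

# Join on location AND date, then keep a running total of vaccinations that
# restarts for each location (PARTITION BY) and accumulates by date (ORDER BY).
rows = conn.execute("""
    SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
           SUM(CAST(vac.new_vaccinations AS INTEGER)) OVER (
               PARTITION BY dea.location ORDER BY dea.location, dea.date
           ) AS rolling_vaccinations
    FROM CovidDeaths AS dea
    JOIN CovidVaccinations AS vac
      ON dea.location = vac.location AND dea.date = vac.date
    WHERE dea.continent IS NOT NULL
    ORDER BY dea.location, dea.date""").fetchall()

for row in rows:
    print(row)
```

`SUM` ignores NULLs, so Country D's first row, which has no recorded value yet, keeps a NULL running total until a value arrives.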
I next used a CTE (common table expression) to run a calculation on the partitioned running total from the previous query, adding a column showing the percentage of each location's population that had been vaccinated.
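A sketch of the CTE step under the same assumptions (hypothetical tables and toy rows, run in SQLite). A column alias such as `rolling_vaccinations` cannot be referenced elsewhere in the same `SELECT` list, so the windowed query is wrapped in a CTE (named `PopvsVac` here, a hypothetical name) and the percentage is computed in the outer query.

```python
import sqlite3

# Hypothetical toy tables; names, columns and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Country C', '2021-01-01', 2000), ('Europe', 'Country C', '2021-01-02', 2000),
  ('Europe', 'Country D', '2021-01-01', 800),  ('Europe', 'Country D', '2021-01-02', 800);
INSERT INTO CovidVaccinations VALUES
  ('Country C', '2021-01-01', '100'), ('Country C', '2021-01-02', '300'),
  ('Country D', '2021-01-01', NULL),  ('Country D', '2021-01-02', '50');
""")

# The CTE names the running total so the outer query can divide it by population.
rows = conn.execute("""
    WITH PopvsVac (location, date, population, new_vaccinations, rolling_vaccinations) AS (
        SELECT dea.location, dea.date, dea.population, vac.new_vaccinations,
               SUM(CAST(vac.new_vaccinations AS INTEGER)) OVER (
                   PARTITION BY dea.location ORDER BY dea.location, dea.date)
        FROM CovidDeaths AS dea
        JOIN CovidVaccinations AS vac
          ON dea.location = vac.location AND dea.date = vac.date
        WHERE dea.continent IS NOT NULL
    )
    SELECT *, 100.0 * rolling_vaccinations / population AS pct_vaccinated
    FROM PopvsVac
    ORDER BY location, date""").fetchall()

for row in rows:
    print(row)
```

Multiplying by `100.0` first keeps the division in floating point; with two integers SQLite would truncate the result.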
As an alternative, I used a temp table to produce the same result.
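The temp-table version, sketched in SQLite with the same hypothetical tables and toy rows. SQLite uses `CREATE TEMP TABLE`; other engines differ (SQL Server, for instance, marks local temp tables with a `#` prefix). `PercentPopulationVaccinated` is an illustrative name, and dropping the table first lets the script be re-run without an error.

```python
import sqlite3

# Hypothetical toy tables; names, columns and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Country C', '2021-01-01', 2000), ('Europe', 'Country C', '2021-01-02', 2000),
  ('Europe', 'Country D', '2021-01-01', 800),  ('Europe', 'Country D', '2021-01-02', 800);
INSERT INTO CovidVaccinations VALUES
  ('Country C', '2021-01-01', '100'), ('Country C', '2021-01-02', '300'),
  ('Country D', '2021-01-01', NULL),  ('Country D', '2021-01-02', '50');
""")

# Materialise the joined, windowed rows into a temporary table.
conn.executescript("""
DROP TABLE IF EXISTS temp.PercentPopulationVaccinated;
CREATE TEMP TABLE PercentPopulationVaccinated (
    location TEXT, date TEXT, population INTEGER,
    new_vaccinations INTEGER, rolling_vaccinations INTEGER
);
INSERT INTO PercentPopulationVaccinated
SELECT dea.location, dea.date, dea.population,
       CAST(vac.new_vaccinations AS INTEGER),
       SUM(CAST(vac.new_vaccinations AS INTEGER)) OVER (
           PARTITION BY dea.location ORDER BY dea.location, dea.date)
FROM CovidDeaths AS dea
JOIN CovidVaccinations AS vac
  ON dea.location = vac.location AND dea.date = vac.date
WHERE dea.continent IS NOT NULL;
""")

# Same percentage as the CTE version, read from the temp table.
rows = conn.execute("""
    SELECT *, 100.0 * rolling_vaccinations / population AS pct_vaccinated
    FROM PercentPopulationVaccinated
    ORDER BY location, date""").fetchall()

for row in rows:
    print(row)
```

Unlike a CTE, the temp table persists for the rest of the connection, so several later queries can reuse it without recomputing the window function.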
I then created a view to save the query for later visualisation in Tableau.
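A sketch of the view, using the same hypothetical tables and toy rows in SQLite; the view name is illustrative. A standard view saves the query rather than a copy of its results, so it is re-evaluated each time it is read. The example updates one source row to show the view picking up the change.

```python
import sqlite3

# Hypothetical toy tables; names, columns and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Country C', '2021-01-01', 2000), ('Europe', 'Country C', '2021-01-02', 2000),
  ('Europe', 'Country D', '2021-01-01', 800),  ('Europe', 'Country D', '2021-01-02', 800);
INSERT INTO CovidVaccinations VALUES
  ('Country C', '2021-01-01', '100'), ('Country C', '2021-01-02', '300'),
  ('Country D', '2021-01-01', NULL),  ('Country D', '2021-01-02', '50');
""")

# The view stores the query definition, not its results.
conn.execute("""
CREATE VIEW PercentPopulationVaccinated AS
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
       SUM(CAST(vac.new_vaccinations AS INTEGER)) OVER (
           PARTITION BY dea.location ORDER BY dea.location, dea.date
       ) AS rolling_vaccinations
FROM CovidDeaths AS dea
JOIN CovidVaccinations AS vac
  ON dea.location = vac.location AND dea.date = vac.date
WHERE dea.continent IS NOT NULL
""")

query = """SELECT location, date, rolling_vaccinations
           FROM PercentPopulationVaccinated ORDER BY location, date"""
before = conn.execute(query).fetchall()

# Change a source row; reading the view again reflects the update.
conn.execute("""UPDATE CovidVaccinations SET new_vaccinations = '80'
                WHERE location = 'Country D' AND date = '2021-01-01'""")
after = conn.execute(query).fetchall()

print(before)
print(after)
```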
Skills used:
- Joins
- CTEs
- Temp Tables
- Window Functions
- Aggregate Functions
- Creating Views
- Converting Data Types
Data source: https://ourworldindata.org/covid-deaths
N.B. The coronavirus pandemic data is updated daily, so current figures may differ from those above.