Dashboard Link: COVID Data - Dashboard
The COVID-19 pandemic has had profound effects globally, impacting various aspects of society including public health, economy, and social dynamics. In response to this crisis, extensive data collection and analysis have been undertaken to understand the spread of the virus, its impact on different populations, and the effectiveness of mitigation strategies. The COVID Portfolio Project aims to leverage SQL tools and techniques to explore and analyze COVID-19 data, providing insights into infection rates, mortality rates, vaccination progress, and other relevant metrics.
- Initial Data Exploration
- Total Cases vs Total Deaths
- Total Cases vs Population in India
- Countries with Highest Infection Rate
- Countries with Highest Death Count per Population
- Continent-wise Death Count
- Global Numbers
- Total Population vs Vaccinations
- Using Common Table Expressions (CTEs)
- Using Temporary Tables
- Creating Views
SELECT *
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
ORDER BY 3,4
This query retrieves all columns from the CovidDeaths table, filtering out records where the continent is not specified. The data is sorted by the third and fourth columns.
SELECT location, date, total_cases, total_deaths, (total_deaths/total_cases)*100 AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE location = 'India' AND continent IS NOT NULL
ORDER BY 1, 2
This query calculates the percentage of deaths in India relative to total cases, providing insight into the mortality rate.
SELECT location, date, population, total_cases, (total_cases/population)*100 AS PercentPopulationInfected
FROM PortfolioProject..CovidDeaths
WHERE location = 'India' AND continent IS NOT NULL
ORDER BY 1, 2
This query determines the percentage of India's population infected with COVID-19 over time, based on reported cases.
SELECT location, population, MAX(total_cases) AS HighestInfectionCount, MAX((total_cases/population)*100) AS HighestPercentPopulationInfected
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY HighestPercentPopulationInfected desc
This query identifies countries with the highest infection rates relative to their population.
SELECT location, MAX(CAST(total_deaths AS int)) AS HighestTotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY HighestTotalDeathCount desc
This query lists countries with the highest death counts per population.
SELECT continent, MAX(CAST(total_deaths AS int)) AS HighestTotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY HighestTotalDeathCount desc
This query displays the continents with the highest death counts.
SELECT date, SUM(new_cases) AS NewCases, SUM(CAST(new_deaths AS INT)) AS NewDeaths, SUM(CAST(new_deaths AS INT))/SUM(new_cases)*100 AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY date
ORDER BY 1, 2
These queries provide various global statistics including daily new cases, deaths, and death percentage.
- The CAST() function in SQL is used to convert an expression from one data type to another. The CAST() function is applied to new_deaths, converting its data type to INT, which stands for integer. This conversion ensures that the values in the new_deaths column, which is stored in nvarchar data type is treated as integers.
- The SUM() function then aggregates the converted values of new_deaths, summing up the total number of new deaths across all records.
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(INT, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent IS NOT NULL
ORDER BY 2, 3
This query analyzes the progress of COVID-19 vaccinations relative to population size. Here's a breakdown of the code:
-
SELECT statement:
- It selects specific columns from two tables,
CovidDeaths
andCovidVaccinations
. - Additionally, it calculates the rolling sum (the summation of a sequence of numbers which is updated each time a new number is added to the sequence, by adding the value of the new number to the previous running total) of new vaccinations using the
SUM()
function with theOVER
clause. This rolling sum is partitioned by location and ordered by location and date. The result is aliased asRollingPeopleVaccinated
.
- It selects specific columns from two tables,
-
FROM clause:
- It specifies the tables from which data will be retrieved:
PortfolioProject..CovidDeaths
andPortfolioProject..CovidVaccinations
. - The
dea
andvac
are aliases assigned to theCovidDeaths
andCovidVaccinations
tables, respectively.
- It specifies the tables from which data will be retrieved:
-
JOIN condition:
- It joins the
CovidDeaths
andCovidVaccinations
tables based on the location and date columns. - This ensures that the vaccination data corresponds to the same location and date as the COVID-19 data.
- It joins the
-
WHERE clause:
- Filters out records where the continent is not specified, ensuring that only relevant data is included in the analysis.
-
ORDER BY clause:
- Orders the results primarily by location (
location
) and then by date (date
) in ascending order.
- Orders the results primarily by location (
WITH PopvsVac (continent, location, date, population, new_vaccinations, RollingPeopleVaccinated)
AS
(
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(INT, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent IS NOT NULL
-- ORDER BY 2, 3
)
SELECT *, (RollingPeopleVaccinated/Population)*100 AS PercentageRollingPeopleVaccinated
FROM PopvsVac
This code utilizes a Common Table Expression (CTE) named PopvsVac
to calculate the percentage of the population vaccinated. Here's a concise explanation:
-
CTE Definition (PopvsVac):
- The CTE defines columns
continent
,location
,date
,population
,new_vaccinations
, andRollingPeopleVaccinated
. - It selects data from
CovidDeaths
andCovidVaccinations
tables, joining them on the location and date. - The
RollingPeopleVaccinated
calculates the rolling sum of new vaccinations partitioned by location and ordered by location and date.
- The CTE defines columns
-
Main Query:
- The main query selects all columns from the CTE (
PopvsVac
). - It calculates the percentage of rolling people vaccinated by dividing
RollingPeopleVaccinated
byPopulation
and multiplying by 100.
- The main query selects all columns from the CTE (
This query provides a way to calculate the percentage of the population vaccinated by leveraging a CTE for intermediate calculations.
DROP TABLE IF EXISTS #PercentPopulationVaccinated
CREATE TABLE #PercentPopulationVaccinated
(
Continent nvarchar(255),
Location nvarchar(255),
Date datetime,
Population numeric,
New_vaccinations numeric,
RollingPeopleVaccinated numeric
)
INSERT INTO #PercentPopulationVaccinated
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(BIGINT, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
ON dea.location = vac.location
AND dea.date = vac.date
--WHERE dea.continent IS NOT NULL
--ORDER BY 2, 3
SELECT *, (RollingPeopleVaccinated/Population)*
100 AS PercentageRollingPeopleVaccinated
FROM #PercentPopulationVaccinated
This code showcases the utilization of temporary tables to store intermediate results and execute subsequent calculations. Here's a brief overview:
-
Temporary Table Creation:
- It starts by dropping the temporary table
#PercentPopulationVaccinated
if it already exists to ensure a clean slate. - Then, it creates a new temporary table
#PercentPopulationVaccinated
with specified columns:Continent
,Location
,Date
,Population
,New_vaccinations
, andRollingPeopleVaccinated
.
- It starts by dropping the temporary table
-
Data Insertion:
- Next, it inserts data into the temporary table.
- The data is selected from a query that joins data from two tables,
CovidDeaths
andCovidVaccinations
, based on location and date. - The
RollingPeopleVaccinated
column is calculated as a rolling sum of new vaccinations partitioned by location and ordered by location and date.
-
Main Query:
- After populating the temporary table, a subsequent SELECT statement is executed.
- This SELECT statement retrieves all columns from the temporary table (
#PercentPopulationVaccinated
). - Additionally, it calculates the percentage of rolling people vaccinated by dividing
RollingPeopleVaccinated
byPopulation
and multiplying by 100.
In summary, this code segment employs temporary tables to store intermediate results, facilitating the execution of further calculations or data manipulation operations on the stored data.
CREATE VIEW PercentPopulationVaccinated AS
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(BIGINT, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent IS NOT NULL
-- ORDER BY 2, 3
SELECT *
FROM PercentPopulationVaccinated
CREATE VIEW HighestDeathCount AS
SELECT location, MAX(CAST(total_deaths AS int)) AS HighestTotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
-- ORDER BY HighestTotalDeathCount desc
This code snippet creates two views, PercentPopulationVaccinated
and HighestDeathCount
, to store data for subsequent visualization or analysis. Here's a concise explanation:
-
Creating the View
PercentPopulationVaccinated
:- This view is created using the
CREATE VIEW
statement. - It selects specific columns (
continent
,location
,date
,population
,new_vaccinations
) from the joined tablesCovidDeaths
andCovidVaccinations
. - The
SUM(CONVERT(BIGINT, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
calculates the rolling sum of new vaccinations partitioned by location and ordered by location and date. - The
WHERE
clause filters out records where the continent is not specified. - There is a commented-out
ORDER BY
clause which isn't functional within a view.
- This view is created using the
-
Creating the View
HighestDeathCount
:- This view is also created using the
CREATE VIEW
statement. - It selects the
location
column and calculates the maximum value oftotal_deaths
, cast as an integer, for each location. - The
WHERE
clause filters out records where the continent is not specified. - The results are grouped by location.
- Similar to the first view, there is a commented-out
ORDER BY
clause which isn't functional within a view.
- This view is also created using the
Overall, these views simplify complex queries and facilitate data reuse by storing specific subsets of data from the CovidDeaths
and CovidVaccinations
tables. They can be queried directly like tables, making it easier to analyze or visualize the data they contain.