Dados_COVID-19_PT

Portuguese COVID-19 Data - User Friendly Version

Primary language: R


Daily Portuguese COVID-19 Data

Last updated: Mon 01 Feb 2021 (16:15:31 UTC [+0000])

  • Data available from 26 Feb 2020 until 01 Feb 2021 (342 days).
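The day count includes both the first and the last date. A quick check in base R:

```r
# Inclusive number of days between the first and last dates in the data.
first_day <- as.Date("2020-02-26")
last_day  <- as.Date("2021-02-01")

# Subtracting two Dates gives a difference in days; add 1 to count both endpoints.
n_days <- as.integer(last_day - first_day) + 1L
print(n_days)
# [1] 342
```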

Download User Friendly Version

  • Download the user-friendly data from covid19pt_DSSG_Long.csv, or use the following direct link in your program: https://raw.githubusercontent.com/CEAUL/Dados_COVID-19_PT/master/data/covid19pt_DSSG_Long.csv
  • Variables
    • data: Date (Portuguese spelling).
    • origVars: Variable name taken from source data.
    • origType: Original variable count type.
    • other: Other types of origVars.
    • symptoms: Recorded COVID-19 symptoms.
    • sex: Gender (F - Females, M - Males, All - Females & Males).
    • ageGrp: Age groups in years (desconhecidos - unknown).
    • ageGrpLower: Lower limit of age group (useful for sorting).
    • ageGrpUpper: Upper limit of age group.
    • region: Portuguese regions.
    • value: Numeric value.
    • valueUnits: Units for the variable value.
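The ageGrpLower column is useful because ageGrp is text, and text sorts character by character. Below is a minimal sketch using a small hypothetical table. The labels "0-4", "5-9" and "10-14" are made up for illustration and are not necessarily the labels in covid19pt_DSSG_Long.csv. The sketch orders the age groups by ageGrpLower and stores that order as factor levels:

```r
library(data.table)

# Hypothetical toy rows in the long format; labels and values are invented.
toy <- data.table(
  ageGrp      = c("10-14", "5-9", "0-4"),
  ageGrpLower = c(10, 5, 0),
  ageGrpUpper = c(14, 9, 4),
  value       = c(30, 20, 10)
)

# A plain text sort puts "10-14" before "5-9", because "1" sorts before "5".
sort(toy$ageGrp)

# Order rows by the numeric lower limit, then fix that order as factor levels.
setorder(toy, ageGrpLower)
toy[, ageGrp := factor(ageGrp, levels = unique(ageGrp))]
levels(toy$ageGrp)
```

Applying the same factor ordering before plotting makes age-group legends follow age order rather than alphabetical order.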

Source

For more information about the data and variables see: https://github.com/dssg-pt/covid19pt-data

The original data were downloaded from an API provided by VOST: https://covid19-api.vost.pt/Requests/get_entry/

Summary: Last 10 (available) Days

Date             Cases (7-Day Mean)   Active Cases   Deaths (7-Day Mean)
Sat 23 Jan 2021  15333 (12150.4)      162951         274 (212.1)
Sun 24 Jan 2021  11721 (12341.3)      169230         275 (229.7)
Mon 25 Jan 2021   6923 (12372.9)      170635         252 (241.9)
Tue 26 Jan 2021  10765 (12417.1)      167381         291 (252.3)
Wed 27 Jan 2021  15073 (12478.0)      172893         293 (262.9)
Thu 28 Jan 2021  16432 (12890.6)      180076         303 (274.6)
Fri 29 Jan 2021  13200 (12778.1)      181811         278 (280.9)
Sat 30 Jan 2021  12435 (12364.1)      179939         293 (283.6)
Sun 31 Jan 2021   9498 (12046.6)      181623         303 (287.6)
Mon 01 Feb 2021   5805 (11886.9)      179180         275 (290.9)
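The 7-day means are trailing (right-aligned) averages of the current day and the six days before it. The check below uses the daily case counts from the table above. It calls data.table::frollmean, the same function used to build mean7Day in the Example Usage code. Only the last four means can be recomputed from these ten days:

```r
library(data.table)

# Daily confirmed cases, Sat 23 Jan 2021 to Mon 01 Feb 2021 (from the table above).
cases <- c(15333, 11721, 6923, 10765, 15073, 16432, 13200, 12435, 9498, 5805)

# Trailing 7-day mean: each value averages that day and the previous six.
# The first six entries are NA because a full 7-day window is not yet available.
mean7 <- frollmean(cases, n = 7)
round(mean7, 1)
# [1]      NA      NA      NA      NA      NA      NA 12778.1 12364.1 12046.6 11886.9
```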

Example Usage

Read in the data

Using the data.table package to process the data.

# Load Libraries
library(data.table)
library(here)

# Read in data as a data.frame and data.table object.
CVPT <- fread(here("data", "covid19pt_DSSG_Long.csv"))
# Alternatively, read the data directly from the repository:
# CVPT <- fread("https://raw.githubusercontent.com/CEAUL/Dados_COVID-19_PT/master/data/covid19pt_DSSG_Long.csv")

# Looking at the key variables in the original long dataset.
CVPT[, .(data, origVars, origType, sex, ageGrp, region, value, valueUnits)]
##              data   origVars   origType sex ageGrp   region  value valueUnits
##     1: 2020-02-26     ativos     ativos All        Portugal     NA           
##     2: 2020-02-27     ativos     ativos All        Portugal     NA           
##     3: 2020-02-28     ativos     ativos All        Portugal     NA           
##     4: 2020-02-29     ativos     ativos All        Portugal     NA           
##     5: 2020-03-01     ativos     ativos All        Portugal     NA           
##    ---                                                                       
## 29750: 2021-01-28 vigilancia vigilancia All        Portugal 223150      Count
## 29751: 2021-01-29 vigilancia vigilancia All        Portugal 225507      Count
## 29752: 2021-01-30 vigilancia vigilancia All        Portugal 225365      Count
## 29753: 2021-01-31 vigilancia vigilancia All        Portugal 223991      Count
## 29754: 2021-02-01 vigilancia vigilancia All        Portugal 220353      Count

# Order data by original variable name and date.
setkeyv(CVPT, c("origVars", "data"))

# Convert `data` to a Date object and add the change from the previous day (dailyChange).
# Add a 7-day rolling mean of dailyChange (mean7Day) for ativos, confirmados, obitos and recuperados.
# Column `data` is Portuguese for "date".
CV <- CVPT[, data := as.Date(data, format = "%Y-%m-%d")][
  , dailyChange := value - shift(value, n = 1, fill = NA, type = "lag"), by = origVars][
    grepl("^sintomas", origVars), dailyChange := NA][
  , mean7Day := fifelse(origVars %chin% c("ativos", "confirmados", "obitos", "recuperados"),
                        frollmean(dailyChange, 7), as.numeric(NA)), by = origVars]
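The data.table chain above computes dailyChange and mean7Day. The sketch below repeats those steps on a small hypothetical table with two cumulative series; the values are invented for illustration. Because shift() runs by origVars, the first day of each series has no previous value, so dailyChange is NA there. frollmean() needs seven non-missing changes, so mean7Day is NA for the first seven rows of each series. The sketch runs frollmean() within each origVars via by. Here that gives the same values as a rolling mean over the whole column, because any window that reaches back into the previous series includes the new series' leading NA.

```r
library(data.table)

# Hypothetical cumulative counts for two variables over 9 days (values invented).
toy <- data.table(
  origVars = rep(c("confirmados", "obitos"), each = 9),
  data     = rep(seq(as.Date("2020-03-01"), by = "day", length.out = 9), 2),
  value    = c(cumsum(c(2, 3, 5, 4, 6, 8, 7, 9, 10)),
               cumsum(c(0, 0, 1, 0, 1, 2, 1, 3, 2)))
)
setkeyv(toy, c("origVars", "data"))

# Day-to-day change within each variable; the first row of each group is NA.
toy[, dailyChange := value - shift(value, n = 1, fill = NA, type = "lag"), by = origVars]

# Trailing 7-day mean of the daily change, computed within each variable.
toy[, mean7Day := frollmean(dailyChange, 7), by = origVars]

toy
```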

Overall Number of Deaths (daily)

library(ggplot2)
library(magrittr)

# Change the ggplot theme.
theme_set(theme_bw())
# A data error in the source currently prevents a plot by sex.
# obMF <- CV[origType=="obitos" & sex %chin% c("M", "F") & ageGrp=="" & region == "Portugal"]
obAll <- CV[origType=="obitos" & sex %chin% c("All") & ageGrp=="" & region == "Portugal"][ 
  , sex := NA]

obAll %>% 
  ggplot(aes(x=data, y=dailyChange)) +
  geom_bar(stat = "identity", fill = "grey75") +
  geom_line(data = obAll, aes(x = data, y = mean7Day), group=1, colour = "brown") +
  scale_x_date(date_breaks = "1 months",
               date_labels = "%b-%y",
               limits = c(min(CV$data, na.rm = TRUE), NA)) +
  theme(legend.position = "bottom") +
  labs(
    title = "COVID-19 Portugal: Number of Daily Deaths with 7 Day Rolling Mean",
    x = "",
    y = "Number of Deaths",
    colour = "",
    fill = "",
    caption = paste0("Updated on: ", format(Sys.time(), "%a %d %b %Y (%H:%M:%S %Z [%z])"))
    )
## Warning: Removed 1 rows containing missing values (position_stack).
## Warning: Removed 7 row(s) containing missing values (geom_path).
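The comment above says a data error prevents a plot by sex. One way to locate such problems is to check whether the sex-specific series add up to the overall series. The sketch below does this on a hypothetical table with invented values. It assumes that, for a given date, the F, M and All rows of obitos (with ageGrp == "" and region == "Portugal") are directly comparable:

```r
library(data.table)

# Hypothetical cumulative deaths by sex; on the second day F + M does not equal All.
toy <- data.table(
  data  = rep(as.Date(c("2020-06-01", "2020-06-02")), each = 3),
  sex   = rep(c("F", "M", "All"), times = 2),
  value = c(100, 110, 210,
             99, 115, 216)
)

# One row per date, one column per sex (columns All, F and M).
wide <- dcast(toy, data ~ sex, value.var = "value")

# Inside j, F and M refer to the columns of those names, not to FALSE.
wide[, gap := All - (F + M)]

# Dates where the sex-specific counts do not add up to the total.
wide[gap != 0]
```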

Recorded Number of Confirmed COVID-19 Cases by Age Group

CV[origType=="confirmados" & !(ageGrp %chin% c("", "desconhecidos"))][
  , .(valueFM = sum(value)), .(data, ageGrp)] %>%
  ggplot(., aes(x=data, y=valueFM, colour = ageGrp)) +
  geom_line() +
  scale_x_date(date_breaks = "1 months",
               date_labels = "%b-%y",
               limits = c(min(CV$data, na.rm = TRUE), NA)) +
  scale_y_continuous() +
  theme(legend.position = "bottom") +
  labs(
    title = "COVID-19 Portugal: Number of Confirmed Cases by Age Group",
    x = "",
    y = "Number of Confirmed Cases",
    caption = paste0("Updated on: ", format(Sys.time(), "%a %d %b %Y (%H:%M:%S %Z [%z])")),
    colour = "Age Group")
## Warning: Removed 54 row(s) containing missing values (geom_path).
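The age-group plot first sums the female and male rows for each date and age group. The sketch below applies that aggregation to a hypothetical table; the label and values are invented. It also shows that one missing value makes the whole sum NA. Such NA sums are likely behind the warning about rows with missing values being removed:

```r
library(data.table)

# Hypothetical confirmed-case counts by sex for one age group on two dates.
toy <- data.table(
  data   = rep(as.Date(c("2020-04-01", "2020-04-02")), each = 2),
  ageGrp = "0-9",
  sex    = rep(c("F", "M"), times = 2),
  value  = c(12, 15, NA, 18)
)

# Same aggregation as in the plot code: combine sexes within date and age group.
agg <- toy[, .(valueFM = sum(value)), by = .(data, ageGrp)]
agg
```

Using sum(value, na.rm = TRUE) would remove the NA. However, it would treat the missing count as zero and could understate the total.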

Recorded Number of Confirmed COVID-19 Cases by Region

CV[origType=="confirmados" & ageGrp=="" & region!="Portugal"] %>%
  ggplot(., aes(x=data, y=value, colour=region)) +
  geom_line() +
  scale_x_date(date_breaks = "1 months",
               date_labels = "%b-%y",
               limits = c(min(CV$data, na.rm = TRUE), NA)) +
  scale_y_log10() +
  theme(legend.position = "bottom") +
  labs(
    title = "COVID-19 Portugal: Number of Confirmed Cases by Region",
    x = "",
    y = "Number of Confirmed Cases",
    caption = paste0("Updated on: ", format(Sys.time(), "%a %d %b %Y (%H:%M:%S %Z [%z])")),
    colour = "Region")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 326 row(s) containing missing values (geom_path).
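The warning about infinite values comes from the log scale: log10(0) is -Inf, so days with zero cases cannot be placed on the axis. One option is to keep only strictly positive values before plotting. The sketch below shows this on a hypothetical table with an invented region and invented values:

```r
library(data.table)

# log10 of zero is -Inf, which cannot be placed on a log-scale axis.
log10(0)

# Hypothetical cumulative confirmed cases for one region (values invented).
toy <- data.table(
  data   = seq(as.Date("2020-03-01"), by = "day", length.out = 4),
  region = "Hypothetical region",
  value  = c(0, 0, 3, 7)
)

# Keep only strictly positive values for the log-scale plot.
toy_pos <- toy[value > 0]
toy_pos
```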


Issues & Notes

Use and interpret with care.

The data are provided as is. Any quality issues or errors in the source data will be reflected in the user-friendly data.

Please create an issue to discuss any errors, issues, requests or improvements.

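The calculated change between consecutive days (dailyChange) can be negative. The query below lists these cases for all variables except vigilancia and internados.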

CV[dailyChange<0 & !(origType %in% c("vigilancia", "internados"))][
  , .(data, origType, origVars, value, dailyChange)]
##            data    origType              origVars value dailyChange
##   1: 2020-05-12      ativos                ativos 23737        -249
##   2: 2020-05-16      ativos                ativos 23785        -280
##   3: 2020-05-17      ativos                ativos 23182        -603
##   4: 2020-05-18      ativos                ativos 21548       -1634
##   5: 2020-05-22      ativos                ativos 21321        -862
##  ---                                                               
## 392: 2020-10-25      obitos     obitos_arsalgarve    25         -10
## 393: 2020-05-23      obitos      obitos_arscentro   230          -3
## 394: 2020-07-03      obitos      obitos_arscentro   248          -1
## 395: 2020-06-20      obitos              obitos_f   768          -1
## 396: 2020-05-21 transmissao transmissao_importada   767          -3
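For an overview rather than the full listing, negative changes can be counted per origType. The sketch below uses a hypothetical table with a dailyChange column like the one built in the Example Usage code; the values are invented:

```r
library(data.table)

# Hypothetical daily changes; negative values are decreases from the previous day.
toy <- data.table(
  origType    = c("ativos", "ativos", "ativos", "obitos", "obitos", "recuperados"),
  dailyChange = c(120, -40, NA, 5, -1, 300)
)

# Count negative changes per origType; NA changes are dropped by the filter.
neg <- toy[dailyChange < 0, .N, by = origType]
neg
```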