The Hong Kong government routinely releases details of confirmed COVID-19 cases in PDF format on its websites (e.g. the attachment from this link).
Some of the information was provided exclusively in these PDFs. Here I provide a small R script that downloads all of these PDF attachments from the news reports published over the past two years.
To facilitate downstream data wrangling with colleagues, we converted the PDF tables into Excel (.xlsx) files using open-source OCR software. Specifically, we tried the model trained on Traditional Chinese (--lang=chinese_cht