Utilities to download & convert PDF reports from the Ministry of Educations Graduate Employment Survey (GES) into .csv
for whatever downstream use case you may have. The data is subject to Singapore Open Data License.
I wrote this because data.gov.sg has not been updating this dataset since 2021.
Install dependencies with poetry
.
poetry i
Download the PDF reports:
# displayed are the default values if args are not supplied
poetry run python download_reports.py --year 2022 2023 --institution nus ntu smu sutd suss smu
This will download the relevant PDFs into in/
Run the conversion:
# displayed are the default values if args not supplied
poetry run python table_parser.py --filename tables.csv
This will output the merged tables to out/tables.csv
Examples on how to use programatically included in derive_full_dataset.ipynb
.
Install these dependencies:
pymupdf
pandas
requests
And run the above lines excluding poetry run
.