umscraper
UM Scraper scrapes all staffs names, ptj (faculty), jab (department), details and images
umscraper is a small GoLang web scraper package to scrape all UM's staffs info.
Exported variables and functions implemented till now :
func ScrapePtjJabTable(time.Duration) [][]string {} // Concurrently scrapes full list of (Ptj, Jab) pairs. Returns the table with headers [PtjCode, PtjText, JabCode, JabText].
func ScrapeStaffTable([][]string time.Duration) [][]string {} // Concurrently scrapes all staff's profiles. Return a table with headers [PtjCode, PtjText, JabCode, JabText, Name, NameQueryEsc, Details...]
func WriteCsv([][]string, string) {} // Writes a table of [][]string into specified filename.
func ReadCsv(string) [][]string {} // Reads a table of [][]string from specified filename.
Dataset
The scraped data are in this repository desmondyeoh/umscraper-data
ptjJabTable.csv
has the full list of (Ptj, Jab) pairs with columns {ptjCode, ptjText, jabCode, jabText}staffTable.csv
has the full staff list with columns {ptjCode, ptjText, jabCode, jabText, name, nameQueryEsc, details...}img/<ptjCode>/<nameQueryEsc>.jpg
can be used to find the staff's respective image file.
img/
has all the staff images files, organised into ptjCode folders.
Installation
Install the package using the command
go get github.com/desmondyeoh/umscraper
Usage
Check out the examples/
folder for a quick getting started example!
Contributions
This package was developed in my free time. However, contributions from everybody in the community are welcome, to make it a better web scraper. If you think there should be a particular feature or function included in the package, feel free to open up a new issue or pull request.