Web Parser for lontar.cs.ui.ac.id
For easily viewing the list of bachelor theses (known as skripsi in Bahasa Indonesia).
Why Am I Doing This?
Because:
- lontar doesn't show the full list of skripsi. Instead, it uses pagination.
- It has a search feature, but I can't categorize skripsi by year.
- It doesn't have an API.
So I made a web parser. It parses the web pages and converts them into human-readable, filterable output.
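The core idea of getting around pagination is to keep requesting pages until one comes back empty, then work with the combined list. The sketch below illustrates only that loop. It assumes a hypothetical `fetch_page` callable that returns the records on a given 1-based page (or an empty array past the last page); it is not the real script's fetching or HTML-parsing code, which is not shown here.

```ruby
# Walk a paginated listing until a page comes back empty.
# ASSUMPTION: `fetch_page` is a hypothetical callable taking a 1-based page
# number and returning an array of records, or [] past the last page.
# `max_pages` is a safety limit so a misbehaving source cannot loop forever.
def collect_all_pages(fetch_page, max_pages: 1000)
  records = []
  (1..max_pages).each do |page|
    batch = fetch_page.call(page)
    break if batch.empty? # no more results: stop paginating
    records.concat(batch)
  end
  records
end

# Usage with a fake, in-memory "site" of three pages (hypothetical data):
fake_pages = {
  1 => [{ title: "A", year: 2012 }, { title: "B", year: 2013 }],
  2 => [{ title: "C", year: 2012 }],
  3 => [{ title: "D", year: 2014 }]
}
all = collect_all_pages(->(n) { fake_pages.fetch(n, []) })
puts all.size
```

Once every page has been collected into one array, sorting and grouping by year become ordinary in-memory operations instead of something the website has to support.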
Result Table / Output
We have 4 output formats. Choose the one that suits you:
- Markdown. This format is very readable and suitable for human viewing. I have sorted the list and categorized it by year.
- TSV. This is a CSV-like format, but it uses a tab character (\t) as the separator, because some titles contain commas. The benefit is instant search when the file is viewed on the GitHub website (desktop version). You can also download this file and open it in your favorite spreadsheet program (such as LibreOffice Calc, Numbers, or Excel).
- JSON. If you want the raw data for your own use, just grab this format.
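All of these formats can be produced from the same list of records. The following is a minimal sketch of that idea, assuming each record is a Hash with hypothetical `:title`, `:author` and `:year` keys; the real `lontar-cs-sc.rb` may name its fields, order its years, and lay out its files differently. It shows grouping by year for Markdown, tab separation for TSV (so commas in titles need no quoting), and plain serialization for JSON.

```ruby
require "json"

# ASSUMPTION: records are Hashes with :title, :author and :year keys
# (hypothetical field names, not taken from the real script).

# Markdown: one section per year (ascending here), titles sorted within each year.
def to_markdown(records)
  by_year = records.group_by { |r| r[:year] }.sort_by { |year, _items| year }
  sections = by_year.map do |year, items|
    lines = items.sort_by { |r| r[:title] }.map { |r| "- #{r[:title]} (#{r[:author]})" }
    "## #{year}\n\n#{lines.join("\n")}\n"
  end
  sections.join("\n")
end

# TSV: tab-separated, so commas inside titles are harmless. Tabs or newlines
# inside a field would still break this simple format, so they become spaces.
def to_tsv(records)
  clean = ->(v) { v.to_s.gsub(/[\t\r\n]+/, " ") }
  header = %w[year title author].join("\t")
  rows = records.map { |r| [r[:year], r[:title], r[:author]].map(&clean).join("\t") }
  ([header] + rows).join("\n") + "\n"
end

# JSON: the raw records, for anyone who wants the data as-is.
def to_json_output(records)
  JSON.pretty_generate(records)
end

# Usage with hypothetical sample data (note the comma in the first title):
records = [
  { title: "Sistem Informasi, Studi Kasus", author: "Budi", year: 2013 },
  { title: "Analisis Jaringan", author: "Ani", year: 2012 }
]
puts to_markdown(records)
puts to_tsv(records)
puts to_json_output(records)
```

The function is named `to_json_output` rather than `to_json` so it does not clash with the `to_json` method that the `json` library adds to every object.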
Updating and Installation
For whatever reason, I might forget to update the output in the future. To update it yourself, simply:
- Install Ruby (Bundler is also needed for the dependency step; it ships with Ruby 2.6 and later).
- Clone this repository and cd into it:
    git clone https://github.com/mufid/lontarcs-parser # Did you know you can omit .git in GitHub remotes?
    cd lontarcs-parser
- Install the dependencies:
    bundle install
- Run the Ruby script:
    ruby lontar-cs-sc.rb
- See the results in the out.* files.
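Once the out.* files exist, they are easy to consume from your own scripts. Below is a minimal sketch for pulling one year's theses out of the TSV output. It assumes, hypothetically, that the file is named out.tsv, that its first line is a header row, and that one column is called "year"; check the actual file before relying on any of these. Plain splitting on tabs is enough here precisely because the titles contain commas, not tabs.

```ruby
# ASSUMPTIONS (hypothetical; the real out.* files may be laid out differently):
# - the TSV output lives in a file such as out.tsv,
# - its first line is a header row,
# - one of the header columns is named "year".
def theses_for_year(path, year)
  lines = File.readlines(path, chomp: true)
  header = lines.shift.split("\t")
  # Zip each row with the header so columns can be looked up by name.
  # The -1 limit keeps trailing empty fields instead of dropping them.
  rows = lines.map { |line| header.zip(line.split("\t", -1)).to_h }
  rows.select { |row| row["year"] == year.to_s }
end

# Example (run from the repository root after generating the output):
#   theses_for_year("out.tsv", 2013).each { |row| puts row["title"] }
```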