blu3r4y/jku-room-search

Store historic booking data

Right now, index.json is crawled once every night and redeployed with GitHub Actions:
https://github.com/blu3r4y/jku-room-search/actions/workflows/deploy-github-pages.yml

It would be valuable to also store historic data persistently and be able to query it. My first idea would be to change the action so that it does not replace the entire history, but instead appends the new file to /data on the gh-pages branch:
https://github.com/blu3r4y/jku-room-search/tree/gh-pages/data
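
A minimal sketch of what the "append instead of replace" step could look like, assuming the workflow has the freshly crawled index.json on disk and a checkout of /data next to it. The function name `archive_index` and the dated file name pattern `index-YYYY-MM-DD.json` are hypothetical, not something the repository currently uses:

```python
import shutil
from datetime import date
from pathlib import Path


def archive_index(index_path: Path, data_dir: Path, day: date) -> Path:
    """Copy the crawled index into data_dir under a dated name.

    The naming scheme index-YYYY-MM-DD.json is an assumption made
    for illustration only.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / f"index-{day.isoformat()}.json"
    # Existing files from earlier days are left untouched,
    # only the file for the given day is written.
    shutil.copyfile(index_path, target)
    return target
```

The action would then commit the new file to gh-pages instead of overwriting the previous one.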

This way, we could utilize GitHub as our storage with minimal changes to the workflow.

The index.json is roughly 300 KB in size. Storing one copy per day for 365 days would require about 110 MB in uncompressed form. That is acceptable for a repository and for GitHub Pages (since the limit is 1 GB), but we should still think about removing old data at some point, or moving it to some cold storage somewhere.
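
The estimate can be reproduced with a few lines, assuming a constant file size of 300 KB, one file per day, and decimal units (1 MB = 1000 KB). The function names are made up for this sketch, and it ignores everything else that counts toward the limit:

```python
def yearly_storage_mb(size_kb: float = 300, days: int = 365) -> float:
    """Uncompressed size of `days` daily snapshots, in MB."""
    return size_kb * days / 1000


def days_until_limit(size_kb: float = 300, limit_kb: float = 1_000_000) -> int:
    """Whole number of daily snapshots that fit below the limit (1 GB here)."""
    return int(limit_kb // size_kb)


print(yearly_storage_mb())   # 109.5
print(days_until_limit())    # 3333
```

So the 1 GB limit would only be reached after roughly nine years of daily files, if the file size stays about the same.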

Actually, index.json already contains a lot of historic data and not only the data for future days. At the time of writing this, index.available has entries for all dates between 10.01.2023 and 26.09.2024. This is even stated in index.range.start and index.range.end. Though, those dates only seem to be filled with data from the current semester. Take April 2023 for example, where no lectures are registered for the dates within the index file, although there definitely were courses back then ;). Starting from September, however, all historic data until now seems to be stored.

My suggestion: restrict the date range of index.json to the current semester. Then, on the first day of a new semester, we could rename it to something like index-WS2023.json and begin filling index.json with data from the new semester. This would keep the website working without changes and older historic data would then be available for future projects.
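
A rough sketch of how the archive name could be derived from a date. It assumes the winter semester runs from October through February and the summer semester from March through September, and that a winter semester is labeled with the year it starts in; both are assumptions that would need to be checked against the actual semester dates. `semester_label` and `archive_name` are hypothetical helpers:

```python
from datetime import date


def semester_label(day: date) -> str:
    """Return a label like 'WS2023' or 'SS2024' for the given day.

    Assumed boundaries: SS = March to September, WS = October to February.
    """
    if 3 <= day.month <= 9:
        return f"SS{day.year}"
    # January and February still belong to the winter semester
    # that started in the previous calendar year.
    start_year = day.year if day.month >= 10 else day.year - 1
    return f"WS{start_year}"


def archive_name(day: date) -> str:
    """File name the old index.json would be renamed to."""
    return f"index-{semester_label(day)}.json"
```

On the first day of a new semester, the workflow could call `archive_name` with the previous day to get the name for the file being retired, and then start a fresh index.json.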