Make example datasets optional
The factgenie repository is almost 100 MB when cloned from scratch, and most of that comes from the example datasets and outputs.
I realized that while the example datasets can be helpful, they can also be perceived as bloat.
We should probably distribute them separately from the main repository so that the default factgenie installation is as lightweight as possible.
Update: I found out that most of the bloat was in fact caused by Factgenie.mp4, which had once been uploaded to the main repository (and remained in the history even after deletion). I fixed it using git-filter-repo:
```shell
git-filter-repo --path Factgenie.mp4 --invert-paths
```
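Before rewriting history, it can help to confirm which files actually dominate the repository size. Here is a minimal sketch that uses only standard git plumbing (`git rev-list --objects` and `git cat-file --batch-check`). The function name `largest_blobs` is purely illustrative. It lists the largest blobs reachable from any ref, including files deleted in later commits. Such deleted files are how something like Factgenie.mp4 can keep inflating every fresh clone.

```shell
# Illustrative helper (not part of factgenie): print the N largest blobs
# reachable from any ref as "<size-in-bytes> <path>", largest first.
# Deleted files still show up as long as some commit in history contains them.
largest_blobs() {
  local n="${1:-10}"
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob"' \
    | cut -d' ' -f3- \
    | sort -k1,1 -n -r \
    | head -n "$n"
}
```

Running `largest_blobs 5` inside a clone prints the five biggest files ever committed. Keep in mind that git-filter-repo rewrites every affected commit, so the cleaned history has to be force-pushed and existing clones diverge from it.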
The repository now takes only a few MB 💪
We tried to preserve the git history as much as possible, but please let us know if this causes any issues with your local git branches.
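The rewrite changes commit IDs, so a clone made before it no longer shares history with the remote. Here is a minimal sketch of re-syncing such a clone. It assumes the remote is called `origin`, the default branch is `main`, and local `main` holds no unpushed work, because `git reset --hard` discards it. `resync_after_rewrite` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: re-sync a clone after the remote history was rewritten.
# Assumes local <branch> has no unpushed commits (reset --hard discards them).
# Optional third argument: a feature branch started from the old history.
resync_after_rewrite() {
  local remote="${1:-origin}" branch="${2:-main}" feature="$3"
  git fetch -q "$remote" || return 1
  if [ -n "$feature" ]; then
    # Replay only the commits unique to the feature branch (branch..feature)
    # onto the rewritten remote branch, leaving the old history behind.
    git rebase -q --onto "$remote/$branch" "$branch" "$feature" || return 1
  fi
  # Point the local default branch at the rewritten remote branch.
  git checkout -q "$branch" && git reset -q --hard "$remote/$branch"
}
```

For example, `resync_after_rewrite origin main my-feature` first moves `my-feature` onto the new history and then resets `main`. The rebase has to run before the reset, because the old local `main` marks where the feature branch's own commits begin.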
The main point of the issue still holds, though: we should make downloading the example datasets optional.
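One way to make the example datasets opt-in is to ship them as a separate archive and fetch them only on request. Here is a hedged sketch in shell. The archive URL, the default target directory, and the `FACTGENIE_WITH_EXAMPLES` variable are placeholders invented for illustration. They are not part of factgenie.

```shell
# Hypothetical opt-in fetch of the example datasets.
# EXAMPLES_URL, the default destination and FACTGENIE_WITH_EXAMPLES are placeholders.
fetch_examples() {
  local dest="${1:-data/examples}"
  local url="${EXAMPLES_URL:-https://example.com/factgenie-examples.tar.gz}"
  if [ "${FACTGENIE_WITH_EXAMPLES:-0}" != "1" ]; then
    echo "skipping example datasets (set FACTGENIE_WITH_EXAMPLES=1 to fetch)"
    return 0
  fi
  if [ -d "$dest" ]; then
    echo "example datasets already present in $dest"
    return 0
  fi
  local tmp
  tmp="$(mktemp -d)" || return 1
  # Unpack into a scratch directory first; move it into place only on success.
  if curl -fsSL "$url" | tar -xz -C "$tmp" &&
     mkdir -p "$(dirname "$dest")" && mv "$tmp" "$dest"; then
    echo "example datasets installed in $dest"
  else
    rm -rf "$tmp"
    return 1
  fi
}
```

The archive is unpacked into a temporary directory and moved into place only on success. This way a failed download leaves behind no empty directory that a later run would mistake for an installed dataset.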
@kasnerz Git LFS is a good way to handle the demo datasets. I described how to use it here: https://github.com/ufal/factgenie/wiki/05-Developer-Notes#%EF%B8%8F-handling-large-files
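With Git LFS, large files are stored in the repository as small pointer files, and their actual content is downloaded separately. Here is a minimal sketch of tracking the datasets with it. It assumes the datasets live under a path matching `factgenie/data/**`, which is a guess about the layout and not taken from the wiki page. `track_datasets_with_lfs` is an illustrative name.

```shell
# Illustrative helper: store files matching a pattern in Git LFS.
# The default pattern is an assumption about where the datasets live.
track_datasets_with_lfs() {
  local pattern="${1:-factgenie/data/**}"
  git lfs install --local >/dev/null &&  # enable the LFS filters for this repo only
  git lfs track "$pattern" >/dev/null && # adds a filter=lfs rule to .gitattributes
  git add .gitattributes                 # the rule itself is versioned like any file
}
```

On the user side, this is also what makes the download optional. Cloning with `GIT_LFS_SKIP_SMUDGE=1 git clone <repo-url>` checks out only the pointer files. Later, `git lfs pull --include="<path>"` fetches the real content for just the datasets someone needs.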