ufal/factgenie

Make example datasets optional

Closed this issue · 1 comments

The factgenie repository is almost 100 MB when cloned from scratch. Most of that is example datasets and outputs.

I realized that while the example datasets can be helpful, they can also be perceived as bloatware.

We should probably distribute them separately from the main repository, so that the default factgenie installation is as lightweight as can be.


Update: I found out that most of the bloat was in fact caused by Factgenie.mp4, which was once uploaded to the main repository (and remained in the history even after deletion). I fixed it using git-filter-repo:

git-filter-repo --path Factgenie.mp4 --invert-paths

The repository is now only a few MB 💪
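
For anyone wondering why deleting the video did not shrink the repository by itself, here is a minimal sketch that reproduces the effect in a throwaway repository. The temporary directory and the file name `demo.bin` are illustrative assumptions, not part of factgenie; only standard git commands are used.

```shell
#!/bin/sh
# Minimal sketch: a file deleted in a later commit still lives in Git history.
# The temporary repository and demo.bin are illustrative, not part of factgenie.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Pass a commit identity inline so the sketch does not depend on global config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Add a 1 MB file, commit it, then delete it in a second commit.
head -c 1048576 /dev/urandom > demo.bin
g add demo.bin
g commit -q -m "Add demo.bin"
g rm -q demo.bin
g commit -q -m "Remove demo.bin"

# The file is gone from the working tree...
[ ! -e demo.bin ] && echo "working tree: demo.bin absent"

# ...but commits touching it are still reachable, so a clone still fetches the blob.
history_hits=$(git log --all --oneline -- demo.bin | wc -l)
echo "commits touching demo.bin: $history_hits"

# Size of the object store in human-readable units.
git count-objects -vH
```

Running the same `git log --all --oneline -- Factgenie.mp4` check in a fresh clone is one way to confirm that the rewrite removed the file from every reachable commit.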

We tried to preserve the git history as much as possible, but please let us know if this causes any issues with your local git branches.

The main point of the issue still holds, though: we should make downloading the example datasets optional.

@kasnerz Git LFS is a good way to accept demo datasets. I "described" how to use it here: https://github.com/ufal/factgenie/wiki/05-Developer-Notes#%EF%B8%8F-handling-large-files
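
As a rough illustration of the general Git LFS workflow (not a copy of the wiki instructions), the sketch below sets up tracking in a throwaway repository. The `*.mp4` pattern is an assumed example; the patterns that make sense for factgenie's datasets may differ.

```shell
#!/bin/sh
# Minimal sketch: track large files with Git LFS in a throwaway repository.
# The "*.mp4" pattern is an assumed example, not taken from the factgenie wiki.
set -e
if ! git lfs version >/dev/null 2>&1; then
    echo "git-lfs is not installed; skipping"
    exit 0
fi
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Install the LFS hooks for this repository only (no global config changes).
git lfs install --local >/dev/null

# Route matching files through LFS; this writes a rule to .gitattributes.
git lfs track "*.mp4" >/dev/null

# .gitattributes has to be committed so collaborators get the same rule.
git add .gitattributes
cat .gitattributes
```

Files matching the pattern that are added afterwards are stored as small pointer files in the repository, with the actual content kept in LFS storage.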