IBM/plex

Source files too large to upload to GitHub

Closed this issue · 13 comments

While trying to cut the release today, I ran into the following:

image

This is similar to #453 but different: we're now hitting the file size limit on this repo with the actual source files. It's not related to the yarn cache or anything else; the files are just plainly too large.

There are two options to resolve this:

  1. Turn on and use git-lfs
  2. No longer include source files in the repository, instead include them as release artifacts

I think the second option makes the most sense. It will add some manual steps to the release process for now, but in the future we could automate it.

I think there is a third option: converting the .glyphs files to .glyphspackage. AFAIK this will split each file into lots of smaller ones, similar to the .ufo format.

If that works (I need to check) that would be my preferred solution.

@tay1orjones Just checked and my assumption about .glyphspackage files is correct.
Let me create a new package with all files and I’ll let you know when it’s done.

@tay1orjones Done! I’ve replaced the older package.

Thanks @BoldMonday - I've actually run into an even larger problem, so I'm reopening this issue. The overall package size is now too large to publish to npm:

image

The tentative plan is to get the existing updates that have been queueing since May into @ibm/plex, and we'll release Traditional Chinese in its own package at @ibm/plex-traditional-chinese

From there, the new maintainers will need to work on converting the rest of the repository to publish one package per family instead of one big @ibm/plex, #453

Maybe it'll be a good idea to split JP/KR/TC/SC into a CJK repo similar to Source Han/Noto Sans CJK?

Yeah, the plan is to keep it all in one repo, but the families will be split up into individual packages on npm. Full details are in #452; the consensus is to go with "Option 1" outlined there.

I've got a PR up addressing the first piece of this - removing the source files from the repo. I opened a new issue for the npm publish blocker for Plex TC: #561

@tay1orjones I think it's a pity that source files are not stored in the repo anymore.
Is converting all .glyphs files to .glyphspackage not a viable solution?

@BoldMonday I agree, unfortunately it's unavoidable as Plex grows. Yes, the .glyphspackage fix technically unblocks the error I screenshotted at the opening of this issue, although the overall size of the repo is still a big concern. The repo size today is ~500MB. Adding the TC folder alone will add another 1.1GB.

GitHub recommends repo size stay below 1GB.

The source files will still be included in the repo under the releases tab as artifacts on each release moving forward. So they'll be there in the release history, just not fully version controlled anymore. The source folder has never been published to npm either, so this should be a fairly low-impact change.

Any chance the source files for Chinese/CJK can still be tracked in a separate branch per language? Maybe as source-tc, source-jp, source-kr (and the upcoming source-sc)? This would still keep the repo roughly under 1GB (per branch, that is), but at least the sources would still be under version control. This could also be done for the webfonts.

Source Han has two branches under separate version control, essentially keeping two separate timelines (source files and release fonts) in one repo. The only downsides are that releases and tags need to be made twice, and that the git history of the whole repo keeps increasing in size (the Source Han repo's git history has reached 10GB after ~10 years and 8 major version releases).

@NightFurySL2001 My understanding is that a repository includes all branches stored on the remote, as well as the full repo history. Having them on separate branches wouldn't reduce the overall repo size to my knowledge.

What's the benefit of keeping the sources version controlled? They're currently not updated any more frequently than when a release is published, and the files can't be diffed within GitHub. Is there a tool that supports diffing two versions stored in a git repository? I'm not sure what the added benefit or use case might be for storing them in version control.

At least for .glyphs, .glyphspackage and .ufo files, cloning the repo and running a standard git diff ver1 ver2 shows which glyphs, features or metrics changed between versions, since all three formats are text-based. .glyphspackage and .ufo are slightly easier to work with because they store each glyph in a separate file, so the files that differ between versions (listed with git diff --name-only) correspond to the glyphs or metrics that were modified. A .glyphs file, however, holds the full list of glyphs in one single file, so you have to check the changed lines individually to see which glyphs they belong to.
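To make the per-glyph workflow concrete, here is a minimal Python sketch that splits the output of git diff --name-only into per-glyph files and everything else. It assumes that per-glyph files end in .glyph (inside a .glyphspackage) or .glif (inside a .ufo); the function names and the ver1/ver2 arguments are hypothetical placeholders, not anything defined by the Plex repo.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: per-glyph files use these extensions
# (.glyph inside a .glyphspackage, .glif inside a .ufo).
GLYPH_SUFFIXES = {".glyph", ".glif"}


def changed_paths(ver1, ver2):
    """Return the paths that differ between two git revisions."""
    result = subprocess.run(
        # core.quotePath=false keeps non-ASCII file names unescaped.
        ["git", "-c", "core.quotePath=false", "diff", "--name-only", ver1, ver2],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def split_changed_paths(paths):
    """Split changed paths into (per-glyph files, all other files)."""
    glyph_files, other_files = [], []
    for raw in paths:
        line = raw.strip()
        if not line:
            continue  # skip blank lines in the diff output
        if PurePosixPath(line).suffix in GLYPH_SUFFIXES:
            glyph_files.append(line)
        else:
            other_files.append(line)
    return glyph_files, other_files
```

Run from inside a clone, split_changed_paths(changed_paths("ver1", "ver2")) would give the list of modified glyph files first and the remaining files (font info, features, and so on) second.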

It's similar to the Plex release log of what has changed, but since CJK has plenty of glyphs it'll be more efficient to just use git diff. (See #556 for a list of errors and suggestions for TC, which is pretty long.) Also, releases are a GitHub-specific feature and are not included in the git repository, so a standard git clone can't get the source files from releases; that requires a separate cURL call to the GitHub server. I digress though, since CJK font files are inherently big and plenty of projects do provide alternative means of getting the font source files instead of tracking them directly in git.
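For reference, the separate call could look roughly like the Python sketch below, which asks the GitHub REST API for the latest release and picks out asset download URLs by name. The "assets", "name" and "browser_download_url" fields are part of the API's release response; the keyword filter is an assumption, since how the source archives will actually be named on each release isn't settled in this thread.

```python
import json
import urllib.request

# Latest-release endpoint of the GitHub REST API for this repo.
API_URL = "https://api.github.com/repos/IBM/plex/releases/latest"


def fetch_latest_release(url=API_URL):
    """Download and parse the JSON description of the latest release."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def asset_urls(release, keyword):
    """Return download URLs of assets whose name contains `keyword`.

    Assumption: source archives can be recognised by a keyword in the
    asset name; the real naming scheme may differ.
    """
    return [
        asset["browser_download_url"]
        for asset in release.get("assets", [])
        if keyword.lower() in asset["name"].lower()
    ]
```

Something like asset_urls(fetch_latest_release(), "source") would then list the archives to download, assuming the keyword matches whatever names the release assets end up with.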