Repository Size Limit
jonasfranz opened this issue · 49 comments
It would be nice if it were possible to set a global or per-user repository size limit. This would be useful for providers hosting Gitea for the public with limited disk space, as mentioned in #1029.
Approaches for restricting repo size
Solution 1
- The user gets a notification if a repository exceeds the size limit
- The admin gets a notification if a repository exceeds the size limit (optional)
- The repository gets deleted (after an ultimatum) if the user does not reduce its size
Solution 2
- The size of every push gets checked (I don't know if that is possible or realistic) and the push is blocked if it is too large
Suggestion: Have a warning limit too. That way users are warned when they are, say, 80% full - so they've got some time to do something about it.
I'd prefer the second solution + warning. It doesn't seem practical to completely delete a user's repo if it is past the size limit.
@kolaente Do you have an idea how to calculate the size of a push before it is finished?
@JonasFranzDEV It seems possible with a pre-receive hook that checks the size of the pushed objects (via cat-file). For examples: https://github.com/github/platform-samples/tree/master/pre-receive-hooks & https://stackoverflow.com/questions/40697663/show-commit-size-in-git-log
@sapk Thank you for the hint. It seems that size, err := git.GetRepoSize(repoPath) will return the size of the repository including the newest push when it is called from the pre-receive hook.
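For illustration, here is a minimal sketch of what such a pre-receive check could look like. It is not Gitea's actual hook code: repoSizeBytes is a stand-in for something like git.GetRepoSize that simply walks the bare repository directory, and the 1 GiB limit is a hypothetical hard-coded value.

```go
// Minimal sketch of a pre-receive size check (not Gitea's actual hook code).
// Assumption: the hook knows the bare repository path and a configured limit.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// repoSizeBytes is a stand-in for something like git.GetRepoSize: it simply
// sums the sizes of all files under the bare repository directory.
func repoSizeBytes(repoPath string) (int64, error) {
	var total int64
	err := filepath.WalkDir(repoPath, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		total += info.Size()
		return nil
	})
	return total, err
}

func main() {
	repoPath := os.Getenv("GIT_DIR") // git sets this when running hooks
	if repoPath == "" {
		repoPath = "." // pre-receive runs with the bare repo as the working directory
	}
	const limit = 1 << 30 // hypothetical 1 GiB limit

	size, err := repoSizeBytes(repoPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "size check failed:", err)
		os.Exit(1)
	}
	if size > limit {
		// Exiting non-zero from pre-receive rejects the push.
		fmt.Fprintf(os.Stderr, "repository size %d bytes exceeds limit of %d bytes\n", size, limit)
		os.Exit(1)
	}
}
```

Because the pushed objects have already been received by the time pre-receive runs, walking the repository directory should include the incoming data, which matches the observation above.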
How should the size limit work?
- Global size limit via the config file for all repositories
- Custom size limit for repositories and a default value from the config file
  a. Who can change the limit of a repository, and how? I propose the admin, but I am not sure whether the admin panel or the repository settings is the right place for this.
- Size limit per user via the config file, with the option to change the limit via the admin panel, as is already possible for the number of repositories (user administration)
Other proposals are welcome too!
We could have a combination of the above: a global size limit to make sure that the disk doesn't run out of space, and a per-user limit (some git hosts say each user gets 1GB or so).
@techknowlogick Should a push (no matter which user) be restricted/denied, if the global size limit is exceeded?
I don't think the 1st one is needed, but it would be great to have a combination of the 2nd and 3rd: defaults in the config and custom values stored in the repo and user tables, where 0 means the default limit applies and -1 means there is no limit (see the sketch below).
Oh, one more thing: should LFS be counted? If so, that should probably be handled separately.
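To make the 0/-1 convention concrete, here is a small sketch of how an effective limit could be resolved; all names are hypothetical and not taken from Gitea's code.

```go
// Sketch of resolving an effective size limit per the proposal above.
// Convention: 0 means "use the config default", -1 means "no limit".
// All names here are hypothetical, not actual Gitea code.
package quota

// effectiveLimit prefers the per-repo value if set, then the per-user value,
// then falls back to the config default. A result of -1 means unlimited.
func effectiveLimit(configDefault, userLimit, repoLimit int64) int64 {
	if repoLimit != 0 {
		return repoLimit
	}
	if userLimit != 0 {
		return userLimit
	}
	return configDefault
}

// withinLimit reports whether a repository of the given size is allowed.
func withinLimit(sizeBytes, limit int64) bool {
	return limit == -1 || sizeBytes <= limit
}
```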
Who can change the limit of a repository, and how? I propose the admin, but I am not sure whether the admin panel or the repository settings is the right place for this.
@JonasFranzDEV For the user limit, via user editing in the admin panel. For repositories, there is already an admin-specific section in the repo settings.
Also, what happens when a user tries to push to an org? Or do we consider an org to be a distinct user?
I would say that an org should be treated as a separate user.
I don't think we should rely on just 2 & 3: if you don't want to limit the number of users but do want to make sure you don't run out of space, a global limit is needed (e.g. so that try.gitea.io doesn't run out of space, and so that a user who maxes out their space isn't encouraged to create an additional account). This is especially likely with an open Gitea instance.
Regarding what we should do when the limit is reached, there are two options that I see:
- show warnings in the git push output, and perhaps raise the warning in the system messages of the admin panel
- reject push
or have two limits: soft (1), and hard (2)
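As a sketch of the soft/hard variant (hypothetical names and thresholds): above the soft limit the check only warns, e.g. in the push output or as an admin panel notice, and only above the hard limit does it reject.

```go
// Sketch of the soft/hard limit idea (hypothetical names and messages).
package quota

import "fmt"

type verdict int

const (
	ok verdict = iota
	warn
	reject
)

// checkLimits warns above the soft limit and rejects above the hard limit.
// The warning text could be written to the push output (hook stderr)
// and/or surfaced as a system message in the admin panel.
func checkLimits(sizeBytes, softLimit, hardLimit int64) (verdict, string) {
	switch {
	case hardLimit > 0 && sizeBytes > hardLimit:
		return reject, fmt.Sprintf("repository size %d exceeds hard limit %d", sizeBytes, hardLimit)
	case softLimit > 0 && sizeBytes > softLimit:
		return warn, fmt.Sprintf("repository size %d exceeds soft limit %d", sizeBytes, softLimit)
	default:
		return ok, ""
	}
}
```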
We should think of a way to prevent the following:
A user creates a repo, which means he is now admin of said repo. As a repo admin he'd be able to change the repo size limit and bypass the global limit set by the server admin...
I guess the simplest solution would be to only allow to change the size limit setting if the user is also a server admin.
@kolaente There is already a section in the repo settings that is only available to server admins.
@techknowlogick I do not agree with that, because a global limit would deny every push to Gitea once it is exceeded. So we need a way to restrict the size per user, as that would not stop users with only small repos from pushing.
@lafriks Wouldn't this make it possible to get unlimited storage by creating organizations? My ideas:
a) Bind the limit of an organization to the owner of an organization
b) Accept @lafriks' proposal and limit the number of organizations per user
c) Bind the limit of an organization to the members of the organization (might be complicated for users and developers because of the relationship between members and maximum size). Example: an organization uses 10GB of storage, every user has 5GB of storage, and the org has 4 members. Then every member could only use 2.5GB for their personal account, because 2.5GB of their quota is used by the organization (2.5GB * 4 = 10GB).
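For what it's worth, here is a tiny sketch of the option (c) arithmetic using the numbers from the example; the even split of the org's usage across members is my assumption of how (c) would work.

```go
// Sketch of option (c) with the numbers from the example above.
// Assumption: the org's usage is split evenly across its members.
package main

import "fmt"

func main() {
	perUserQuotaGB := 5.0 // each user has 5GB
	orgUsageGB := 10.0    // the org currently uses 10GB
	members := 4.0        // the org has 4 members

	sharePerMemberGB := orgUsageGB / members            // 2.5GB charged to each member
	personalLeftGB := perUserQuotaGB - sharePerMemberGB // 2.5GB left for personal repos

	fmt.Printf("each member has %.1fGB left for personal repos\n", personalLeftGB)
}
```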
@JonasFranzDEV
c) You mean when pushing to a repo inside that org, or in general (i.e. when creating normal repos under the personal account)?
I'd go with @lafriks here, I think it would save us a lot of headache.
Another thing: we should treat migrations like normal repos (in terms of limits), right? This would mean updating a migration should fail if a user has exceeded his limit. And we could check if a user has enough space left when creating the migration instead of doing that later on.
I think there are two different concepts here. One is a repository size limit, the other is a user upload size limit. A repository's size is the size of the repository's folder. A user's upload size is the sum of the sizes of all the commits they upload.
@lunny It's quite easy to get the size of a repository after a push, but I think it would be harder to get the size of a commit itself. So I would propose to use the repository size; the idea is to cap repository sizes via a per-user limit.
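For reference, here is a hedged sketch of how the size of the pushed objects themselves could be estimated in a pre-receive hook with standard git plumbing (rev-list --objects plus cat-file --batch-check, similar in spirit to the pre-receive-hooks samples linked earlier). This is an illustration, not code from any existing PR.

```go
// Sketch: estimate the size of the objects introduced by a push.
// Meant to run inside a pre-receive hook, which receives "<old> <new> <ref>"
// lines on stdin. Uses standard git plumbing; illustration only, not Gitea code.
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

func main() {
	// Collect the new tip of every updated ref (skip ref deletions, i.e. all-zero SHAs).
	var newRevs []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 3 && strings.Trim(fields[1], "0") != "" {
			newRevs = append(newRevs, fields[1])
		}
	}
	if len(newRevs) == 0 {
		return // nothing new to measure
	}

	// List objects reachable from the new tips but not from any existing ref.
	args := append([]string{"rev-list", "--objects"}, newRevs...)
	args = append(args, "--not", "--all")
	revList, err := exec.Command("git", args...).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "rev-list failed:", err)
		os.Exit(1)
	}

	// rev-list prints "<oid> <path>"; keep only the object IDs for cat-file.
	var oids []string
	for _, line := range strings.Split(strings.TrimSpace(string(revList)), "\n") {
		if f := strings.Fields(line); len(f) > 0 {
			oids = append(oids, f[0])
		}
	}

	// Ask cat-file for the size of each new object.
	catFile := exec.Command("git", "cat-file", "--batch-check=%(objectsize)")
	catFile.Stdin = strings.NewReader(strings.Join(oids, "\n"))
	out, err := catFile.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cat-file failed:", err)
		os.Exit(1)
	}

	var total int64
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if n, err := strconv.ParseInt(strings.TrimSpace(line), 10, 64); err == nil {
			total += n
		}
	}
	fmt.Fprintf(os.Stderr, "incoming push adds roughly %d bytes of objects\n", total)
}
```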
Any intermediate solution until this is implemented? Is there maybe an option to limit Gitea's overall disk use (without getting into complex drive partitions etc.)?
This is IMHO the main feature that is holding back Gitea's use for any kind of semi-public service :(
@poVoq as an interim solution consider suggesting users run the BFG repo cleaner to remove large files (especially video or large binaries) from their project history.
TBH, disk space is cheap these days. I understand the word "cheap" is subjective, but a little ingenuity can go a long way. Let us not be held back by our tools but by our imaginations.
Is it true that any user with write permission to a repository could disrupt the service for everyone by pushing very large amounts of data to their personal repository?
In theory yes, @yatsyk - as far as I understand.
Unless you're running a public instance (i.e. allowing anyone to sign up and create an account) though, it's unlikely to happen - and especially not on purpose.
@yatsyk If you're not running a public instance but rather an instance for you and some friends/co-workers, I would assume that you trust them enough not to intentionally try and break the server. If it does happen by accident, as others have said, the BFG repo cleaner can help sort out the mess.
Of course, you will have unique requirements for your particular use-case.
@sbrl We should validate data in any service, whether it is public or not. A co-worker's computer could be hacked, and we should not compromise other users.
@sapk what will happen with mirrors? Will they be handled exactly the same?
@alexanderadam I think my PR is ignoring this case. I will need to check whether the pre-receive hook is triggered for mirrors.
Is there any workaround for this right now? I'm getting the warning:
error: RPC failed; HTTP 413 curl 22 The requested URL returned error: 413 Request Entity Too Large
Not being able to set a limit allowed a (presumably unintended) denial-of-service on my public instance.
Any updates? Otherwise it is not really possible to host it publicly.
I really need this feature!
My server is getting too busy with very very large git repos! Like 32GB of repos.
I would like to keep my server open for users, but 0.1% of the users screw it up for the rest of the users.
We are implementing this ourselves as we cannot wait. If anyone would like to collaborate, pls lmk
Did you see PR #7833?
Thanks for pointing it out; we did see and review that PR. However, since the PR appears never to have been finished and was also made against a different version than what we are running, we felt we had to choose between "invest an uncertain amount of time/resources to determine whether that code would be worth attempting to reuse, with the best-case outcome still requiring further work to integrate into our customized -- and yet not latest -- gitea codebase" vs "invest a predictable amount of time/resources to implement this relatively small feature ourselves" -- and we went with the latter.
I would certainly have preferred to collaborate and spend those several thousand bucks on something else, and furthermore I would have been happy to share the work we have done on this feature -- if only that could be done in a modular fashion that did not require a lot of additional work and cost for us. Unfortunately, from our perspective, those last two conditions are not met (I think this speaks to the enormous value, currently being lost, of having plug-in support as mentioned in #16195). Perhaps that is the fault of our own poor engineering decisions, but whatever the reason may be, our experience, real or perceived, is that while we would prefer to collaborate (both as a recipient and a donor of code for common features such as this) and we have budget to contribute, we are finding it difficult to do so in a way that makes economic sense.
Given now how increasingly far apart our gitea code bases are, I am increasingly of the view that the plug-in framework is a necessary precondition to enabling collaboration on other features.
I know Gitea is made to be self-hosted, but many don't have the knowledge to do it, so why not share Gitea instances? For that we need quota limits, both for the number of users and for the volume of data they can use on disk. That would be a nice feature.
@cGIfl300 there's codeberg and gitea.com
That is not what I mean. I mean everyone could open an instance and allow a limited number of accounts; using a central system is not what I like about self-hosting.
@mewalig Hi. Were you able to do the PR? I also have this feature in a semi-developed state, and since the developer left I am looking for a person who could continue. It even works, but I think it needs a lot of fine-tuning to be PRable.
Hi @DmitryFrolovTri, we didn't create a PR because the version of Gitea we are working from is now far behind the current version, and when we looked at the PR diff it just didn't make sense. I am still of the view that, absent the core maintainers agreeing to incorporate this into the core codebase (assuming the contributed code is of reasonable quality and does what it's supposed to do), it's unlikely to be useful for anyone to attempt to contribute this feature until and unless there is some sort of plug-in mechanism.
Funnily enough, I also have a full PR for an old version that was done a while ago; it is very far behind.
@mewalig If you have that PR to look at, it would be of help as well.
Sure. Please advise if any of the ideas we are trying to implement here are contradictory.
We are moving ahead with this, so hopefully we can have it done soon: PR #21820
That would be awesome for my publicly running server! And later Git LFS too. In my case, repo size limits even without Git LFS would already be very welcome.
Just chiming in with my support for this issue. I had a bot register on my Gitea instance and proceed to migrate multi-gig repositories from numerous git servers around the internet.