spring-guides/gs-spring-boot-docker

Who regularly cleans up /tmp in a long-running container?

rreitmann opened this issue · 6 comments

A Spring Boot web application uses /tmp (in a Linux container) for writing temporary files. Whether or not this directory is mounted from the host (see #66), it has a maximum capacity. Therefore a long-running container (which is not restarted) will eventually fail to create files if no one cleans that directory up on a regular basis.

Are there any ideas/best practices on how to solve this?

dsyer commented

If it's mounted on the host, then the host can clean it up I guess? What is the maximum capacity? It should only be limited by the size of the physical disk it is on, right? Do you have any concrete data / samples that actually display this problem? I can't think why a Spring Boot app, in general, would be writing a lot of data to /tmp. I think it might be used by Tomcat in a small way, but nothing that would grow continuously AFAIK.

I was just wondering because we use multipart file uploads, and AFAIK Tomcat (or whichever servlet engine Spring Boot uses) stores these files in the temp directory. Since we are running the Spring Boot containers on a managed platform (previously Pivotal Web Services, currently AWS Fargate), I have no influence on the physical size of that directory, nor can I rely on the host cleaning it up.
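In case it helps, here is a minimal sketch of what I'm considering (assuming Spring Boot 2.x on the servlet stack; the directory path is hypothetical): pointing the servlet multipart location at a directory the app owns instead of the default java.io.tmpdir.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import javax.servlet.MultipartConfigElement;

import org.springframework.boot.web.servlet.MultipartConfigFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class UploadConfig {

    // Hypothetical path - in our case a dedicated volume, not the shared /tmp.
    private static final Path UPLOAD_TMP = Paths.get("/var/app/upload-tmp");

    @Bean
    MultipartConfigElement multipartConfigElement() throws Exception {
        Files.createDirectories(UPLOAD_TMP);
        MultipartConfigFactory factory = new MultipartConfigFactory();
        // The servlet container writes in-flight multipart data here
        // instead of java.io.tmpdir.
        factory.setLocation(UPLOAD_TMP.toString());
        return factory.createMultipartConfig();
    }
}
```

That only moves the problem, of course - the directory still needs to be cleaned up.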

dsyer commented

I doubt it's a very common issue - we've been running Spring Boot in containers on Cloud Foundry for literally years and no one has brought this up before. Maybe long-running apps with file uploads are an uncommon use case. The best I can say is that you probably need to restart the container to be sure (not the app, the container). I know some OSes have services that clean up tmp dirs (Fedora for sure), so maybe those are something you could add to your container.

@rreitmann To be clear, multipart uploads are piped to the application's running directory, not /tmp. (Unless you explicitly store them in /tmp).

Being inside a container implies you're in a cloud native world, which means if you're managing local files, you really need to own that task.

Given that Tomcat explicitly needs a /tmp folder to run, and since you don't appear to be leveraging a cloud-friendly mechanism like S3, I'd pick some subfolder to host all your files and implement some form of "reaping" mechanism up front, e.g. any file more than 48 hours old is deleted or archived elsewhere.

You cannot depend on Tomcat starting/stopping to handle it, nor on the container itself; you could run into unpredicted optimizations. So periodically clean things out.
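A minimal sketch of such a reaper, assuming Spring scheduling is enabled via @EnableScheduling and using a hypothetical upload directory and retention period:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class TempFileReaper {

    // Hypothetical values - adjust the directory and retention to your setup.
    private static final Path UPLOAD_TMP = Paths.get("/var/app/upload-tmp");
    private static final Duration MAX_AGE = Duration.ofHours(48);

    // Runs once an hour; requires @EnableScheduling on a configuration class.
    @Scheduled(fixedDelay = 60 * 60 * 1000)
    public void reapOldFiles() throws IOException {
        Instant cutoff = Instant.now().minus(MAX_AGE);
        try (Stream<Path> files = Files.list(UPLOAD_TMP)) {
            files.filter(Files::isRegularFile)
                 .filter(file -> lastModified(file).isBefore(cutoff))
                 .forEach(this::deleteQuietly);
        }
    }

    private Instant lastModified(Path file) {
        try {
            return Files.getLastModifiedTime(file).toInstant();
        } catch (IOException e) {
            // If the timestamp can't be read, treat the file as new and skip it this round.
            return Instant.now();
        }
    }

    private void deleteQuietly(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            // Something else may have removed it already; ignore.
        }
    }
}
```

The directory and the 48-hour window are just placeholders; the point is that the reaping is owned by the application, not the platform.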

BTW, if you run 2+ instances of your application, you now have ANOTHER problem.

If this is starting to sound complex, welcome to the problem of storing your own files in the cloud. And the reason S3 (or GridFS) has become very popular.

@gregturn Thanks for the clarification regarding multipart uploads. We do use MongoDB's GridFS for almost all files we create, and we clean up GridFS ourselves. However, there is one case in which we store an uploaded file by transferring it to a temp file created with Files.createTempFile for further processing. I will refactor this part.
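Until that refactoring lands, a rough sketch of the interim pattern I have in mind (assuming Spring 5.1+ for transferTo(Path); process(...) is a placeholder for our real logic) is to always remove the temp file after processing:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.springframework.web.multipart.MultipartFile;

public class UploadProcessor {

    public void handle(MultipartFile upload) throws IOException {
        Path tempFile = Files.createTempFile("upload-", ".tmp");
        try {
            upload.transferTo(tempFile);
            process(tempFile);
        } finally {
            // Always remove the temp file, even if processing fails,
            // so nothing accumulates in the container's temp directory.
            Files.deleteIfExists(tempFile);
        }
    }

    private void process(Path file) {
        // Placeholder for the actual processing step.
    }
}
```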

However, I still think that Spring Boot or Tomcat should clean up the files they create in /tmp when running in a container.

dsyer commented

AFAIK Spring Boot does not create files in /tmp. Tomcat only does if you use certain features (JSP maybe?). And we don’t expect those to be especially leaky. If you want to pursue that though, please open an issue in Spring Boot.