ArchiveTeam/warrior-dockerfile

Ability to choose workdir for temporary storage

raybooysen opened this issue · 5 comments

A running Archive Warrior writes gigabytes of data before uploading. I would love a way to specify, via an environment variable, where that temporary storage lives, so I can point it at a file system or storage device of my choosing.

Assuming you're using Docker, you can use its -v parameter to map the container's working directory to wherever you want on the host system.

I've tried a few variants: /home/warrior, /home/warrior/data, and /home/warrior/projects. All of them cause errors of one kind or another.

I'm unsure which one I should be using.

I've been looking at this issue as well. It seems the working directories /home/warrior/projects and /home/warrior/data are used in a way that prevents exposing them as volumes: a directory mounted over them lacks files the Warrior needs.

For example, within /home/warrior/data, the binaries wget-at and wget-at-gnutls should be present:

&& ln -fs /usr/local/bin/wget-lua /home/warrior/data/wget-at
COPY --from=atdr.meo.ws/archiveteam/grab-base:gnutls /usr/local/bin/wget-lua /home/warrior/data/wget-at-gnutls

I've found a potential workaround: changing Docker's data-root setting. However, that relocates all of Docker's storage rather than just the Warrior's temporary files, so it is not ideal and could lead to other complications.

Properly managing working directories and enabling the exposure of volumes would also facilitate the use of tmpfs mounts. This is particularly beneficial for users with ample RAM available who wish to conserve some SSD IOPS.

This was my primary use case. My warriors run for long periods on machines with spare RAM, so a tmpfs is a good fit here: it avoids the SSD completely for temporary data.

I'm using Podman, and mounting /home/warrior/projects and /home/warrior/data/projects works fine. From what I can tell from my tests, that seems to cover both the files you want to persist and all the big temporary files.
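
For anyone who wants to try this, here is a rough sketch of how the two mounts could be passed on the command line. It only prints the command (a dry run) so you can inspect it before running anything. The image reference, the host directory /mnt/scratch, and the 8g size are assumptions for illustration and do not come from this thread; only the two in-container paths do. The helper name warrior_cmd is made up.

```shell
#!/bin/sh
# Dry-run sketch: prints a container command, does not execute it.
# IMAGE is an assumed image reference; adjust it to whatever you actually run.
IMAGE="atdr.meo.ws/archiveteam/warrior-dockerfile"

# The two in-container paths reported above to work when mounted.
WARRIOR_DIRS="/home/warrior/projects /home/warrior/data/projects"

# warrior_cmd ENGINE MODE ARG   (hypothetical helper)
#   ENGINE: podman or docker
#   MODE:   bind  -> ARG is a host base directory (hypothetical layout)
#           tmpfs -> ARG is a size limit such as 8g
warrior_cmd() {
    engine="$1"; mode="$2"; arg="$3"
    cmd="$engine run --detach"
    for dir in $WARRIOR_DIRS; do
        case "$mode" in
            bind)  cmd="$cmd --volume $arg$dir:$dir" ;;
            tmpfs) cmd="$cmd --tmpfs $dir:rw,size=$arg" ;;
            *)     echo "unknown mode: $mode" >&2; return 1 ;;
        esac
    done
    echo "$cmd $IMAGE"
}

# Bind-mount both directories under an assumed host path.
warrior_cmd podman bind /mnt/scratch
# Or keep both directories in RAM, capped at an assumed 8g each.
warrior_cmd podman tmpfs 8g
```

With bind mode the host directories must exist before the container starts. With tmpfs mode the contents disappear when the container stops, so it suits the temporary-data use case but not persistence. Whether the Warrior runs cleanly with tmpfs on both paths is untested here.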