MarceauKa/shaark

Issues with running in Docker

kamtschatka opened this issue · 1 comments

I don't expect any fixes for this; it is meant more as documentation for version 2.0 and for others who might run into similar issues in the meantime.

These are the issues I have encountered:

  • The database configuration has to be written into the .env file. Usually such parameters are read at runtime and set in the Docker container via environment variables. To work around this, I do the variable replacement in the Dockerfile, so the values need to be available at build time. This could be moved into a startup script, but changing those values on the fly doesn't seem to be easy (or I did not know which command to run to make shaark re-read the .env parameters into the config it is using).

  • The /storage folder needs to exist for anything to work. The folder (plus subfolders) is committed to the repo, and when you map a volume onto this folder in the Docker container, it gets "overridden" by the mapped volume, which means it will be empty. This causes shaark to return a 500 error and nothing else. To fix it, I temporarily move the /storage folder out of the way so I can map the volume to /storage, and then, before starting the shaark server, I copy the files back into /storage so it works again.

  • The dependencies for youtube-dl are not installed. They can be installed, but youtube-dl is no longer able to download files from YouTube, so a switch to yt-dlp would make sense. There is already an issue for that, as it doesn't necessarily have to do with Docker itself: #104

  • The dependencies for puppeteer/chrome are not installed. Docker needs some special handling here, because the image does not have these dependencies out of the box.
    This is how I fixed it in the Dockerfile (taken from https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md):

# Install chromium for puppeteer
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# Set environment variable to make sure puppeteer uses this one and does not download a new one
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

It might be necessary to update PuppeteerProvider.php to pass more launch arguments (I added them while experimenting and am not sure they are really necessary, but they definitely don't hurt):

$browser = $puppeteer->launch([
    'args' => ['--no-sandbox', '--disable-gpu'],
    'ignoreHTTPSErrors' => true
]);

For some reason the PDF file did not get stored, though. After some testing I figured out that the filename was the problem. I have NO idea why, but I was able to fix it by adding a prefix to the filename in PuppeteerProvider.php:

$filename = sprintf('app/archives/file-%s', $name);

  • The "Test PDF archiving" button in the settings uses the URL from the browser instead of the URL the Docker container is serving from. Since you can remap the port with Docker, I am using localhost:38080 externally, but inside the container the server runs on port 80, so a request to http://localhost:38080 fails and the button does not work. I have switched the URL to google.com to get around this. There might be better solutions to that problem.
    Once again in PuppeteerProvider.php:

$page->goto('https://www.google.com/');

If you look at my PR, which I opened over a year ago, you can see a Dockerfile that runs without problems with Laravel 9 and my customizations. Unfortunately, Laravel 9 is now almost EOL, so the question is whether anything will happen here. In my PR I had made both yt-dl and the PDF archiving work.