harvard-lil/wacz-exhibitor

Internal redirection cycles

rebeccacremona opened this issue · 3 comments

The current default nginx.conf uses try_files in a number of spots.

The nginx docs state the files are tried in order, and then,

If none of the files were found, an internal redirect to the uri specified in the last parameter is made.

As a result, if a subpath under /replay-web-page/ is found at neither $uri nor $uri/, nginx internally redirects to $uri/ over and over, adding a slash each time, until it errors out with "rewrite or internal redirection cycle while internally redirecting to <url>//////////".
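For illustration, here is a minimal location block that would produce this loop. It assumes the directive is try_files $uri $uri/; and is a hypothetical reconstruction, not a copy of the repository's nginx.conf:

    location /replay-web-page/ {
        # Hypothetical reconstruction. The last parameter ($uri/) is a
        # fallback URI, not a file check. A request for
        # /replay-web-page/foo misses both checks and is redirected to
        # /replay-web-page/foo/, which misses again and is redirected to
        # /replay-web-page/foo//, and so on until nginx reports the cycle.
        try_files $uri $uri/;
    }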

Example: https://rejouer.perma.cc/replay-web-page/foo

If you do not want to proxy a remote server and, following the instructions, delete @remote... from the remaining try_files directives, the same thing happens for any path ending in .warc, .warc.gz, or .wacz.

I think this can be solved by adding =404 as the final parameter, e.g. try_files $uri $uri/ =404;
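With =404 as the last parameter, the fallback is a status code rather than a URI, so there is nothing to redirect to. A minimal sketch of the fixed block (the location path comes from the issue; the surrounding server block is assumed):

    location /replay-web-page/ {
        # Check for a file, then a directory; if neither exists,
        # return 404 instead of redirecting to $uri/.
        try_files $uri $uri/ =404;
    }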

Testing locally, it looks like you can't add it after a named location (e.g. @remote_wacz_archive =404;). If you do, you get 404s every time, even if the remote server has the WACZ in question.
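This matches my reading of how try_files treats its parameters: only the last one is a fallback (a URI, a named location, or =code), and every parameter before it is checked as a file path. A sketch of the difference, using the named location from the issue. Not what you want:

    # "@remote_wacz_archive" is NOT the last parameter, so it is checked
    # as an ordinary file path, which normally does not exist. Every
    # request that misses locally falls through to =404.
    try_files /archives/$uri @remote_wacz_archive =404;

Instead, pick exactly one fallback:

    # Proxy the remote server on a local miss...
    try_files /archives/$uri @remote_wacz_archive;
    # ...or, if you do not proxy a remote server, return 404.
    # try_files /archives/$uri =404;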

I had a look while working on #24 and reached the same conclusion, @rebeccacremona.

I added =404 to the /replay-web-page/ redirection path in the meantime (better than nothing?).

TBD:

Maybe we could do something like:

    try_files /archives/$uri /archives/$uri/ @remote_warc_gz_archive;
    # try_files /archives/$uri /archives/$uri/ =404;
    # EDIT: Comment out the first line and uncomment the second line if you do not wish to proxy a remote server.

instead of

    try_files /archives/$uri /archives/$uri/ @remote_warc_gz_archive;
    # EDIT: Delete "@remote_warc_gz_archive" from the above list if you do not wish to proxy a remote server.
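To show how the proposed toggle might sit in context, here is a sketch for the .warc.gz case. The regex location and the upstream host archives.example.com are placeholders assumed for illustration, not taken from the repository's nginx.conf:

    location ~ \.warc\.gz$ {
        # Default: serve from /archives, else hand off to the remote proxy.
        try_files /archives/$uri /archives/$uri/ @remote_warc_gz_archive;
        # Local only: serve from /archives, else return 404 (no redirect loop).
        # try_files /archives/$uri /archives/$uri/ =404;
    }

    # Hypothetical named location used by the default line above.
    location @remote_warc_gz_archive {
        proxy_pass https://archives.example.com;
    }

Either way, the last parameter is a named location or a status code, never a bare URI, so a local miss cannot turn into an internal redirection cycle.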

@rebeccacremona I like this a lot!