G-Node/gogs

DataLad content not available in latest Web-UI version

mpsonntag opened this issue · 9 comments

The issue was submitted by email:

As indicated in the wiki, we started using the latest source code (locally built Docker image), but noticed that the annexed contents of our DataLad datasets were not available in the Web UI.
After some research, we realized that the working version might be the one tagged "gin-live-2020-10-24" (according to #125 (comment)), and indeed, the content of DataLad datasets is fully accessible in this version's Web UI.

We are using our in-house GIN server, running a Docker image built from the latest version of the master branch.

Here are the steps to follow in order to create a minimal repo showing the problem:

gin-cli-latest-linux/gin use-server my-gin-server
gin-cli-latest-linux/gin login
Logging into my-gin-server
Login: gin-owner
Password: ***********


cd /tmp

datalad create -c text2git test-repo

cd test-repo

gin-cli-latest-linux/gin create --no-clone test-repo

datalad siblings add --name my-gin \
 -d . \
 --url ssh://git@my-gin-server.example.com:2121/gin-owner/test-repo
 
[INFO   ] Could not enable annex remote my-gin. This is expected if my-gin is a pure Git remote, or happens if it is not accessible. 
[WARNING] Could not detect whether my-gin carries an annex. If my-gin is a pure Git remote, this is expected.  
 
datalad download-url -d . -m 'add >10M image' https://photojournal.jpl.nasa.gov/tiff/PIA25015.tif

datalad push --to my-gin

When trying to download the image using the UI (https://my-gin-server.example.com/gin-owner/test-repo/raw/master/PIA25015.tif), a 500 error screen is displayed, with the text "An error has occurred : the entry is not a blob Application Version: 0.12.3"

And the server log contains the following:

[ERROR] [...gogs/internal/context/context.go:202 NotFoundOrError()] get blob: the entry is not a blob
[TRACE] Template: status/500

However, the annexed content was pushed to the server, and can be retrieved again with DataLad:

datalad install https://my-gin-server.example.com/gin-owner/test-repo
cd test-repo
datalad get .

When the same repo is pushed to a server running the gin-live-2020-10-24 version, the content is accessible in the Web UI.

It might also have to do with the git-annex version that you are using. GIN and the gin client both use git-annex version 8, which is to a certain degree incompatible with git-annex version 10, the latest version that is now installed by default. This might be the root of these issues.

Thanks for the hint about git-annex compatibility.
However, we face the same issue when using the official Docker image (gnode/gin-web:latest), which ships with git-annex version 8.

In order to build the docker image with git-annex version 8.20200501 (i.e. the version used in gnode/gin-web:latest) instead of its latest build, it might be necessary to edit the Dockerfile.

E.g. replace:

RUN curl -Lo /git-annex/git-annex-standalone-amd64.tar.gz https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz

by:

RUN curl -Lo /git-annex/git-annex-standalone-amd64.tar.gz https://archive.org/download/git-annex-builds/SHA256E-s52969824--7bbaf6940d7790fa1bb261436eb2e60611413e103344d54f5cb751ab06d3c186.tar.gz
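When pinning a build like this, it may be worth checking the downloaded tarball before installing it. The archive.org file name looks like a git-annex SHA256E key, which has the form `SHA256E-s<size>--<sha256>.<ext>`, so the expected size and checksum can be read from the name itself. The following is only a sketch: the function names are made up for illustration, and `verify_annex_tarball` assumes `sha256sum` is available in the build environment.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the GIN Dockerfile.

# Print the size field of a "SHA256E-s<size>--<sha256>.<ext>" key.
annex_key_size() {
    rest=${1#*-s}                 # drop everything up to and including "-s"
    printf '%s\n' "${rest%%--*}"  # drop "--<sha256>.<ext>"
}

# Print the checksum field of the same kind of key.
annex_key_sha256() {
    rest=${1#*--}                 # drop everything up to and including "--"
    printf '%s\n' "${rest%%.*}"   # drop the file extension(s)
}

# verify_annex_tarball FILE KEY
# Succeeds only if FILE has the size and SHA-256 encoded in KEY.
verify_annex_tarball() {
    tarball=$1
    want_size=$(annex_key_size "$2")
    want_sum=$(annex_key_sha256 "$2")
    have_size=$(wc -c < "$tarball" | tr -d ' ')
    have_sum=$(sha256sum "$tarball" | cut -d' ' -f1)
    [ "$have_size" = "$want_size" ] && [ "$have_sum" = "$want_sum" ]
}
```

For example, `verify_annex_tarball /git-annex/git-annex-standalone-amd64.tar.gz SHA256E-s52969824--7bbaf6940d7790fa1bb261436eb2e60611413e103344d54f5cb751ab06d3c186.tar.gz` would fail if the download was truncated or replaced.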

You could also try to build a new Docker container with the current Dockerfile (if you have not done so already). In theory this should create a container with git-annex v10, since the Dockerfile fetches the latest git-annex at build time. You could then test whether such a deployment works with DataLad and the latest git-annex version.

Indeed, this is what I did first (unbeknownst to me, as I was not aware that the Dockerfile retrieves the latest version of git-annex available when the Docker image is built).

Sorry, my explanations were not clear. To sum up the situation: whichever version of git-annex is used in the container (v10 or v8), I face the same problem:

  • with a container based on the latest sources => the problem occurs (i.e. DataLad annexed content cannot be downloaded with the UI).
  • with a container based on the gin-live-2020-10-24 sources => no problem.
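For anyone wanting to repeat this comparison, here is a small sketch for confirming which git-annex major version a given container actually runs. It assumes the first line of `git annex version` has the form `git-annex version: <major>.<date>...`; the function name and the container name `gin-web` in the usage comment are placeholders, not names from the GIN setup.

```shell
#!/bin/sh
# Hypothetical helper: read `git annex version` output on stdin and
# print only the major version number (e.g. 8 or 10).
annex_major_version() {
    sed -n 's/^git-annex version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example usage ("gin-web" is a placeholder container name):
#   docker exec gin-web git annex version | annex_major_version
```

Running this against both containers should make it easy to state which git-annex version was actually in use for each test.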

@mpsonntag so should we use gin-live-2020-10-24, or is there another way around it? Can "latest" be fixed some other way?

(just saw we have the same problem...)

(is there anything we would miss by using that version?)

PS: why are the files uploaded via the GIN CLI still available, then?

Unfortunately, there is currently no quick way to fix the "latest" build to fully support DataLad. If you want to support DataLad for the time being, it's probably best to use the gin-live-2020-10-24 Docker container until we can fix this issue in the "latest" branch.

Some extra tests: data added with gin-cli is also unavailable.
(In my previous test, I used files that were not that big, so gin-cli added them to git, whereas DataLad added them to the annex.)

So the issue is not only about DataLad: it seems that no annexed content is available.
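To tell which files in a test repo ended up in git and which in the annex, a check along these lines might help. It is a sketch that relies on locked annexed files appearing in the work tree as symlinks into `.git/annex/objects/`; it does not detect unlocked files or files on adjusted branches. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for checking where a file's content is stored.

# Succeed if a symlink target points into the git-annex object store.
is_annex_link() {
    case $1 in
        *.git/annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}

# storage_of PATH: print "annex" for a locked annexed file, "git" otherwise.
storage_of() {
    if [ -L "$1" ] && is_annex_link "$(readlink "$1")"; then
        echo annex
    else
        echo git
    fi
}
```

Run from the dataset root, `storage_of PIA25015.tif` should print `annex` for the image from the steps above if it was annexed and is locked, while small text files kept in git by the text2git configuration should print `git`.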