mobie/mobie-utils-python

remote validation


Hi,

I have a strange validation issue.

This project:
https://github.com/mobie/environmental-dinoflagellate-vCLEM
fails to validate with problems in the remote sources.

ValueError: Could not find valid data path in XML file  data/photosynthetic_dinoflagellate/images/bdv-n5-s3/Chloroplast.xml.

However, https://s3.embl.de/environmental-dinoflagellate-vclem/photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json and all other files exist and are publicly accessible on S3.

Also, the project opens fine from remote in MoBIE.

Could this be an issue with the underscore in the S3 key?
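To test the underscore question and the public accessibility independently of MoBIE, the object can be requested over plain HTTPS. The sketch below uses only the Python standard library and assumes path-style addressing (endpoint/bucket/key, as in the URL above). The split into endpoint, bucket and key is inferred from that URL, and the helper names are hypothetical, not part of mobie-utils-python. Underscores are unreserved characters in URLs, so quote() leaves such a key unchanged.

```python
import urllib.parse
import urllib.request


def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style object URL: <endpoint>/<bucket>/<key>."""
    # quote() leaves unreserved characters such as "_", "-", "." and the "/" separator untouched
    return f"{endpoint.rstrip('/')}/{bucket}/{urllib.parse.quote(key)}"


def is_publicly_readable(url: str, timeout: float = 10.0) -> bool:
    """Send an anonymous HEAD request and report whether the object answers with 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTPError (e.g. 403/404), URLError and timeouts are all OSError subclasses
        return False


# Hypothetical split of the URL quoted in this issue into endpoint, bucket and key
url = s3_object_url(
    "https://s3.embl.de",
    "environmental-dinoflagellate-vclem",
    "photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json",
)
# is_publicly_readable(url) performs a real network request, so it is not called here
```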

This looks to me like it only validates the local data (and can't find it).
You probably need to run mobie.validate_project -r 0 -d 1 ... to check only the remote data.

See

$ mobie.validate_project -h
usage: Validate MoBIE project metadata [-h] --input INPUT [--require_local_data REQUIRE_LOCAL_DATA]
                                       [--require_remote_data REQUIRE_REMOTE_DATA]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        the project location
  --require_local_data REQUIRE_LOCAL_DATA, -r REQUIRE_LOCAL_DATA
                        check that local data exists
  --require_remote_data REQUIRE_REMOTE_DATA, -d REQUIRE_REMOTE_DATA
                        check that remote data exists
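Read against this help text, the two flags are independent switches. A hypothetical argparse reconstruction (inferred only from the usage text above, not copied from the mobie source, and with no defaults assumed) makes the suggested combination explicit: -r 0 -d 1 turns the local-data check off and the remote-data check on.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the interface shown by -h; not the real mobie code.
    # The first positional argument of ArgumentParser is prog, which appears in the usage line.
    parser = argparse.ArgumentParser("Validate MoBIE project metadata")
    parser.add_argument("--input", "-i", required=True, help="the project location")
    parser.add_argument("--require_local_data", "-r", type=int,
                        help="check that local data exists")
    parser.add_argument("--require_remote_data", "-d", type=int,
                        help="check that remote data exists")
    return parser


args = build_parser().parse_args(["-i", "data", "-r", "0", "-d", "1"])
check_local = bool(args.require_local_data)    # False: local data is not checked
check_remote = bool(args.require_remote_data)  # True: remote data is checked
```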

mobie.validate_project -r 0 -d 1 works, but -r 1 causes it to fail. All local data files are also present in that directory.
I don't understand why it looks for something local from the bdv-n5-s3 XMLs. They link to S3 no matter whether they are on local disk or on GitHub.

With -r 1 it will also check the data in bdv-n5 and will fail if the corresponding data is not there. So this is the expected behavior.
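The local check that -r 1 adds can be approximated by hand, as done with cat later in this thread: a bdv.n5 container stores per-setup metadata in setupN/attributes.json. The following is a minimal sketch assuming that layout; the function names are hypothetical and this is not how mobie-utils-python implements its check. The demo writes a temporary copy of the attributes.json quoted in this thread.

```python
import json
import tempfile
from pathlib import Path


def read_setup_attributes(n5_root, setup_id=0):
    """Load setup<ID>/attributes.json from a local bdv.n5 container."""
    path = Path(n5_root) / f"setup{setup_id}" / "attributes.json"
    with open(path) as f:
        return json.load(f)


def missing_local_setups(n5_root, n_setups=1):
    """Return the attributes.json paths that do not exist for setups 0..n_setups-1."""
    missing = []
    for setup_id in range(n_setups):
        attrs = Path(n5_root) / f"setup{setup_id}" / "attributes.json"
        if not attrs.is_file():
            missing.append(attrs.as_posix())
    return missing


# Demo: recreate the setup0 metadata shown in this thread inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    n5_root = Path(tmp) / "Chloroplast.n5"
    (n5_root / "setup0").mkdir(parents=True)
    (n5_root / "setup0" / "attributes.json").write_text(json.dumps({
        "dataType": "uint8",
        "downsamplingFactors": [[1, 1, 1], [2, 2, 2], [4, 4, 4], [8, 8, 8],
                                [16, 16, 16], [32, 32, 32], [64, 64, 64]],
    }))
    attrs = read_setup_attributes(n5_root)
    missing = missing_local_setups(n5_root, n_setups=2)  # setup1 was never created
```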

That makes total sense.

However, in this particular case both local and remote data exist.

$ cat data/photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json
{"dataType":"uint8","downsamplingFactors":[[1,1,1],[2,2,2],[4,4,4],[8,8,8],[16,16,16],[32,32,32],[64,64,64]]}

Plus, it specifically complains about the S3 data when checking locally (that's why I wanted pybdv to show me the affected file).

ValueError: Could not find valid data path in XML file  data/photosynthetic_dinoflagellate/images/bdv-n5-s3/Chloroplast.xml.

Does that mean it cannot find the file it is pointing to? Or is it because the XML does not contain a path that pybdv understands (pointing to S3 instead of a local path)?
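The second explanation is plausible given how the two XML flavours are typically laid out. The sketch below parses two minimal BDV XMLs with ElementTree; their content is modeled on the usual bdv.n5 and bdv.n5.s3 image loaders (the element names, signing region and key are assumptions, not copied from this project). The local loader carries an n5 element with a path on disk, while the S3 loader only names an endpoint, bucket and key, so a lookup that only knows the local layout finds no data path for it. This is an illustration, not pybdv's own code.

```python
import xml.etree.ElementTree as ET

# Assumed minimal layout of a local bdv.n5 XML (relative path to the container)
LOCAL_XML = """<SpimData version="0.2">
  <SequenceDescription>
    <ImageLoader format="bdv.n5" version="1.0">
      <n5 type="relative">Chloroplast.n5</n5>
    </ImageLoader>
  </SequenceDescription>
</SpimData>"""

# Assumed minimal layout of a bdv.n5.s3 XML (bucket/key on an S3 endpoint, no local path)
S3_XML = """<SpimData version="0.2">
  <SequenceDescription>
    <ImageLoader format="bdv.n5.s3" version="1.0">
      <Key>photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5</Key>
      <SigningRegion>us-west-2</SigningRegion>
      <ServiceEndpoint>https://s3.embl.de</ServiceEndpoint>
      <BucketName>environmental-dinoflagellate-vclem</BucketName>
    </ImageLoader>
  </SequenceDescription>
</SpimData>"""


def describe_image_loader(xml_text):
    """Return (format, kind, location) for the ImageLoader of a BDV XML string."""
    root = ET.fromstring(xml_text)
    loader = root.find("./SequenceDescription/ImageLoader")
    if loader is None:
        raise ValueError("no ImageLoader element")
    fmt = loader.get("format")
    n5 = loader.find("n5")
    if n5 is not None:
        # Local container: a file-system path (type="relative" means relative to the XML file)
        return fmt, "local", n5.text
    key = loader.findtext("Key")
    if key is not None:
        # Remote container: only bucket and key, nothing to look up on disk
        return fmt, "remote", f"{loader.findtext('BucketName')}/{key}"
    raise ValueError(f"could not find a data path for format {fmt!r}")
```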

Ok, I see. Maybe something with the metadata is duplicated. I will check it out later.

I had a look at the metadata in the project and couldn't see any obvious issue.
I also checked https://github.com/mobie/covid-em-project, which has a similar setup (local and remote data in bdv.n5 format), but couldn't reproduce the error there; the validation works as expected.

Is a copy of https://github.com/mobie/environmental-dinoflagellate-vCLEM with the local image data available somewhere on the EMBL share where I could access it?

check /g/schwab/Karel/mobie_cell1

Yep, there was an issue in one of the conditions in the validation, which caused the remote XMLs to be validated by the function for local data.
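The thread does not show the condition that was fixed. As a purely hypothetical illustration (the function names and format sets are invented for this sketch, not taken from mobie-utils-python), a branch that routes by the -r flag without looking at the image format reproduces the reported symptom: -r 0 -d 1 passes, while -r 1 sends the bdv.n5.s3 XMLs into the local-data check.

```python
LOCAL_FORMATS = {"bdv.n5"}      # hypothetical: formats whose data lives on disk
REMOTE_FORMATS = {"bdv.n5.s3"}  # hypothetical: formats whose data lives on S3


def plan_checks_buggy(formats, require_local, require_remote):
    """Illustrative bug: the local branch ignores the format."""
    plan = []
    for fmt in formats:
        if require_local:
            plan.append((fmt, "local"))  # remote XMLs land here too
        elif require_remote and fmt in REMOTE_FORMATS:
            plan.append((fmt, "remote"))
    return plan


def plan_checks(formats, require_local, require_remote):
    """Route each source to the check that matches its format."""
    plan = []
    for fmt in formats:
        if fmt in LOCAL_FORMATS and require_local:
            plan.append((fmt, "local"))
        elif fmt in REMOTE_FORMATS and require_remote:
            plan.append((fmt, "remote"))
    return plan


sources = ["bdv.n5", "bdv.n5.s3"]
```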

Should work once you pull master.