gbif/pipelines

Validation is broken

Closed this issue · 3 comments

Reported by email: files that previously validated no longer pass validation.

This file DarwinCoreArchive.zip produces the following logs in the service.

INFO  [2023-03-17 07:35:39,595+0000] [http-nio-8118-exec-7] org.gbif.validator.service.ValidationServiceImpl: Staring validation for the file file
INFO  [2023-03-17 07:35:39,602+0000] [http-nio-8118-exec-7] org.gbif.validator.service.ValidationServiceImpl: Create validation record for key 4f36afe2-4aa3-4928-a157-ca7910029d7e
INFO  [2023-03-17 07:35:39,621+0000] [ForkJoinPool.commonPool-worker-10] org.gbif.validator.service.ValidationServiceImpl: File has been uploaded and decompressed, key 4f36afe2-4aa3-4928-a157-ca7910029d7e
ERROR [2023-03-17 07:35:39,622+0000] [ForkJoinPool.commonPool-worker-10] org.gbif.validator.service.ValidationServiceImpl: Can't find metadata eml file
INFO  [2023-03-17 07:35:39,623+0000] [ForkJoinPool.commonPool-worker-10] org.gbif.validator.service.ValidationServiceImpl: Send the MQ message to the validator queue for key - 4f36afe2-4aa3-4928-a157-ca7910029d7e
INFO  [2023-03-17 07:35:39,857+0000] [http-nio-8118-exec-5] org.gbif.validator.service.ValidationServiceImpl: Updating validation key 4f36afe2-4aa3-4928-a157-ca7910029d7e, status FAILED
INFO  [2023-03-17 07:35:39,858+0000] [http-nio-8118-exec-5] org.gbif.validator.service.ValidationServiceImpl: Validation 4f36afe2-4aa3-4928-a157-ca7910029d7e finished with status FAILED

However, the zip does contain an eml.xml file, and the archive runs through UAT and indexes without issue.

muttcg commented

The archive element in meta.xml has no attribute naming the metadata file. Usually it is like:
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
In this case it is:
<archive xmlns="http://rs.tdwg.org/dwc/text/">

By default the validator no longer assumes any metadata file name. This is a result of the recent change made to fix gbif/portal-feedback#4587.

UAT runs an older version, in which the name was hard-coded to eml.xml.

Ah, thanks.

usually it is like <archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">

The spec only says it SHOULD be present, and I know of many DwC-As that do not have it. We need to continue to accommodate zips containing eml.xml and EML.xml files without that attribute.
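
For illustration, the fallback described above could look roughly like the following. This is only a sketch, not the validator's actual code: the class name MetadataLocator and both method names are hypothetical, and it assumes the archive has already been decompressed into a directory. It uses the metadata attribute when meta.xml declares one, and otherwise looks for a file named eml.xml in any letter case.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of gbif/pipelines.
public class MetadataLocator {

  /** Returns the value of the optional metadata attribute on <archive>, if any. */
  public static Optional<String> declaredMetadataName(Path archiveDir) throws Exception {
    Path metaXml = archiveDir.resolve("meta.xml");
    if (!Files.isRegularFile(metaXml)) {
      return Optional.empty();
    }
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    try (InputStream in = Files.newInputStream(metaXml)) {
      Element root = factory.newDocumentBuilder().parse(in).getDocumentElement();
      // getAttribute returns an empty string when the attribute is absent.
      String name = root.getAttribute("metadata").trim();
      return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }
  }

  /** Locates the metadata file: declared name first, then eml.xml ignoring case. */
  public static Optional<Path> findMetadataFile(Path archiveDir) throws Exception {
    Optional<String> declared = declaredMetadataName(archiveDir);
    if (declared.isPresent()) {
      Path candidate = archiveDir.resolve(declared.get());
      if (Files.isRegularFile(candidate)) {
        return Optional.of(candidate);
      }
    }
    // Fallback for archives without the attribute: accept eml.xml, EML.xml, etc.
    try (Stream<Path> files = Files.list(archiveDir)) {
      return files
          .filter(Files::isRegularFile)
          .filter(f -> f.getFileName().toString().equalsIgnoreCase("eml.xml"))
          .findFirst();
    }
  }
}
```

In this sketch a declared name that points to a missing file also falls through to the eml.xml lookup. Whether that case should instead be reported as an error is a design choice left open here.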

muttcg commented

Fixed