nfdi4plants/ARCCommander

[BUG] datahub validation fails for v1.0-pre2

Opened this issue · 14 comments

Describe the bug

The DataHUB validation pipeline fails for ARCs created with https://github.com/nfdi4plants/ARCCommander/releases/download/v1.0.0-preview.2/arc_osx-x64

This is before running "metadata tests".

To Reproduce

arc_osx-x64 init
arc_osx-x64 assay add -s v1pre2-Study -a v1pre2-Assay
arc_osx-x64 i person register --lastname LastName --firstname FirstName --email email@nfdi4plants.org --affiliation DataPLANT
arc_osx-x64 investigation update -i v1pre2 --description "Description v1pre2" --title "Title v1pre2"
arc_osx-x64 sync -f -r https://git.nfdi4plants.org/<userName>/v1pre2 -m "v1pre2 test"

Was this different when pushing with other tools/tool-versions?

The error message does not really say anything about the output that ARCCommander would produce.

Maybe @omaus or @j-bauer have an idea about this?

This does not happen with ARC commander v0.0.5 using the same commands above.

I'd need the whole console output to investigate this. It's almost certainly due to the deprecated validation pipeline (i.e., the arc-validate project).

Ah okay I see. Then it is probably a mismatch between ARCCommander and validation pipeline version, as @omaus suggested.

The tests failed in such a way that no XML file was created. Maybe we could still add some mechanism for retrieving the reason for this in future cases, e.g. by wrapping the complete pipeline call in a try .. with?
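The wrapping idea can be sketched in Python (the language of the pipeline's create-badge.py) as a try/except analogue of try .. with. This is only an illustration under assumptions: `run_pipeline` and `fallback_xml` are hypothetical helpers, not part of arc-validate, and the fallback report layout is a generic JUnit-style guess rather than the format arc-validate actually emits.

```python
import subprocess
from pathlib import Path
from xml.sax.saxutils import escape

# File name taken from the create-badge.py traceback in this thread.
RESULTS = Path("arc-validate-results.xml")


def fallback_xml(reason: str) -> str:
    """Build a minimal JUnit-style report holding one errored test case."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<testsuites>\n"
        '  <testsuite name="arc-validate" tests="1" errors="1" failures="0">\n'
        '    <testcase name="internal error" classname="arc-validate">\n'
        f'      <error message="pipeline crashed">{escape(reason)}</error>\n'
        "    </testcase>\n"
        "  </testsuite>\n"
        "</testsuites>\n"
    )


def run_pipeline(cmd: list[str], results: Path = RESULTS) -> int:
    """Run the validator; if it leaves no report behind, write a fallback one."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        code, output = proc.returncode, proc.stdout + proc.stderr
    except OSError as exc:  # e.g. the executable could not be started
        code, output = 127, str(exc)
    if not results.exists():
        # Preserve the captured console output as the reason for the crash.
        results.write_text(fallback_xml(output), encoding="utf-8")
    return code
```

Calling `run_pipeline(["arc-validate"])` would then return the validator's exit code while guaranteeing that a report file exists for later steps, with the captured console output stored as the error text.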

Here is the relevant output of the arc-validate command:

$ bash /opt/arc-validate/arc-validate.sh; ret=$?
+ arc-validate
Internal Error:                         
Cannot modify readonly container        
"   at System.IO.Packaging.Package.ThrowIfReadOnly()
   at System.IO.Packaging.Package.CreatePart(Uri partUri, String contentType, CompressionOption compressionOption)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.CreateMetroPart(Uri partUri, String contentType)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.CreateInternal(OpenXmlPackage openXmlPackage, OpenXmlPart parent, String contentType, String targetExt)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType, String id)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPartInternal[T]()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPart[T]()
   at FsSpreadsheet.ExcelIO.Spreadsheet.getOrInitSharedStringTablePart(SpreadsheetDocument spreadsheetDocument)
   at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheet(Sheet sheet, SpreadsheetDocument spreadsheetDocument)
   at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheetID(String sheetID, SpreadsheetDocument spreadsheetDocument)
   at FsSpreadsheet.ExcelIO.FsExtensions.sheets@182.Invoke(Sheet xlsxSheet)
   at Microsoft.FSharp.Collections.Internal.IEnumerator.map@99.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 102
   at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 84
   at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 872
   at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
   at ArcValidation.Configs.ArcConfig.get_InvestigationStudies() in /opt/arc-validate/src/ArcValidation/Configs/ArcConfig.fs:line 27
   at ArcValidation.Configs.ArcConfig.get_StudyPathsAndIds() in /opt/arc-validate/src/ArcValidation/Configs/ArcConfig.fs:line 33
   at ArcValidation.TestGeneration.Critical.Arc.FileSystem.generateArcFileSystemTests(ArcConfig arcConfig) in /opt/arc-validate/src/ArcValidation/TestGeneration/Critical/ArcFileSystem.fs:line 18
   at ARCValidate.main(String[] argv) in /opt/arc-validate/src/arc-validate/Program.fs:line 29"

Resulting in another error later on, since arc-validate did not create the arc-validate-results.xml:

$ /opt/arc-validate/create-badge.py
Traceback (most recent call last):
  File "/opt/arc-validate/create-badge.py", line 9, in <module>
    xml = JUnitXml.fromfile(xml_path)
  File "/usr/local/lib/python3.9/dist-packages/junitparser/junitparser.py", line 751, in fromfile
    tree = etree.parse(filepath)  # nosec
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1229, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 569, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './arc-validate-results.xml'

I thought the arc-validate tool is supposed to always create that XML file in all cases, isn't it?
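Independently of whether arc-validate always writes the file, the consuming step could tolerate a missing report instead of raising FileNotFoundError. The following is a minimal sketch using only the standard library; `summarize_results` is a hypothetical helper, not the actual create-badge.py logic, and it assumes a JUnit-style report whose suites carry `tests`, `failures` and `errors` attributes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_results(xml_path: str) -> dict:
    """Return test counts from a JUnit-style report, or a marker if it is missing."""
    path = Path(xml_path)
    if not path.is_file():
        # The validator crashed before writing a report; say so instead of raising.
        return {"status": "no-report", "tests": 0, "failures": 0, "errors": 0}
    root = ET.parse(str(path)).getroot()
    # The root element may be <testsuites> or a single <testsuite>.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    bad = totals["failures"] + totals["errors"]
    return {"status": "failed" if bad else "passed", **totals}
```

A badge step could then render a distinct "no report" badge for the `no-report` status rather than aborting the job.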

Looks to me like the Investigation file is read-only. Could you check this, @Brilator? It might also be any of the Study files...

I thought the arc-validate tool is supposed to always create that XML file in all cases, isn't it?

Exactly.
If read-only is the cause of this, I'll keep it in mind for arc-validate V2.

@omaus can you do me a favor and check this with latest arc commander on windows using the commands above?

If it is read-only, that’s still an arc commander bug.

Not read-only on Windows.

Then that's not the reason. Or did validation work?

I created a test repo in GitLab with the commands from above. None of the XLSX files were read-only, yet the pipeline did not work and printed the same error as above.
@HLWeil Any ideas? It might be something with a newer FsSpreadsheet version and some changes in how XLSX files are read.
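One way to narrow this down: in the stack trace, getOrInitSharedStringTablePart calls AddNewPart, which then hits ThrowIfReadOnly. That suggests (as an assumption, not a confirmed diagnosis) that the reader tried to create a shared string table part in a package it had opened read-only, which would be unrelated to file system permissions. The sketch below checks both aspects for every workbook in an ARC. `inspect_xlsx` is a hypothetical helper, and `xl/sharedStrings.xml` is only the conventional part location, which a workbook is free to deviate from.

```python
import os
import zipfile
from pathlib import Path


def inspect_xlsx(arc_root: str) -> list[dict]:
    """Report, per workbook, whether it is writable and has a shared string table part."""
    report = []
    for path in sorted(Path(arc_root).rglob("*.xlsx")):
        # An XLSX file is a ZIP package; list its parts without extracting them.
        with zipfile.ZipFile(path) as package:
            names = set(package.namelist())
        report.append({
            "file": str(path.relative_to(arc_root)),
            "writable": os.access(path, os.W_OK),
            "has_shared_strings": "xl/sharedStrings.xml" in names,
        })
    return report
```

If the failing workbooks all report `has_shared_strings` as false while being writable, that would point towards the reading code rather than towards file permissions.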

Yeah, might be. Does this error occur for all ARCs, @j-bauer @Brilator?

If so it should hopefully be easy to reproduce.

Still relevant for ARC Commander v1

Run the following to create a minimal ARC that should be valid for invenio.

mkdir arc-v1-test; cd arc-v1-test

arc init
arc assay add -s v1-test-Study -a v1-test-Assay

arc i person register --lastname TestLastName --firstname TestFirstName --email testmail@nfdi4plants.org --affiliation DataPLANT
arc i update -i v1-test --description "Description v1-test" --title "Title v1-test"

arc export
arc a list
arc s list

arc sync -f -r https://git.nfdi4plants.org/<>/v1-test -m "v1-test"

This fails during "validate ARC" with:

Running with gitlab-runner 16.2.1 (674e0e29)
  on dataplant-runner-0 iAYwqpK5, system ID: r_RntxNI6dNOlh

Preparing the "docker" executor
00:02
Using Docker executor with image ghcr.io/nfdi4plants/arc-validate:main ...
Pulling docker image ghcr.io/nfdi4plants/arc-validate:main ...
Using docker image sha256:31c612d8a4cbd25d26e1ca5263e9699ecb41495a7b9014d96da9c176136b2f0f for ghcr.io/nfdi4plants/arc-validate:main with digest ghcr.io/nfdi4plants/arc-validate@sha256:56352f8074174962e89e6b6367e74901e705092bbe9322057c5772d6d5fca1bf ...

Preparing environment
00:00
Running on runner-iaywqpk5-project-1044-concurrent-0 via 8764d0667e17...

Getting source from Git repository
00:01
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/brilator/v1-test/.git/
Checking out b21e357a as detached HEAD (ref is main)...
Removing arc-summary.md
Removing arc.json
Skipping Git submodules setup

Downloading artifacts
00:02
Downloading artifacts for create ARC JSON (3489)...
Downloading artifacts from coordinator... ok        host=s3.bwsfs.uni-freiburg.de id=3489 responseStatus=200 OK token=64_tMrK_

Executing "step_script" stage of the job script
00:00
Using docker image sha256:31c612d8a4cbd25d26e1ca5263e9699ecb41495a7b9014d96da9c176136b2f0f for ghcr.io/nfdi4plants/arc-validate:main with digest ghcr.io/nfdi4plants/arc-validate@sha256:56352f8074174962e89e6b6367e74901e705092bbe9322057c5772d6d5fca1bf ...
$ echo "Running unit tests... "
Running unit tests... 
$ set +e
$ bash /opt/arc-validate/arc-validate.sh; ret=$?
+ arc-validate
arc-validate failed due to an internal error.
This error did likely NOT occur due to user input.
An empty test result file will be created to reflect this and prevent the validation pipeline from failing.
Run arc-validate with --verbose to see the full error message.
[11:30:14 ERR] arc-validate.arc-validate failed in 00:00:00.0050000. 
arc-validate failed due to an internal error
This error did likely NOT occur due to user input.
An empty test result file will be created to reflect this and prevent the subsequent validation pipeline from failing.
. Actual value was true but had expected it to be false.
   at ARCValidate.createInternalFailDummyTestResults@13.Invoke(Unit _arg1) in /opt/arc-validate/src/arc-validate/Program.fs:line 14
   at Expecto.Impl.execTestAsync@569-1.Invoke(Unit unitVar)
   at Microsoft.FSharp.Control.AsyncPrimitives.CallThenInvoke[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2) in D:\a\_work\1\s\src\FSharp.Core\async.fs:line 508
   at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction) in D:\a\_work\1\s\src\FSharp.Core\async.fs:line 112 <Expecto>
$ echo "$ret"
3
$ set -e
$ /opt/arc-validate/create-badge.py
$ exit "$ret"

Uploading artifacts for failed job
00:09
Uploading artifacts...
arc-validate-results.xml: found 1 matching artifact files and directories 
arc-quality.svg: found 1 matching artifact files and directories 
Uploading artifacts as "archive" to coordinator... 201 Created  id=3490 responseStatus=201 Created token=64_tMrK_
Uploading artifacts...
arc-validate-results.xml: found 1 matching artifact files and directories 
Uploading artifacts as "junit" to coordinator... 201 Created  id=3490 responseStatus=201 Created token=64_tMrK_

Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 3

Hmm, I'm not sure whether the validation pipeline is already rolled out for ARC v1.x.x.

@kMutagene @omaus

The new package-based validation pipelines are not rolled out yet. I think it is easiest to just ignore these errors until we can move forward next week.