daisy/pipeline

daisy202-validator doesn't recognize mime-type of wav file correctly

egli opened this issue · 2 comments

egli commented

Expected Behavior

I'm using the daisy202-validator from the latest release of the pipeline to validate a DAISY 2.02 export from Obi. Naturally, I would expect an unmodified export from Obi to be valid DAISY 2.02.

$ /opt/daisy-pipeline2-cli/dp2 version

Client version:                 2.1.6
Pipeline version:               1.14.4
Pipeline authentication:        false

Now, let's run the validator:

$ /opt/daisy-pipeline2-cli/dp2 daisy202-validator --ncc tmp/31537/ncc.html --output 31537-out
Job 27264161-b5e9-4f71-96a6-a7f4064fd678 sent to the server
[INFO]     Validating DAISY 2.02 fileset
[WARNING]  MP3 file duration and ID3 tag validation is not implemented yet
[INFO]     timeToleranceMs set to 500
[INFO]     validating heading hierarchy for ncc.html
[INFO]     Validation completed in 0:22.729
______________________________________________________________________________
████████████████████████████████████████████████████████████████████████ 100.0% 
The job has been deleted from the server
Job finished with status: FAIL

Actual Behavior

The validation fails, and the report says:

ncc.html

Validated as DAISY 2.02

Path: file:///home/eglic/tmp/31537/ncc.html

241 issues found.
      file type not allowed in DAISY 2.02 fileset: audio/x-wav (expected a html, smil, mp2, mp3, wav, jpg, gif, png or css file type)
      file:/home/eglic/tmp/31537/aud001.wav

For some reason the wav files are detected as audio/x-wav, whereas the validator (probably) expects them to be audio/wav. The `file` utility reports the same type on my system:

$ file --mime-type ~/tmp/31537/aud214.wav 
/home/eglic/tmp/31537/aud214.wav: audio/x-wav

Details

I found some indication that determining the mime-type of a file using Java can give different results depending on the OS. However, I could not find any use of this function in any of the DAISY repositories.
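
To illustrate the kind of OS-dependent lookup meant here, the following is a minimal sketch, assuming the function in question is `java.nio.file.Files.probeContentType` (the report does not name it, so this is a guess). The class `MediaTypeProbe`, its alias table, and the choice of `audio/wav` as the canonical name are hypothetical and are not taken from the pipeline's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of DAISY Pipeline.
final class MediaTypeProbe {

    // Assumed alias table: legacy or platform-specific names for WAV audio,
    // all mapped to the name the validator seems to expect.
    private static final Map<String, String> ALIASES = Map.of(
            "audio/x-wav", "audio/wav",
            "audio/wave", "audio/wav",
            "audio/vnd.wave", "audio/wav");

    // Lower-cases the media type and replaces known aliases; other values pass through.
    static String normalize(String mediaType) {
        if (mediaType == null) {
            return null; // probeContentType returns null when the type is unknown
        }
        String lower = mediaType.toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(lower, lower);
    }

    // Files.probeContentType delegates to installed file type detectors,
    // so the raw result can differ between operating systems.
    static String probe(Path file) throws IOException {
        return normalize(Files.probeContentType(file));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Path.of(arg);
            System.out.println(p + ": raw=" + Files.probeContentType(p)
                    + " normalized=" + probe(p));
        }
    }
}
```

Running this against one of the exported wav files on different systems would show whether the raw value varies, while the normalized value should stay the same for any alias listed in the table.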

Environment

  • Operating system: Ubuntu 22.04.1 LTS
  • DAISY Pipeline 2 version: 1.14.4
  • Interface: Command Line

Logs

html-report.zip

Thanks for the report. This is indeed a bug. The media type detection happens in XProc, no Java involved there.