pkiraly/metadata-qa-api

Check content type

Closed this issue · 0 comments

The Deutsche Digitale Bibliothek would like to check the content type of thumbnail URLs.

The Rule class will have a new contentType property, which should be a list of MIME content types, such as

  - name: thumbnail
    path: oai:record/dc:identifier[@type='binary']
    rules:
      - or:
        - pattern: ^.*\.(jpg|jpeg|jpe|jfif|png|tiff|tif|gif|svg|svgz|pdf)$
        - contentType: [image/jpeg, image/png, image/tiff, image/tiff-fx, image/gif, image/svg+xml]
        id: 3.1

A new class ContentTypeChecker will take care of the validation. The validation might have 3 steps:

  1. it should check whether the link can be interpreted as a URL
  2. it issues a HEAD request. The response should have a valid HTTP success code (e.g. 200)
  3. it checks whether the value of the Content-Type header matches one of the entries in the provided list. The header might contain an extra element, such as ;charset=UTF-8 in Content-Type: image/jpeg;charset=UTF-8. This extra element should be removed before comparison.
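
The three steps could be sketched roughly as follows. This is only an illustration and not the actual ContentTypeChecker implementation: the class name ContentTypeCheckerSketch, the method names, the 5 second timeouts, and the restriction to http/https URLs are all assumptions made for the example. It also assumes the configured list contains lower-case media types.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; names and timeouts are assumptions, not the project's API.
public class ContentTypeCheckerSketch {

  /** Step 1: parse the value; returns null unless it is an absolute http(s) URL. */
  public static URL toUrl(String value) {
    if (value == null) {
      return null;
    }
    try {
      URI uri = new URI(value.trim());
      String scheme = uri.getScheme();
      if (scheme == null
          || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
        return null;
      }
      return uri.toURL();
    } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
      return null;
    }
  }

  /** Step 3 helper: drop parameters such as ";charset=UTF-8" and lower-case the rest. */
  public static String normalize(String headerValue) {
    if (headerValue == null) {
      return null;
    }
    int semicolon = headerValue.indexOf(';');
    String mediaType = semicolon >= 0 ? headerValue.substring(0, semicolon) : headerValue;
    return mediaType.trim().toLowerCase(Locale.ROOT);
  }

  /** Runs all three steps; any failure along the way counts as "not allowed". */
  public static boolean hasAllowedContentType(String value, List<String> allowed) {
    URL url = toUrl(value);
    if (url == null) {
      return false;
    }
    HttpURLConnection connection = null;
    try {
      // Step 2: issue a HEAD request and require a 2xx status code.
      connection = (HttpURLConnection) url.openConnection();
      connection.setRequestMethod("HEAD");
      connection.setConnectTimeout(5000); // assumed timeout
      connection.setReadTimeout(5000);    // assumed timeout
      int status = connection.getResponseCode();
      if (status < 200 || status >= 300) {
        return false;
      }
      // Step 3: compare the cleaned Content-Type with the configured list.
      String mediaType = normalize(connection.getContentType());
      return mediaType != null && allowed.contains(mediaType);
    } catch (IOException e) {
      return false;
    } finally {
      if (connection != null) {
        connection.disconnect();
      }
    }
  }
}
```

With the rule above, a call such as `ContentTypeCheckerSketch.hasAllowedContentType(url, List.of("image/jpeg", "image/png"))` would be evaluated only when the `pattern` alternative of the `or` does not already match, depending on how the rule engine short-circuits.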