ffdev-info/signature-development-utility

Sequences not working with DROID

It looks like there is still a mismatch between what DROID can interpret from a standard signature file and what it can interpret from a container signature file.

There is a related issue here: digital-preservation/droid#237

This XML should work, but it doesn't, and the syntax it relies on is certainly an unpublished specification from TNA.

<?xml version="1.0" encoding="UTF-8"?>
<FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1" DateCreated="2020-09-19T21:29:09">
 <InternalSignatureCollection>
  <InternalSignature ID="2" Specificity="Specific">
   <ByteSequence Reference="BOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="4">
     <Sequence>504B0304</Sequence>
    </SubSequence>
   </ByteSequence>
   <ByteSequence Reference="EOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="4">
     <Sequence>504B01{43-65531}504B0506{18-65531}</Sequence>
    </SubSequence>
   </ByteSequence>
  </InternalSignature>
  <InternalSignature ID="3" Specificity="Specific">
   <ByteSequence Reference="BOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="0">
     <Sequence>504B0304{26}5B436F6E74656E745F54797065735D2E786D6C20A2*504B0102*504B0506</Sequence>
    </SubSequence>
   </ByteSequence>
  </InternalSignature>
  <InternalSignature ID="4" Specificity="Specific">
   <ByteSequence Reference="BOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="0">
     <Sequence>D0CF11E0A1B11AE1{20}FEFF</Sequence>
    </SubSequence>
   </ByteSequence>
  </InternalSignature>
 </InternalSignatureCollection>
 <FileFormatCollection>
  <FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="application/octet-stream">
   <Extension>ext</Extension>
  </FileFormat>
  <FileFormat ID="2" Name="ZIP Format" PUID="x-fmt/263" Version="" MIMEType="application/zip">
   <InternalSignatureID>2</InternalSignatureID>
   <Extension>zip</Extension>
  </FileFormat>
  <FileFormat ID="3" Name="Microsoft Office Open XML" PUID=" fmt/189" Version="" MIMEType="application/octet-stream">
   <InternalSignatureID>3</InternalSignatureID>
  </FileFormat>
  <FileFormat ID="4" Name="OLE2 Compound Document Format" PUID=" fmt/111" Version="" MIMEType="application/octet-stream">
   <InternalSignatureID>4</InternalSignatureID>
  </FileFormat>
 </FileFormatCollection>
</FFSignatureFile>
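
When comparing what each tool receives, it can help to list the raw sequences per signature. The following is a minimal sketch, assuming only Python's standard `xml.etree.ElementTree` and a trimmed copy of the file above (signatures 2 and 4); the function name `list_sequences` is hypothetical and not part of DROID, Siegfried, or this utility.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the FFSignatureFile root element above.
NS = {"sf": "http://www.nationalarchives.gov.uk/pronom/SignatureFile"}

# Trimmed copy of the signature file above: signatures 2 and 4 only.
SIGNATURE_XML = """\
<FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1" DateCreated="2020-09-19T21:29:09">
 <InternalSignatureCollection>
  <InternalSignature ID="2" Specificity="Specific">
   <ByteSequence Reference="BOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="4">
     <Sequence>504B0304</Sequence>
    </SubSequence>
   </ByteSequence>
   <ByteSequence Reference="EOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="4">
     <Sequence>504B01{43-65531}504B0506{18-65531}</Sequence>
    </SubSequence>
   </ByteSequence>
  </InternalSignature>
  <InternalSignature ID="4" Specificity="Specific">
   <ByteSequence Reference="BOFoffset">
    <SubSequence Position="1" MinFragLength="0" SubSeqMinOffset="0" SubSeqMaxOffset="0">
     <Sequence>D0CF11E0A1B11AE1{20}FEFF</Sequence>
    </SubSequence>
   </ByteSequence>
  </InternalSignature>
 </InternalSignatureCollection>
</FFSignatureFile>
"""


def list_sequences(xml_text):
    """Return (signature ID, anchor reference, sequence text) for every Sequence element."""
    root = ET.fromstring(xml_text)
    rows = []
    for sig in root.iterfind(".//sf:InternalSignature", NS):
        for byte_seq in sig.iterfind("sf:ByteSequence", NS):
            for seq in byte_seq.iterfind("sf:SubSequence/sf:Sequence", NS):
                rows.append((sig.get("ID"), byte_seq.get("Reference"), seq.text))
    return rows


for sig_id, reference, sequence in list_sequences(SIGNATURE_XML):
    print(sig_id, reference, sequence)
```

This only reads the file; it says nothing about whether DROID will accept a given sequence.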

To build the current development signatures in Siegfried, we can run the following command, and the signatures will work there:

./roy build -droid development-signature-dev-1.xml -noreports -container container-signature-20200918.xml

Notes on patterns

  • Doesn't work: 504B030414{2}000800F0AB3051000000000000000002
  • Doesn't work: 504B030414{2-4}000800F0AB3051000000000000000002
  • Doesn't work: 504B030414*000800F0AB3051000000000000000002
  • Doesn't work: 504B030414??08000800F0AB3051000000000000000002
           
  • Works: 504B030414[00:01]08000800F0AB3051000000000000000002
  • Works: 'kml xmlns='
  • Works: 504B030414(00|01)08000800F0AB3051000000000000000002
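
The notes above can be turned into a quick pre-check. This is a rough sketch in Python that flags the four constructs reported here as not working (`{n}`, `{n-m}`, `*`, `??`). The table reflects only the observations in this issue, not any DROID specification, and the names `REPORTED_FAILING` and `failing_constructs` are hypothetical.

```python
import re

# Constructs reported as "Doesn't work" in the notes above.
# Assumption: this mirrors the observations in this issue only.
REPORTED_FAILING = [
    ("{n}", re.compile(r"\{\d+\}")),
    ("{n-m}", re.compile(r"\{\d+-\d+\}")),
    ("*", re.compile(r"\*")),
    ("??", re.compile(r"\?\?")),
]


def failing_constructs(pattern):
    """Return the reported-failing constructs that appear in a pattern."""
    # Drop quoted strings first so literal text is not mistaken for an operator.
    unquoted = re.sub(r"'[^']*'", "", pattern)
    return [name for name, rx in REPORTED_FAILING if rx.search(unquoted)]


examples = [
    "504B030414{2}000800F0AB3051000000000000000002",
    "504B030414??08000800F0AB3051000000000000000002",
    "504B030414[00:01]08000800F0AB3051000000000000000002",
    "'kml xmlns='",
]
for example in examples:
    print(example, failing_constructs(example))
```

An empty list means the pattern uses none of the constructs listed as failing; it does not guarantee the pattern will match.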

This is fixed for now by wiring in the PHP version of the code here, which is pretty graceful but adds complexity we don't need. Still, it works.

I'll probably need to update this blog.

NB. To be clear, this was always a mistaken view on my part. It may, however, be good if the signature file syntax accepted by DROID were simplified.

NB. It looks like my syntax may be wrong, so the following needs to be tried and tested:

  <InternalSignature ID="3" Specificity="Specific">
   <ByteSequence Reference="BOFoffset" Sequence="04??[01:0C][01:1F]{28}([41:5A]|[61:7A]){10}(43|44|46|4C|4E)"/>
  </InternalSignature>