morungos/node-word-extractor

JS heap errors observed in real Word files

Closed this issue · 2 comments

We found this with a few files in our large collection. Word opens them without complaint, but it seems that a bad sector identifier chain can crash Node fatally. The cause is free sectors (id = -1) appearing in the chain, which breaks the load.

The issue is in the loop:

   while ( secId > AllocationTable.SecIdFree ) {
      secIds.push( secId );
      secId = this._table[secId];
   }

When secId is AllocationTable.SecIdFree, we end up pulling undefined values out of the table and then indexing the table by undefined, which crashes Node fatally. At the very least, we should not do this.
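
To make the failure mode concrete, here is a minimal sketch of a more defensive chain walk. It is not the library's actual code: `walkSecIdChain`, `SEC_ID_FREE` and `SEC_ID_END_OF_CHAIN` are hypothetical names, the table is assumed to be a plain array of sector ids, and -2 as the end-of-chain marker is an assumption (only the -1 free-sector id comes from the report above). The sketch stops as soon as it sees anything that is not a valid index into the table, and also guards against cycles.

```javascript
// Hypothetical constants: -1 (free sector) is taken from the report above;
// -2 (end of chain) is an assumption about the file format.
const SEC_ID_FREE = -1;
const SEC_ID_END_OF_CHAIN = -2;

/**
 * Follows a sector chain through `table`, starting at `startSecId`.
 * Returns the ids collected so far if the chain hits a free sector,
 * an out-of-range id, undefined, or a sector it has already visited.
 */
function walkSecIdChain(table, startSecId) {
  const secIds = [];
  const seen = new Set();
  let secId = startSecId;

  while (secId !== SEC_ID_END_OF_CHAIN) {
    // Rejects SEC_ID_FREE, undefined, NaN and anything outside the table.
    const isValidIndex =
      Number.isInteger(secId) && secId >= 0 && secId < table.length;
    if (!isValidIndex || seen.has(secId)) {
      break; // corrupt chain: stop instead of looping without bound
    }
    seen.add(secId);
    secIds.push(secId);
    secId = table[secId];
  }
  return secIds;
}

// A well-formed chain: 0 -> 1 -> 2 -> end of chain.
console.log(walkSecIdChain([1, 2, SEC_ID_END_OF_CHAIN], 0)); // [ 0, 1, 2 ]

// A corrupt chain: sector 1 points at a free sector.
console.log(walkSecIdChain([1, SEC_ID_FREE, SEC_ID_END_OF_CHAIN], 0)); // [ 0, 1 ]
```

Returning a truncated chain mirrors the fact that Word still opens these files; a stricter variant could throw an error instead, so that callers can reject the upload.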

I can't easily create a test file, because the only files I have contain PII, and if I attempt to redact them, Word fixes the bad sector chain. However, the problem is real, and we might be able to cover it with a unit test.

I'm using this library to test Word files on upload. I created a corrupted "doc" file that simulates this error.

doc-only-header.doc.zip

This is merged and published as part of 0.3.0