mattiasw/ExifReader

`zTXt` tag values are decoded incorrectly

Closed this issue · 7 comments

Description

The PNG specification defines zTXt values as Latin-1 (ISO/IEC 8859-1) text, as seen at https://www.w3.org/TR/png/#11zTXt. Currently, ExifReader appears to decode these values as UTF-8, which corrupts non-ASCII characters such as À and so loses data.

I suspect the library uses the same decoding for tEXt values, but I have not confirmed it. The W3C link shows that the specification for tEXt also uses Latin-1, while iTXt uses UTF-8.
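A small sketch of the difference, assuming a runtime with the standard `TextDecoder` global (modern browsers, Node.js). It shows how the same character, À, is stored in a Latin-1 chunk (tEXt/zTXt) versus an iTXt chunk, and what each decoder produces. The variable names are illustrative only.

```typescript
// The character "À" (U+00C0) as it would be stored in each kind of PNG text chunk.
const inTextOrZtxt = new Uint8Array([0xc0]); // tEXt / zTXt: Latin-1, one byte
const inItxt = new Uint8Array([0xc3, 0x80]); // iTXt: UTF-8, two bytes

const utf8 = new TextDecoder("utf-8");
// The WHATWG Encoding Standard maps the "latin1" label to windows-1252,
// which matches ISO 8859-1 everywhere except the 0x80 to 0x9F range.
const latin1 = new TextDecoder("latin1");

// 0xC0 can never start a valid UTF-8 sequence, so the UTF-8 decoder replaces it
// with U+FFFD and the original character cannot be recovered afterwards.
const wrong = utf8.decode(inTextOrZtxt);
const right = latin1.decode(inTextOrZtxt);
const itxt = utf8.decode(inItxt);

console.log({ wrong, right, itxt });
```

Decoding with the label that matches the chunk type is the whole fix; the bytes themselves are fine.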

Additional details

  • ExifReader version: 4.23.1
  • Web browser and version: N/A
  • Node version: 20.11.1
  • Can you reproduce the bug on the examples site? If so, using which implementation (global object, AMD, and/or ES module)? https://mattiasw.github.io/ExifReader/

I checked the chunks with https://www.nayuki.io/page/png-file-chunk-inspector, where the Latin-1 characters show up correctly, and also with ExifTool.

The global object on the linked site also shows the wrong character.
With ExifTool you can see that the character is an À.

[Screenshot, 2024-05-23 5:48:40 PM]

How to reproduce

I've included the PNG below; I hope it was not removed :(

[Attached image: naan-compressed-latin.png]

  1. Load the image
  2. View the tags.

What I expected to happen:

Tag values are decoded correctly.

What really happened:

Tag values are decoded, but incorrectly 😢

Hi! Thanks for the report. After a quick look it seems it might be tricky to solve. The implementation uses the Web API Response.text() and MDN says this about it: "The response is always decoded using UTF-8."

https://developer.mozilla.org/en-US/docs/Web/API/Response/text

I will take a closer look though and see if anything can be done.
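A minimal sketch of the behavior MDN describes, assuming a runtime with global `Response` and `TextDecoder` (modern browsers, Node.js 18 or later). `compareDecodings` is a hypothetical helper, not part of ExifReader: it decodes the same Latin-1 bytes once through `Response.text()` and once through `arrayBuffer()` plus an explicit `TextDecoder`.

```typescript
// Hypothetical helper: decode the same bytes via Response.text() and via TextDecoder.
async function compareDecodings(
  bytes: ArrayBuffer,
  label: string,
): Promise<{ viaText: string; viaDecoder: string }> {
  const res = new Response(bytes);
  // Response.text() ignores the data's real encoding and always decodes as UTF-8.
  const viaText = await res.clone().text();
  // Reading the raw bytes with arrayBuffer() lets the caller choose the decoder.
  const viaDecoder = new TextDecoder(label).decode(await res.arrayBuffer());
  return { viaText, viaDecoder };
}

// "À la" encoded as Latin-1: 0xC0 is "À".
const latin1Bytes = new Uint8Array([0xc0, 0x20, 0x6c, 0x61]).buffer;
compareDecodings(latin1Bytes, "latin1").then(console.log);
```

Here `viaText` comes back with U+FFFD in place of the À, while `viaDecoder` is "À la", which suggests switching from `text()` to `arrayBuffer()` is the key change.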

Maybe something along the lines of...

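// Pick the TextDecoder label from the PNG chunk type: tEXt and zTXt are Latin-1, iTXt is UTF-8.
function labelFor(tagType: string): string {
  if (tagType === "zTXt") return "latin1";
  if (tagType === "tEXt") return "latin1";
  return "utf-8";
}

// Read the raw bytes instead of calling response.text(), which always decodes as UTF-8.
async function decodeChunkText(tagType: string, response: Response): Promise<string> {
  return new TextDecoder(labelFor(tagType)).decode(await response.arrayBuffer());
}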

I just realized that this project is also designed to work in the browser, not just in Node.js...

If you have other ideas on how to approach this, I can certainly help out...

Thanks, looks like it should work, will try it out!

It worked great! However, while testing this I found some other non-related issues that I have to fix first.

Are the "non-related" issues related to if/how compressed values are resolved? Or is this a "how I'm using it" problem?

import { readFileSync } from "fs";
import ExifReader from "exifreader";

describe("ExifReader", () => {
  const file = "/Users/amar/dev/fastai/naan-compressed-latin.png"

  it("loads png from file with async set", async () => {
    const data = await ExifReader.load(file, {async: true});
    console.log(data);
    expect(data["wassup"].value).toContain("hello");
  });

  it("loads png from file, sync", async () => {
    // ExifReader.load() returns a Promise, so without await `data` is still the pending Promise.
    const data = ExifReader.load(file);
    console.log(data);
    expect(data["wassup"].value).toContain("hello");
  });

  it("loads png from buffer", async () => {
    const data = readFileSync(file)
    const exifData = await ExifReader.load(data, {async: true});
    console.log(exifData);
    expect(exifData["wassup"].value).toContain("hello");
  })
});

Output:

  console.log
    {
      Orientation: { id: 274, value: 1, description: 'top-left' },
      'Exif IFD Pointer': { id: 34665, value: 38, description: 38 },
      ColorSpace: { id: 40961, value: 1, description: 'sRGB' },
      PixelXDimension: { id: 40962, value: 64, description: 64 },
      PixelYDimension: { id: 40963, value: 64, description: 64 },
      'Image Width': { value: 64, description: '64px' },
      'Image Height': { value: 64, description: '64px' },
      'Bit Depth': { value: 8, description: '8' },
      'Color Type': { value: 2, description: 'RGB' },
      Compression: { value: 0, description: 'Deflate/Inflate' },
      Filter: { value: 0, description: 'Adaptive' },
      Interlace: { value: 0, description: 'Noninterlaced' },
      wassup: { value: Promise { <pending> }, description: Promise { <pending> } },
      FileType: { value: 'png', description: 'PNG' }
    }

      at Object.log (image-manipulation/__test__/inject-into-exif.test.ts:66:13)


Error: expect(received).toContain(expected) // indexOf

Expected value:  "hello"
Received object: {}

    at Object.toContain (/Users/amar/dev/tcloud-alpha/src/image-manipulation/__test__/inject-into-exif.test.ts:67:34)
  console.log
    Promise { <pending> }

      at Object.log (image-manipulation/__test__/inject-into-exif.test.ts:73:13)

  console.log
    {
      Orientation: { id: 274, value: 1, description: 'top-left' },
      'Exif IFD Pointer': { id: 34665, value: 38, description: 38 },
      ColorSpace: { id: 40961, value: 1, description: 'sRGB' },
      PixelXDimension: { id: 40962, value: 64, description: 64 },
      PixelYDimension: { id: 40963, value: 64, description: 64 },
      'Image Width': { value: 64, description: '64px' },
      'Image Height': { value: 64, description: '64px' },
      'Bit Depth': { value: 8, description: '8' },
      'Color Type': { value: 2, description: 'RGB' },
      Compression: { value: 0, description: 'Deflate/Inflate' },
      Filter: { value: 0, description: 'Adaptive' },
      Interlace: { value: 0, description: 'Noninterlaced' },
      wassup: { value: Promise { <pending> }, description: Promise { <pending> } },
      FileType: { value: 'png', description: 'PNG' }

It is related to the async vs. sync situation :-) but in my own test scripts, which check that a new version doesn't change the output for older image files unless intended. The scripts did not handle the async part correctly and missed all asynchronous tags. 🙈

I don't get the same output as you, though, with the pending promises in value and description. 🤔
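One plausible reason compressed values end up asynchronous, sketched under assumptions and not as a description of ExifReader's internals: the portable way to inflate a zlib stream in both browsers and Node.js is the Compression Streams API (`DecompressionStream`), which only delivers its result through streams and Promises. `decodeZtxt` is a hypothetical helper that follows the zTXt layout from the PNG specification and decodes both keyword and text as Latin-1, assuming global `Blob`, `Response` and `DecompressionStream`.

```typescript
// Hypothetical helper, not ExifReader's API: decode the data field of a PNG zTXt chunk.
// Layout per the PNG spec: Latin-1 keyword, 0x00, compression method (0 = zlib), zlib stream.
async function decodeZtxt(data: Uint8Array): Promise<{ keyword: string; text: string }> {
  const latin1 = new TextDecoder("latin1");
  const sep = data.indexOf(0);
  if (sep < 1) throw new Error("zTXt: missing keyword");
  if (data[sep + 1] !== 0) throw new Error("zTXt: unsupported compression method");

  // In the Compression Streams API, "deflate" means a zlib-wrapped stream, as used by zTXt.
  // The result is only available through streams/Promises, so this step cannot be synchronous.
  const inflated = new Blob([data.slice(sep + 2)])
    .stream()
    .pipeThrough(new DecompressionStream("deflate"));
  const textBytes = await new Response(inflated).arrayBuffer();

  return {
    keyword: latin1.decode(data.subarray(0, sep)),
    // The decompressed text is Latin-1 as well, not UTF-8.
    text: latin1.decode(textBytes),
  };
}

// Example: keyword "wassup", method 0, then a zlib stream holding "À la" in Latin-1
// (a single uncompressed "stored" deflate block, written out by hand).
const chunkData = new Uint8Array([
  0x77, 0x61, 0x73, 0x73, 0x75, 0x70, 0x00, 0x00,
  0x78, 0x01, 0x01, 0x04, 0x00, 0xfb, 0xff, 0xc0, 0x20, 0x6c, 0x61, 0x04, 0x9d, 0x01, 0xae,
]);
decodeZtxt(chunkData).then(console.log);
```

Any caller that reads such a value has to await it, which is the same trap the sync test above falls into.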

Fixed and released as version 4.23.2. Thanks again for reporting!