How do you handle binary data in the browser?

akira-cn opened this issue 5 years ago · 1 comments

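Web development rarely involves binary data, so many developers are unfamiliar with JavaScript's binary APIs.

In fact, browsers provide a set of APIs for working with non-text binary data such as images, audio, and video, and JavaScript on the Web is now capable of processing all of them.

Let's look at an example: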

async function pngInfo(url) {  
  const res = await fetch(url); // use the url parameter, not the outer imageURL
  const arrayBuffer = await res.arrayBuffer();
  
  function isPNG(arrayBuffer) {
    const pngSign = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
    const signData = new Uint8Array(arrayBuffer.slice(0, 8));
    return pngSign.length === signData.length
      && pngSign.every((data, i) => data === signData[i]);
  }
  
  function chunks(arrayBuffer) {
    let offset = 8;
    const size = arrayBuffer.byteLength;
    const info = {chunks: [], buffer: arrayBuffer};
    do {
      const view = new DataView(arrayBuffer, offset, 8);
      const [chunkSize, type] = [view.getUint32(0), view.getUint32(4)];
      const chunkType = CHUNK_TYPES.get(type);
      info.chunks.push([chunkType, chunkSize]);
      if(chunkType === 'IHDR') {
        const ihdrView = new DataView(arrayBuffer, offset + 8, chunkSize);
        const [width, height] = [ihdrView.getUint32(0), ihdrView.getUint32(4)];
        const depth = ihdrView.getUint8(8),
              colorType = ihdrView.getUint8(9),
              compressionMethod = ihdrView.getUint8(10),
              filterMethod = ihdrView.getUint8(11),
              interlaceMethod = ihdrView.getUint8(12);
        info.meta = {width, height, depth, colorType,
                    compressionMethod, filterMethod, interlaceMethod};
      }
      offset += (8 + chunkSize + 4);
    } while(offset < size);
    return info;
  }
  
  const CHUNK_TYPES = new Map([
    [0x49484452, 'IHDR'],
    [0x504c5445, 'PLTE'],
    [0x74524e53, 'tRNS'],
    [0x73524742, 'sRGB'],
    [0x67414d41, 'gAMA'],
    [0x49444154, 'IDAT'],
    [0x49454e44, 'IEND'],
  ]);
  
  if(isPNG(arrayBuffer)) {
    return chunks(arrayBuffer);
  }
  return null;
}

const imageURL = 'https://p1.ssl.qhimg.com/t011f60e5399df3d7a6.png';

pngInfo(imageURL).then((info) => {
  console.log(info);
});

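The code above reads a PNG file and extracts the image's meta information.

The response object returned by the Fetch API exposes an arrayBuffer() method, which gives us the binary data as an ArrayBuffer.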

const res = await fetch(imageURL);
const arrayBuffer = await res.arrayBuffer();

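👉🏻 [Key point]: An ArrayBuffer object represents a generic, fixed-length raw binary data buffer. You cannot manipulate the contents of an ArrayBuffer directly. Instead you go through a TypedArray or a DataView object, which represents the buffer in a specific format and uses that format to read and write its contents.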
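To make that concrete, here is a minimal sketch of one buffer seen through two views. The variable names are only for illustration and do not come from the code above.

```javascript
// A 4-byte buffer. On its own it offers no way to read or write single bytes.
const buffer = new ArrayBuffer(4);

// A Uint8Array views the same memory as four unsigned bytes.
const bytes = new Uint8Array(buffer);
bytes.set([0x89, 0x50, 0x4e, 0x47]);

// A DataView over the same buffer can read those bytes as other types.
const view = new DataView(buffer);
const firstByte = view.getUint8(0);   // 0x89
const asUint32 = view.getUint32(0);   // 0x89504e47 (big-endian by default)

// Writes through one view are visible through the other: no copy is made.
view.setUint8(3, 0x00);
const lastByte = bytes[3];            // 0

console.log(firstByte.toString(16), asUint32.toString(16), lastByte);
```

Both views share the underlying memory, which is why creating them is cheap even for large files.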

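A PNG file is made up of chunks of data, but its first 8 bytes are a signature that is always: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

So we can use the signature to check whether the data is a valid PNG image: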

function isPNG(arrayBuffer) {
  const pngSign = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  const signData = new Uint8Array(arrayBuffer.slice(0, 8));
  return pngSign.length === signData.length
    && pngSign.every((data, i) => data === signData[i]);
}

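After the signature, the data is divided into chunks of various types. Every chunk starts with 8 bytes (a 4-byte length of the chunk's data followed by a 4-byte chunk type) and ends with a 4-byte checksum.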
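The trailing checksum is a CRC-32 computed over the chunk type and chunk data, not over the length field. The code in this post does not verify it. If you wanted to, a rough sketch could look like the following, where crc32 and isChunkCrcValid are hypothetical helper names introduced here.

```javascript
// Build the standard CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// CRC-32 of a Uint8Array, returned as an unsigned 32-bit number.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: verify the chunk that starts at `offset`.
function isChunkCrcValid(arrayBuffer, offset) {
  const view = new DataView(arrayBuffer);
  const length = view.getUint32(offset);
  // The CRC covers the type (4 bytes) plus the data (length bytes).
  const covered = new Uint8Array(arrayBuffer, offset + 4, 4 + length);
  const stored = view.getUint32(offset + 8 + length);
  return crc32(covered) === stored;
}

// An IEND chunk: length 0, type "IEND", no data, CRC 0xAE426082.
const iend = new Uint8Array([0, 0, 0, 0, 0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82]);
console.log(isChunkCrcValid(iend.buffer, 0));
```

A table-driven CRC is used here because it is short and dependency-free. It is one common way to compute CRC-32, not the only one.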

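PNG has many chunk types, and I won't go through them all here. The one we care about is the first chunk: its type must be IHDR, its data is 13 bytes long, and it holds the image's meta information, laid out as follows: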

Field	Length (bytes)	Start byte
width	4	0
height	4	4
Bit depth	1	8
Colour type	1	9
Compression method	1	10
Filter method	1	11
Interlace method	1	12

With that layout we can read the information out:

function chunks(arrayBuffer) {
  let offset = 8;
  const size = arrayBuffer.byteLength;
  const info = {chunks: [], buffer: arrayBuffer};
  do {
    const view = new DataView(arrayBuffer, offset, 8);
    const [chunkSize, type] = [view.getUint32(0), view.getUint32(4)];
    const chunkType = CHUNK_TYPES.get(type);
    info.chunks.push([chunkType, chunkSize]);
    if(chunkType === 'IHDR') {
      const ihdrView = new DataView(arrayBuffer, offset + 8, chunkSize);
      const [width, height] = [ihdrView.getUint32(0), ihdrView.getUint32(4)];
      const depth = ihdrView.getUint8(8),
            colorType = ihdrView.getUint8(9),
            compressionMethod = ihdrView.getUint8(10),
            filterMethod = ihdrView.getUint8(11),
            interlaceMethod = ihdrView.getUint8(12);
      info.meta = {width, height, depth, colorType,
                  compressionMethod, filterMethod, interlaceMethod};
    }
    offset += (8 + chunkSize + 4);
  } while(offset < size);
  return info;
}

Because the fields differ in length (some are 4 bytes, some are 1 byte), we use a DataView here to read them:

// Create a DataView over the 8-byte chunk header that starts at offset
const view = new DataView(arrayBuffer, offset, 8);
// Of these 8 bytes, the first 4 are the data length and the last 4 are the chunk type
const [chunkSize, type] = [view.getUint32(0), view.getUint32(4)];

// Look up the name of the chunk type
const chunkType = CHUNK_TYPES.get(type);
info.chunks.push([chunkType, chunkSize]);
// The meta information lives in the chunk whose type is IHDR
if(chunkType === 'IHDR') {
  // Create a new DataView that starts 8 bytes after offset and is chunkSize bytes long
  const ihdrView = new DataView(arrayBuffer, offset + 8, chunkSize);
  // The first 8 bytes are width and height, 4 unsigned bytes each, so use getUint32
  const [width, height] = [ihdrView.getUint32(0), ihdrView.getUint32(4)];
  // The remaining fields are 1 byte each, so use getUint8
  const depth = ihdrView.getUint8(8),
        colorType = ihdrView.getUint8(9),
        compressionMethod = ihdrView.getUint8(10),
        filterMethod = ihdrView.getUint8(11),
        interlaceMethod = ihdrView.getUint8(12);
  info.meta = {width, height, depth, colorType,
              compressionMethod, filterMethod, interlaceMethod};
}
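One detail the snippet relies on without saying so: PNG stores multi-byte integers in big-endian order, and DataView reads big-endian unless told otherwise. A typed array such as Uint32Array would instead use the platform's byte order. A small sketch of the difference, using made-up variable names:

```javascript
// The four bytes that encode a width of 400 in an IHDR chunk: 0x00000190.
const widthBytes = new Uint8Array([0x00, 0x00, 0x01, 0x90]);
const dv = new DataView(widthBytes.buffer);

// Default: big-endian, which matches the PNG format.
const bigEndian = dv.getUint32(0);           // 400

// Passing true as the second argument requests little-endian instead.
const littleEndian = dv.getUint32(0, true);  // 0x90010000

console.log(bigEndian, littleEndian.toString(16));
```

This is why the parsing code can call getUint32(0) with no extra argument and still get the right width.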

That gives us the image's meta data, which looks like this:

{
  "chunks":[["IHDR",13],["sRGB",1],["IDAT",14000],["IEND",0]],
  "meta":{
    "width":400,
    "height":400,
    "depth":8,
    "colorType":6,
    "compressionMethod":0,
    "filterMethod":0,
    "interlaceMethod":0
  }
}
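Any chunk whose type is missing from CHUNK_TYPES would show up as undefined in the chunks list. As a possible variation (not the approach used above), the 4 type bytes can be decoded as ASCII so every chunk gets a name. The sketch below also stops cleanly at IEND or at truncated data. readChunkType and listChunks are hypothetical names.

```javascript
// Decode the 4-byte chunk type at `offset` as ASCII, e.g. "IHDR" or "tEXt".
function readChunkType(arrayBuffer, offset) {
  const bytes = new Uint8Array(arrayBuffer, offset, 4);
  return String.fromCharCode(...bytes);
}

// List [type, length] pairs, stopping at IEND or when the data runs out.
function listChunks(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const result = [];
  let offset = 8; // skip the 8-byte PNG signature
  // Every chunk needs at least 12 bytes: length + type + CRC.
  while (offset + 12 <= arrayBuffer.byteLength) {
    const length = view.getUint32(offset);
    const type = readChunkType(arrayBuffer, offset + 4);
    result.push([type, length]);
    if (type === 'IEND') break;
    offset += 8 + length + 4;
  }
  return result;
}

// Synthetic input: signature, an IHDR chunk with zeroed data and CRC, then IEND.
const sample = new Uint8Array(8 + (12 + 13) + 12);
sample.set([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], 0);
sample.set([0, 0, 0, 13, 0x49, 0x48, 0x44, 0x52], 8);  // length 13, "IHDR"
sample.set([0, 0, 0, 0, 0x49, 0x45, 0x4e, 0x44], 33);  // length 0, "IEND"
console.log(listChunks(sample.buffer));
```

The sample buffer is not a valid image (its CRCs are zero). It only exercises the chunk walking.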

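Going further

The compressed pixel data of a PNG lives in chunks of type "IDAT". On the server we could decompress it with zlib.inflate, but in the browser we cannot call that directly. What we can do instead is draw the image onto a canvas, read the pixels with the canvas getImageData API, modify them, write them back with putImageData, and finally call canvas.toDataURL() to turn the result into a base64 data URL. Assigning that URL to the src of an Image object displays the processed image.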

const imageURL = 'https://p1.ssl.qhimg.com/t011f60e5399df3d7a6.png';
pngInfo(imageURL).then((info) => {
  return createImageBitmap(new Blob([info.buffer]));
}).then((bitmap) => {
  const canvas = document.createElement('canvas');
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  const context = canvas.getContext('2d');
  context.drawImage(bitmap, 0, 0);
  const imageData = context.getImageData(0, 0, canvas.width, canvas.height);
  const data = imageData.data;
  for(let i = 0; i < data.length; i += 4) {
    const r = data[i];
    const g = data[i + 1];
    const b = data[i + 2];
    const grey = Math.floor((r + g + b) / 3);
    data[i] = data[i + 1] = data[i + 2] = grey;
  }
  context.putImageData(imageData, 0, 0);
  const image = new Image();
  image.src = canvas.toDataURL();
  document.body.appendChild(image);
});
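The pixel loop in the canvas example averages the three colour channels. If you want to try it outside a canvas, it can be pulled out into a plain function over an RGBA array. The sketch below is a variation rather than the same code: it swaps the simple average for Rec. 601 luma weights (0.299, 0.587, 0.114), and toGreyscale is a name made up here.

```javascript
// Convert RGBA pixel data to greyscale in place using Rec. 601 luma weights.
// `data` is laid out like ImageData.data: [r, g, b, a, r, g, b, a, ...].
function toGreyscale(data) {
  for (let i = 0; i < data.length; i += 4) {
    const luma = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
    data[i] = data[i + 1] = data[i + 2] = Math.round(luma);
    // data[i + 3] (alpha) is left untouched.
  }
  return data;
}

// Two pixels: pure red and pure green, both fully opaque.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 255]);
console.log(Array.from(toGreyscale(pixels)));
```

In the browser you would pass imageData.data to such a function and then call putImageData as before.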

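In the canvas example, I use createImageBitmap to build an ImageBitmap from the buffer data. You could also put the original image into an Image object and hand that to the canvas to draw, but since we already have the buffer, creating an ImageBitmap lets the canvas process it a little faster.

Note that the canvas example also uses a Blob object. Blob is another browser API for handling binary data.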

👉🏻 Blob stands for Binary Large Object. A Blob object represents an immutable, file-like object of raw data.
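As a small illustration of those two properties (file-like and immutable), here is a sketch that wraps a few bytes in a Blob. The bytes are just the PNG signature, not a complete image, and the variable names are assumptions for this example.

```javascript
// Wrap raw bytes in a Blob and label them with a MIME type.
const signature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const blob = new Blob([signature], { type: 'image/png' });

console.log(blob.size, blob.type);

// A Blob is immutable: slice() returns a new Blob and leaves the original alone.
const firstFour = blob.slice(0, 4);
console.log(firstFour.size, blob.size);

// Reading the bytes back is asynchronous and yields an ArrayBuffer.
blob.arrayBuffer().then((buf) => {
  console.log(new Uint8Array(buf)[1].toString(16)); // second signature byte
});
```

Setting the type is optional, but it tells consumers of the Blob what kind of data it holds.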

Blob is quite useful in its own right: we can use it to process images, for example to compress them. That topic is left for next time.

Sjhb commented 3 years ago

Thumbs up!