This project enables fast decompression of large half-float arrays to `Float32Array` in JavaScript. Because half-precision floating-point arrays are not natively supported in JavaScript, decoding float16 data quickly is challenging, and CPU-based solutions are slow. We therefore use WebGPU to decode values in parallel on the GPU.
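For a sense of the per-value work involved, here is an illustrative scalar decoder in plain JavaScript. It is a sketch for exposition only, not part of this library's API; the GPU kernels perform the equivalent bit manipulation for many values in parallel.

```js
// Illustrative scalar decode of one IEEE 754 half-float (given as a uint16).
// Running this in a JS loop over very large arrays is what makes
// CPU-based decoding slow.
function f16ToF32(h) {
  const sign = (h & 0x8000) >> 15;
  const exp = (h & 0x7c00) >> 10;
  const frac = h & 0x03ff;

  let value;
  if (exp === 0) {
    value = frac * 2 ** -24; // subnormal or zero: no implicit leading 1
  } else if (exp === 0x1f) {
    value = frac ? NaN : Infinity; // NaN or infinity
  } else {
    value = (1 + frac / 1024) * 2 ** (exp - 15); // normal number
  }
  return sign ? -value : value;
}

console.log(f16ToF32(0xc000)); // -2
```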
The following input types are supported:

- `Uint8Array`
- `Uint16Array`
- `Uint32Array`

Passing any other type will result in an `ErrorReason.UNSUPPORTED_TYPE` error.
The input data must be 2-byte aligned; otherwise, an `ErrorReason.UNALIGNED_INPUT` error will be raised.
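Both failure modes can be observed as follows. This sketch assumes the thrown error exposes its `ErrorReason` via `cause`, as in the usage example below.

```js
import { f16tof32GPU } from "f16-to-f32-gpu";

// A Float64Array is not an accepted input type.
try {
  await f16tof32GPU(new Float64Array([1.0]));
} catch (error) {
  console.log(error.cause); // ErrorReason.UNSUPPORTED_TYPE
}

// Three bytes hold one and a half f16 values, so the input is rejected.
try {
  await f16tof32GPU(new Uint8Array([0x00, 0xc0, 0x00]));
} catch (error) {
  console.log(error.cause); // ErrorReason.UNALIGNED_INPUT
}
```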
Internally, we view the input data as a `Uint32Array`, padding to 4-byte alignment when needed. This lets us decode two half-float values per kernel invocation: the lower and upper 16 bits of each input `u32` are each decoded to an `f32`.
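As a rough sketch of that layout (assuming a little-endian platform, which JavaScript typed arrays follow on virtually all WebGPU-capable hardware): two consecutive f16 values occupy the low and high 16 bits of one u32, and a WGSL kernel can split and decode both halves with the built-in `unpack2x16float`.

```js
// Two consecutive half-floats viewed as a single u32 (little-endian):
// the first f16 lands in the low 16 bits, the second in the high 16 bits.
const halves = new Uint16Array([0x3c00, 0xc000]); // f16 bit patterns for 1.0 and -2.0
const packed = new Uint32Array(halves.buffer)[0]; // 0xc0003c00

const lo = packed & 0xffff; // 0x3c00 -> decodes to  1.0
const hi = packed >>> 16;   // 0xc000 -> decodes to -2.0
console.log(lo.toString(16), hi.toString(16)); // "3c00" "c000"
```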
```js
import { f16tof32GPU } from "f16-to-f32-gpu";

try {
  const f32FromUint8 = await f16tof32GPU(new Uint8Array([0x00, 0xC0]));
  const f32FromUint16 = await f16tof32GPU(new Uint16Array([0xC000]));
  const f32FromUint32 = await f16tof32GPU(new Uint32Array([0xC0000000]));

  console.log(f32FromUint8[0]); // -2
  console.log(f32FromUint16[0]); // -2
  console.log(f32FromUint32[0]); // -2
} catch (error) {
  console.log(`Error: ${error.cause}, ${error.message}`);
}
```
- tinygrad Stable Diffusion WebGPU port: try it here. Since `f16` support is limited in browsers, the compute in Stable Diffusion WebGPU is in `f32`. However, the `f32` weights of the model used in the demo exceed 4 gigabytes, which is too much data to download and then cache in the browser. To optimize for weight download speed, the demo fetches the weights in `f16`, and all of the >2 gigabytes of data are decompressed client-side using `f16tof32GPU`, as sketched below. The decompressed `f32` buffer is then used by the inference WebGPU kernels.
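A minimal sketch of that pattern, assuming a placeholder URL (the real demo additionally shards, chunks, and caches the weights):

```js
import { f16tof32GPU } from "f16-to-f32-gpu";

// Placeholder URL; the actual demo fetches and caches many weight shards.
const resp = await fetch("https://example.com/model-weights.f16.bin");
const f16Bytes = new Uint8Array(await resp.arrayBuffer());

// Half the download size of f32, decoded back to f32 on the GPU.
const f32Weights = await f16tof32GPU(f16Bytes);
// f32Weights is a Float32Array, ready to hand to the inference kernels.
```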
License: MIT