oven-sh/bun

Add TextDecoderStream and TextEncoderStream

jimmywarting opened this issue · 40 comments

TextDecoderStream and TextEncoderStream are missing

const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
                                                     ^
ReferenceError: Can't find variable: TextDecoderStream

bun version 1.0.6+969da088f5db3258a803ec186012e30f992829b4

SukkaW commented

Workaround: copy the following ponyfill into a .ts file:

// Copyright 2016 Google Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Polyfill for TextEncoderStream and TextDecoderStream

// Modified by Sukka (https://skk.moe) to increase compatibility and performance with Bun.

export class PolyfillTextDecoderStream extends TransformStream<Uint8Array, string> {
  readonly encoding: string;
  readonly fatal: boolean;
  readonly ignoreBOM: boolean;

  constructor(
    encoding: string = 'utf-8',
    {
      fatal = false,
      ignoreBOM = false,
    }: ConstructorParameters<typeof TextDecoder>[1] = {},
  ) {
    const decoder = new TextDecoder(encoding, { fatal, ignoreBOM });
    super({
      transform(chunk: Uint8Array, controller: TransformStreamDefaultController<string>) {
        const decoded = decoder.decode(chunk, { stream: true });
        if (decoded.length > 0) {
          controller.enqueue(decoded);
        }
      },
      flush(controller: TransformStreamDefaultController<string>) {
        // If {fatal: false} is in options (the default), then the final call to
        // decode() can produce extra output (usually the unicode replacement
        // character 0xFFFD). When fatal is true, this call is just used for its
        // side-effect of throwing a TypeError exception if the input is
        // incomplete.
        const output = decoder.decode();
        if (output.length > 0) {
          controller.enqueue(output);
        }
      }
    });

    this.encoding = encoding;
    this.fatal = fatal;
    this.ignoreBOM = ignoreBOM;
  }
}

Then import { PolyfillTextDecoderStream } from 'path/to/where/you/save/the/polyfill.ts'.
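
For example, a minimal usage sketch (the URL and import path are placeholders):

import { PolyfillTextDecoderStream } from './polyfill';

const response = await fetch('https://example.com');
const reader = response.body!
  .pipeThrough(new PolyfillTextDecoderStream())
  .getReader();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value);
}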

Bummer. Can't use @google/generative-ai due to this.

It's not possible to use Vercel AI SDK because of this issue

Here are polyfills you can use, based on Node.js' implementation, while you wait for Bun to catch up:

/**
 * TextEncoderStream polyfill based on Node.js' implementation https://github.com/nodejs/node/blob/3f3226c8e363a5f06c1e6a37abd59b6b8c1923f1/lib/internal/webstreams/encoding.js#L38-L119 (MIT License)
 */
export class TextEncoderStream {
  #pendingHighSurrogate: string | null = null

  #handle = new TextEncoder()

  #transform = new TransformStream<string, Uint8Array>({
    transform: (chunk, controller) => {
      // https://encoding.spec.whatwg.org/#encode-and-enqueue-a-chunk
      chunk = String(chunk)

      let finalChunk = ""
      for (let i = 0; i < chunk.length; i++) {
        const item = chunk[i]
        const codeUnit = item.charCodeAt(0)
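        // Code units 0xD800-0xDBFF are high (leading) surrogates and
        // 0xDC00-0xDFFF are low (trailing) surrogates; a valid pair is a
        // high surrogate immediately followed by a low surrogate.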
        if (this.#pendingHighSurrogate !== null) {
          const highSurrogate = this.#pendingHighSurrogate

          this.#pendingHighSurrogate = null
          if (0xdc00 <= codeUnit && codeUnit <= 0xdfff) {
            finalChunk += highSurrogate + item
            continue
          }

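          // The pending high surrogate was not followed by a low surrogate,
          // so it is replaced with U+FFFD (the replacement character).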
          finalChunk += "\uFFFD"
        }

        if (0xd800 <= codeUnit && codeUnit <= 0xdbff) {
          this.#pendingHighSurrogate = item
          continue
        }

        if (0xdc00 <= codeUnit && codeUnit <= 0xdfff) {
          finalChunk += "\uFFFD"
          continue
        }

        finalChunk += item
      }

      if (finalChunk) {
        controller.enqueue(this.#handle.encode(finalChunk))
      }
    },

    flush: (controller) => {
      // https://encoding.spec.whatwg.org/#encode-and-flush
      if (this.#pendingHighSurrogate !== null) {
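        // A dangling high surrogate at end of stream is emitted as U+FFFD,
        // whose UTF-8 encoding is 0xEF 0xBF 0xBD.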
        controller.enqueue(new Uint8Array([0xef, 0xbf, 0xbd]))
      }
    },
  });

  get encoding() {
    return this.#handle.encoding
  }

  get readable() {
    return this.#transform.readable
  }

  get writable() {
    return this.#transform.writable
  }

  get [Symbol.toStringTag]() {
    return 'TextEncoderStream'
  }
}

/**
 * TextDecoderStream polyfill based on Node.js' implementation https://github.com/nodejs/node/blob/3f3226c8e363a5f06c1e6a37abd59b6b8c1923f1/lib/internal/webstreams/encoding.js#L121-L200 (MIT License)
 */
export class TextDecoderStream {
  #handle: TextDecoder

  #transform = new TransformStream({
    transform: (chunk, controller) => {
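      // {stream: true} makes TextDecoder buffer an incomplete multi-byte
      // sequence until the next chunk instead of emitting U+FFFD for it.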
      const value = this.#handle.decode(chunk, {stream: true})

      if (value) {
        controller.enqueue(value)
      }
    },
    flush: controller => {
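      // A final decode() with no arguments flushes anything still buffered
      // from streaming mode.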
      const value = this.#handle.decode()
      if (value) {
        controller.enqueue(value)
      }

      controller.terminate()
    }
  })

  constructor(encoding = "utf-8", options: TextDecoderOptions = {}) {
    this.#handle = new TextDecoder(encoding, options)
  }

  get encoding() {
    return this.#handle.encoding
  }

  get fatal() {
    return this.#handle.fatal
  }

  get ignoreBOM() {
    return this.#handle.ignoreBOM
  }

  get readable() {
    return this.#transform.readable
  }

  get writable() {
    return this.#transform.writable
  }

  get [Symbol.toStringTag]() {
    return "TextDecoderStream"
  }
}

Both are basically just TS ports of what I use in my projects with Bun.

I mean, you can use it in your app, can't you? It probably relies on globalThis, so:

// Add those polyfills to globalThis before you import `@google/generative-ai`
globalThis.TextEncoderStream ||= TextEncoderStream
globalThis.TextDecoderStream ||= TextDecoderStream

But for a library this might not be a good option.

Here are polyfills you can use, based on Node.js' implementation, while you wait for Bun to catch up:

You can simplify the implementation by using class PolyfillTextDecoderStream extends TransformStream<Uint8Array, string> and calling super() (same for TextEncoderStream). See #5648 (comment)
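
For example, a minimal encoder counterpart in the same extends style might look like this (a sketch only: unlike the spec algorithm, it encodes each chunk independently, so a surrogate pair split across two chunks becomes replacement characters rather than being reassembled):

export class PolyfillTextEncoderStream extends TransformStream<string, Uint8Array> {
  readonly encoding = 'utf-8';

  constructor() {
    const encoder = new TextEncoder();
    super({
      transform(chunk: string, controller: TransformStreamDefaultController<Uint8Array>) {
        // TextEncoder.encode() converts its input to a USVString first, so any
        // lone surrogate (including one cut off at a chunk boundary) becomes U+FFFD.
        const encoded = encoder.encode(chunk);
        if (encoded.byteLength > 0) {
          controller.enqueue(encoded);
        }
      }
    });
  }
}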

If you read the spec, you'll see TextDecoderStream and TextEncoderStream are not subclasses of TransformStream.

If you read the spec, you'll see TextDecoderStream and TextEncoderStream are not subclasses of TransformStream.

IMO, the spec only defines what a correct implementation should look like; it shows that correct TextEncoderStream and TextDecoderStream implementations could be written like these, but a runtime doesn't have to implement them in exactly the same way. That is to say, the spec doesn't dictate how you must implement this.

IMHO, as long as the implementation exposes the required APIs and fields and the behavior is the same, the implementation is spec-compliant.

Also, take a look at the spec:

[screenshots of the spec's IDL, showing that both TextEncoderStream and TextDecoderStream include GenericTransformStream]

The spec requires TextEncoderStream and TextDecoderStream to have all the fields from GenericTransformStream, so extends just works.

If you read the spec, you'll see TextDecoderStream and TextEncoderStream are not subclasses of TransformStream.

Also, my implementation is based on Google, Inc.'s work.

And it's not only Google that does this; QuickJS also uses extends: https://github.com/rsenn/qjs-modules/blob/7e83ad1402fed2681ce189d4cfc2b55386a5bcc5/lib/streams.js#L211

I tried to use the polyfill proposed by @octet-stream above with a tRPC client:

Getting this error:

error: Invalid response or stream interrupted
      at new StreamInterruptedError (/Users/username/Sandboxes/mytrpc/node_modules/@trpc/server/dist/unstable-core-do-not-import/stream/stream.mjs:233:9)
      at closeOrAbort (/Users/username/Sandboxes/mytrpc/node_modules/@trpc/server/dist/unstable-core-do-not-import/stream/stream.mjs:421:23)
      at promiseInvokeOrNoopMethodNoCatch (:1:21)
      at promiseInvokeOrNoopMethod (:1:21)
      at writableStreamDefaultControllerProcessClose (:1:21)
      at writableStreamDefaultControllerAdvanceQueueIfNeeded (:1:21)
      at writableStreamDefaultControllerClose (:1:21)
      at writableStreamClose (:1:21)

I was thinking it's just that pipeThrough can't recognize the TextDecoderStream polyfill as a valid TransformStream-ish object, but that doesn't seem to be the case: in a minimal example (I piped an array of Uint8Array-encoded strings from a ReadableStream through my polyfill) it works. Can you share a minimal repro so I can verify the problem? Maybe I can figure out how it can be fixed.

Here's the code I've been using to verify my assumption:

// The polyfill itself
class TextDecoderStream {
  #handle: TextDecoder

  #transform = new TransformStream({
    transform: (chunk, controller) => {
      const value = this.#handle.decode(chunk, {stream: true})

      if (value) {
        controller.enqueue(value)
      }
    },
    flush: controller => {
      const value = this.#handle.decode()
      if (value) {
        controller.enqueue(value)
      }

      controller.terminate()
    }
  })

  constructor(encoding = "utf-8", options: TextDecoderOptions = {}) {
    this.#handle = new TextDecoder(encoding, options)
  }

  get encoding() {
    return this.#handle.encoding
  }

  get fatal() {
    return this.#handle.fatal
  }

  get ignoreBOM() {
    return this.#handle.ignoreBOM
  }

  get readable() {
    return this.#transform.readable
  }

  get writable() {
    return this.#transform.writable
  }

  get [Symbol.toStringTag]() {
    return "TextDecoderStream"
  }
}

const encoder = new TextEncoder()

// Source
const data = ["a", "b", "c"].map(chunk => encoder.encode(chunk)).values()

const readable = new ReadableStream({
  pull(controller) {
    const {done, value} = data.next()

    if (done) {
      return void controller.close()
    }

    controller.enqueue(value)
  }
})

const reader = readable.pipeThrough(new TextDecoderStream()).getReader()

while (true) {
  const {done, value} = await reader.read()

  if (done) {
    break
  }

  console.log(value)
}

@octet-stream That works perfectly as a shim for Vercel's AI SDK, thank you and great job.

I updated TextEncoderStream in my comment, because it was breaking encoding for some of the characters (emojis specifically).

I updated TextEncoderStream in my comment, because it was breaking encoding for some of the characters (emojis specifically).

@octet-stream Thank you! I had this exact issue, and the current polyfills in your comment worked perfectly to fix it

@octet-stream That works perfectly as a shim for Vercel's AI SDK, thank you and great job.

@ctjlewis: Could you please share a code snippet? I'm not able to make it work with the Vercel AI SDK (see below). Thanks!

    import { TextDecoderStream } from '../utils/bun-ponyfill'
    const result = await streamText({
      model: openai('gpt-3.5-turbo'),
      prompt: 'Invent a new holiday and describe its traditions.'
    })

    const textDecoderStream = new TextDecoderStream()

    const reader = result.textStream.pipeThrough(textDecoderStream).getReader()
    while (true) {
      const { done, value } = await reader.read()
      if (done) {
        break
      }
      process.stdout.write(value)
    }

But it crashes with:
ReferenceError: Can't find variable: TextDecoderStream and AI_APICallError: Failed to process successful response url: "https://api.openai.com/v1/chat/completions"

@charnould Super sorry. Override it on the global context: globalThis.TextDecoderStream.

shim.ts

/// <reference lib="dom" />

globalThis.TextDecoderStream = class {
  #handle: TextDecoder

  #transform = new TransformStream({
    transform: (chunk, controller) => {
      const value = this.#handle.decode(chunk, { stream: true })

      if (value) {
        controller.enqueue(value)
      }
    },
    flush: controller => {
      const value = this.#handle.decode()
      if (value) {
        controller.enqueue(value)
      }

      controller.terminate()
    }
  })

  constructor(encoding = "utf-8", options: TextDecoderOptions = {}) {
    this.#handle = new TextDecoder(encoding, options)
  }

  get encoding() {
    return this.#handle.encoding
  }

  get fatal() {
    return this.#handle.fatal
  }

  get ignoreBOM() {
    return this.#handle.ignoreBOM
  }

  get readable() {
    return this.#transform.readable
  }

  get writable() {
    return this.#transform.writable
  }

  get [Symbol.toStringTag]() {
    return "TextDecoderStream"
  }
}

In your program:

import "@/shim";

Then you can use all of the library, StreamingTextResponse etc., without issue. The library itself looks up globalThis.TextDecoderStream, so even though you shimmed it for your own logic, code inside the ai package threw.

Yup, these are meant to be used as polyfills (so you have to patch globalThis, unlike with ponyfills) if you have no control over those AI SDKs, in order to be able to use them.

I also proposed these polyfills to the web-streams-polyfill library, so they can be used as an npm package if anybody needs them. Hope they will add these.

Can we add the TS implementations to https://github.com/oven-sh/bun/blob/main/src/js/node/stream.web.ts#L16-L17 ?
They are not written in Zig, but still very useful for us.

Does anyone have a polyfill for the encoder stream?

Yes, you can find one in my comment.

Can we add the TS implementations to https://github.com/oven-sh/bun/blob/main/src/js/node/stream.web.ts#L16-L17 ? They are not written in Zig, but still very useful for us.

@Jarred-Sumner, I DM'd you on Twitter about this. It's not as simple as just dropping the polyfill in there, since those expect headers to exist from WebCore. If there's a way to polyfill it using just JS, I couldn't figure out how to get it working with the debug build.

Does anyone know how to get the polyfill to work with Next.js inside a Bun Docker build? I've imported the polyfill and patched it onto globalThis, but I still get error: Attempt to export a nullable value for "TextDecoderStream".

@Jarred-Sumner what is the priority on this? Just to have an idea of when to expect this to be released.

@trpc/server 11.0.0-rc.361 onwards is broken on Bun because of this. This makes Bun unusable for tRPC backends with react-query, since that relies on tRPC 11+. The workaround is using version 11.0.0-rc.359

EDIT: Only with httpBatchStreamLink

@trpc/server 11.0.0-rc.361 onwards is broken on Bun because of this. This makes Bun unusable for tRPC backends with react-query, since that relies on tRPC 11+. The workaround is using version 11.0.0-rc.359

EDIT: Only with httpBatchStreamLink

You can use this polyfill to make it work: comment

Petition for adding the streams:
https://chng.it/n848nhZ89w

The @google/generative-ai npm package also uses TextDecoderStream for its streaming generation. Right now it doesn’t work in Bun, producing the following output:

637 |  * GenerateContentResponse.
638 |  *
639 |  * @param response - Response from a fetch call
640 |  */
641 | function processStream(response) {
642 |     const inputStream = response.body.pipeThrough(new TextDecoderStream("utf8", { fatal: true }));
                                                            ^
ReferenceError: Can't find variable: TextDecoderStream
      at processStream (node_modules/@google/generative-ai/dist/index.mjs:642:55)

Bun v1.1.12 (macOS arm64)

(As a workaround I used tsx to run the script in Node.js instead of Bun for now.)

I signed the petition

If you had this issue with tests, you need to create a setup.ts/js file:

// polyfills here...


if (typeof globalThis.TextDecoderStream === 'undefined') {
  // @ts-ignore
  globalThis.TextDecoderStream = TextDecoderStream;
}

// Ensure the polyfill is applied
const ensureTextDecoderStream = () => {
  if (typeof globalThis.TextDecoderStream === 'undefined') {
    throw new Error('TextDecoderStream is not defined after polyfill');
  }
};

ensureTextDecoderStream();

export { };

bunfig.toml (Bun reads bunfig.toml, not a bun.config.json):

[test]
preload = ["./setup.ts"]

# bun test discovers *.test.ts files on its own; pass a path filter
# (e.g. `bun test src`) to limit the run to tests under src/

Jarred has confirmed that if the petition gets 100 signatures, they'll implement this.
https://x.com/jarredsumner/status/1818739728914722989

Please sign!!! Here is the link:
https://www.change.org/p/urge-jarred-sumner-to-implement-textencoderstream-and-textdecoderstream-in-bun

We reached 100 signatures 🎉

Will @Jarred-Sumner deliver?

Will @Jarred-Sumner deliver?

no, but @dylan-conway will :)

#13115

n2k3 commented

Thanks to @dylan-conway for implementing this!

Am I correct that, once #13115 is merged, Bun can be used for running the Next.js dev server (using this command: bun --bun run dev)? If so, that means the callout in the Bun guide for Next.js can be removed. If not, what other Node APIs that Next.js relies on does Bun not (fully) support yet?

Does anyone know how to get the polyfill to work with Next.js inside a Bun Docker build? I've imported the polyfill and patched it onto globalThis, but I still get error: Attempt to export a nullable value for "TextDecoderStream".

I am having the same issue

For anyone wondering why reopened, see #13151

@dylan-conway @Jarred-Sumner I'm still seeing TextDecoderStream issues in my Next.js app, running via a nixpacks-created image. When I run my Next.js app outside of the container, everything is fine.

Nix even added Bun 1.1.29 support, so I'm wondering if others have complained about this.
https://github.com/NixOS/nixpkgs/blob/8b085394e9121d66fcb31db141681d23b5490cc3/pkgs/development/web/bun/default.nix#L15

$ next start
  ▲ Next.js 14.2.7
  - Local:        http://localhost:3000

 ✓ Starting...
 ✓ Ready in 472ms
# ...
{var __webpack_modules__=
# ...
# deleted logs for brevity but wanted to show webpack_modules reference.

error: Attempt to export a nullable value for "TextDecoderStream"
      at defineProperties (/app/node_modules/next/dist/compiled/edge-runtime/index.js:1:711500)
      at addPrimitives (/app/node_modules/next/dist/compiled/edge-runtime/index.js:1:710245)
      at extend (/app/node_modules/next/dist/compiled/edge-runtime/index.js:1:705028)
      at new VM (/app/node_modules/next/dist/compiled/edge-runtime/index.js:1:712369)
      at new EdgeVM (/app/node_modules/next/dist/compiled/edge-runtime/index.js:1:704958)
      at /app/node_modules/next/dist/server/web/sandbox/context.js:223:21

Same issue as @farezv. Did you find a fix, or is there another issue here tracking this?