dqbd/tiktoken

'TextEncoder' is not defined - MongoDB Functions

georgeherby opened this issue · 1 comments

Sadly, due to MongoDB limitations (see here), TextEncoder and TextDecoder are not available in MongoDB Functions.

This is the error I get:

{"message":"'TextEncoder' is not defined","name":"ReferenceError"}

Not sure if there is a workaround, e.g. passing in an override, or whether using https://www.npmjs.com/package/util would help?

I tried to bodge it by overriding global, but Mongo stops you doing that...

if (typeof TextEncoder === "undefined") {
  global.TextEncoder = class TextEncoder {
    encode(str) {
      const utf8 = unescape(encodeURIComponent(str));
      const result = new Uint8Array(utf8.length);
      for (let i = 0; i < utf8.length; i++) {
        result[i] = utf8.charCodeAt(i);
      }
      return result;
    }
  };
}
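
Since assigning to global is blocked, the same encoding step can be written as a plain local function instead of a polyfill. The following is only a minimal sketch: `utf8Encode` is a hypothetical helper name, not something js-tiktoken exposes, and it only helps if the calling code (or a fork of the library) can be pointed at it in place of `new TextEncoder().encode(...)`. It also avoids the deprecated `unescape` call.

```javascript
// Hypothetical helper (not part of js-tiktoken): encode a string to UTF-8
// bytes without TextEncoder, unescape, or any global assignment.
function utf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.codePointAt(i);
    if (cp > 0xffff) {
      i++; // the code point used two UTF-16 units (a surrogate pair)
    } else if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // lone surrogate: substitute U+FFFD, as TextEncoder does
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return Uint8Array.from(bytes);
}
```

Whether `codePointAt` and `Uint8Array.from` are available in the MongoDB Functions runtime is an assumption here.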

So I am at an impasse and can't use the library in this instance.

This is a code snippet (can't share the whole function):

import { encodingForModel } from 'js-tiktoken';

function getTokenToken(content, model) {
  const enc = encodingForModel(model);
  const tokens = enc.encode(content);
  return tokens.length;
}

Just to say I have forked and published a version where it works with the util dependency.

No idea if it's compatible with all scenarios; I can raise an MR if of interest.

https://github.com/georgeherby/tiktoken-mongodb
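
For anyone reading later, this is roughly what falling back to `util` can look like. It is only a sketch and is not taken from the fork's source. It assumes the runtime can resolve a Node-style `util` module that exports `TextEncoder`, and `getTextEncoder` is a hypothetical helper name.

```javascript
// Assumption: a Node-style `util` module exporting TextEncoder is resolvable.
const util = require("util");

// Prefer a global TextEncoder when present, otherwise fall back to util's.
// Nothing is assigned to `global`, which MongoDB Functions disallows.
function getTextEncoder() {
  return typeof TextEncoder !== "undefined" ? TextEncoder : util.TextEncoder;
}

const Encoder = getTextEncoder();
const bytes = new Encoder().encode("hello");
console.log(bytes.length); // 5
```

The same pattern would apply to `TextDecoder`.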