niieani/gpt-tokenizer

encodeChatGenerator undefined?

Closed this issue ยท 16 comments

encodeChat error TypeError: Cannot read properties of undefined (reading 'encodeChatGenerator')
at encodeChat
node_modules/.pnpm/gpt-tokenizer@2.1.1/node_modules/gpt-tokenizer/cjs/GptEncoding.js:141:25

Also facing this problem.
Maybe the author forgot to pack some external modules?

The error occurs when I execute the following code:

import { encodeChat } from "gpt-tokenizer";

const messages = [ /* Valid chat */ ];

console.log(encodeChat(messages, "gpt-3.5-turbo"));

Here's the full error:

Notice that I'm using WSL so the file directory might be a little bit weird.

file:///mnt/d/Documents/.project/Script/GPT-Tokenizer/node_modules/.pnpm/gpt-tokenizer@2.1.1/node_modules/gpt-tokenizer/esm/GptEncoding.js:138
        return [...this.encodeChatGenerator(chat, model)].flat();
                        ^

TypeError: Cannot read property 'encodeChatGenerator' of undefined
    at encodeChat (file:///mnt/d/Documents/.project/Script/GPT-Tokenizer/node_modules/.pnpm/gpt-tokenizer@2.1.1/node_modules/gpt-tokenizer/esm/GptEncoding.js:138:25)
    at file:///mnt/d/Documents/.project/Script/GPT-Tokenizer/index.js:83:13
    at ModuleJob.run (internal/modules/esm/module_job.js:183:25)
    at async Loader.import (internal/modules/esm/loader.js:178:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)
    at async handleMainPromise (internal/modules/run_main.js:59:12)

I'm also facing the same issue.

I have the same issue.

Same issue.

wgot commented

@zz98 This problem is related to the way you're importing and using the encodeChat method from the gpt-tokenizer module.
When you use the following import statement:

import { encodeChat } from 'gpt-tokenizer'

You're importing encodeChat as a detached function from the original object (the module). Therefore, you won't have access to the context or properties of the original object.

However, when you use the import statement as follows:

import tokenizer from 'gpt-tokenizer'
tokenizer.encodeChat([{ ... }])

You're importing the entire original object. This allows you to access the properties and methods of the original object.
The error message you're seeing:

node_modules/gpt-tokenizer/src/GptEncoding.ts:246
    return [...this.encodeChatGenerator(chat, model)].flat()

suggests that this is undefined or not what's expected in this.encodeChatGenerator(chat, model). In JavaScript, when a method is called detached from its object, this becomes undefined (in strict mode). Hence, calling the detached encodeChat method results in an error.

However, when you call encodeChat in the form of tokenizer.encodeChat, the encodeChat method is bound to the tokenizer object. Therefore, this refers to the tokenizer object, and no error occurs.

To resolve this issue, you should import the whole tokenizer and call encodeChat as a method of tokenizer. This way, this will correctly refer to tokenizer, and the error should not occur. (this answer written and translate by gpt-4.)

@wgot Thanks for the explanation! I just had the same issue.

However, it seems the underlying problem is that class functions really shouldn't be exported like that. Even the author itself has mistaken the behavior of their own code, because the usage example in the main README.md itself would be incorrect, since it's using the unbound functions:

import {
  encode,
  encodeChat,
  decode,
  isWithinTokenLimit,
  encodeGenerator,
  decodeGenerator,
  decodeAsyncGenerator,
} from 'gpt-tokenizer'

Same issue. Documentation needs to be updated as well.

lox commented

Still doesn't work:

const tokenizer = require('gpt-tokenizer')
const chatTokens = tokenizer.encodeChat(chat, 'gpt-3.5-turbo')

I get this error:

TypeError: Cannot read properties of undefined (reading 'get')
    at Object.encodeChatGenerator (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:99:57)
    at encodeChatGenerator.next (<anonymous>)
    at Object.encodeChat (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:141:25)
    at cleanup (xxx/scripts/cleanupTranscript.js:40:32)
    at Object.<anonymous> (xxx/scripts/cleanupTranscript.js:62:1)
    at Module._compile (node:internal/modules/cjs/loader:1196:14)

Same issue
This code works in esm but doesn't work in cjs

const tokenizer = require('gpt-tokenizer')
const chatTokens = tokenizer.encodeChat(chat, 'gpt-3.5-turbo')

this.specialTokenMapping is undefined when the encodeChatGenerator function is called in cjs.

GptEncoding this => { default: [Getter], decode: [Getter], decodeAsyncGenerator: [Getter], decodeGenerator: [Getter], encode: [Getter], encodeChat: [Getter], encodeChatGenerator: [Getter], encodeGenerator: [Getter], isWithinTokenLimit: [Getter], EndOfText: [Getter], FimPrefix: [Getter], FimMiddle: [Getter], FimSuffix: [Getter], ImStart: [Getter], ImEnd: [Getter], ImSep: [Getter], EndOfPrompt: [Getter] } this.specialTokenMapping=> undefined

TypeError: Cannot read properties of undefined (reading 'get') at Object.encodeChatGenerator (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:110:57) at encodeChatGenerator.next (<anonymous>) at Object.encodeChat (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:153:25)

Same issue This code works in esm but doesn't work in cjs

const tokenizer = require('gpt-tokenizer') const chatTokens = tokenizer.encodeChat(chat, 'gpt-3.5-turbo')

this.specialTokenMapping is undefined when the encodeChatGenerator function is called in cjs.

GptEncoding this => { default: [Getter], decode: [Getter], decodeAsyncGenerator: [Getter], decodeGenerator: [Getter], encode: [Getter], encodeChat: [Getter], encodeChatGenerator: [Getter], encodeGenerator: [Getter], isWithinTokenLimit: [Getter], EndOfText: [Getter], FimPrefix: [Getter], FimMiddle: [Getter], FimSuffix: [Getter], ImStart: [Getter], ImEnd: [Getter], ImSep: [Getter], EndOfPrompt: [Getter] } this.specialTokenMapping=> undefined

TypeError: Cannot read properties of undefined (reading 'get') at Object.encodeChatGenerator (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:110:57) at encodeChatGenerator.next (<anonymous>) at Object.encodeChat (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:153:25)

Solved like this for cjs
const tokenizer = require('gpt-tokenizer').default;

Same issue, return [...this.encodeChatGenerator(chat, model)].flat();
TypeError: Cannot read properties of undefined (reading 'encodeChatGenerator')

Same issue This code works in esm but doesn't work in cjs
const tokenizer = require('gpt-tokenizer') const chatTokens = tokenizer.encodeChat(chat, 'gpt-3.5-turbo')
this.specialTokenMapping is undefined when the encodeChatGenerator function is called in cjs.
GptEncoding this => { default: [Getter], decode: [Getter], decodeAsyncGenerator: [Getter], decodeGenerator: [Getter], encode: [Getter], encodeChat: [Getter], encodeChatGenerator: [Getter], encodeGenerator: [Getter], isWithinTokenLimit: [Getter], EndOfText: [Getter], FimPrefix: [Getter], FimMiddle: [Getter], FimSuffix: [Getter], ImStart: [Getter], ImEnd: [Getter], ImSep: [Getter], EndOfPrompt: [Getter] } this.specialTokenMapping=> undefined
TypeError: Cannot read properties of undefined (reading 'get') at Object.encodeChatGenerator (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:110:57) at encodeChatGenerator.next (<anonymous>) at Object.encodeChat (xxx/node_modules/gpt-tokenizer/cjs/GptEncoding.js:153:25)

Solved like this for cjs const tokenizer = require('gpt-tokenizer').default;

this fixed my issue! Thanks!

seyfer commented

@wgot Thanks for the explanation! I just had the same issue.

However, it seems the underlying problem is that class functions really shouldn't be exported like that. Even the author itself has mistaken the behavior of their own code, because the usage example in the main README.md itself would be incorrect, since it's using the unbound functions:

import {
  encode,
  encodeChat,
  decode,
  isWithinTokenLimit,
  encodeGenerator,
  decodeGenerator,
  decodeAsyncGenerator,
} from 'gpt-tokenizer'

@niieani this is a valid point, please adjust the documentation for encodeChat usage.

Apologies folks, the documentation was co-written by ChatGPT and I've missed this when doing a manual review. ๐Ÿ˜„
I'll make a fix soon.

Wasn't a documentation issue after all, just forgot to bind the functions. Should be fixed in next version.

๐ŸŽ‰ This issue has been resolved in version 2.1.2 ๐ŸŽ‰

The release is available on:

Your semantic-release bot ๐Ÿ“ฆ๐Ÿš€