niieani/gpt-tokenizer

Huge memory consumption of isWithinTokenLimit

Opened this issue · 4 comments

I am experiencing a 200 MB increase in memory consumption after adding gpt-tokenizer. The only function I am using from this library is isWithinTokenLimit. Here is an image of my memory consumption before and after deployment.
Here is how I am using it:

import { isWithinTokenLimit } from 'gpt-tokenizer'
import type { ChatCompletionRequestMessage } from 'openai' // openai v3 SDK type

function getRequestTokenCount(req: ChatCompletionRequestMessage[]) {
  // rough per-message overhead added by the chat prompt format
  const extraTokensDueToPromptForEachMessage = 7
  return req.reduce((acc, curr) => {
    // isWithinTokenLimit returns the token count while under the limit (false otherwise),
    // so passing Infinity effectively counts tokens, with 99999 as a fallback
    const tokensInText = isWithinTokenLimit(curr.content, Infinity) || 99999
    return acc + tokensInText + extraTokensDueToPromptForEachMessage
  }, 0)
}

[image: memory consumption before and after deployment]
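For reference, passing Infinity is just my way of getting a raw count; the same thing could presumably be done with the library's countTokens export, assuming the installed version has it (a minimal sketch, not my deployed code):

import { countTokens } from 'gpt-tokenizer'

// counts tokens directly instead of passing Infinity as the limit to isWithinTokenLimit
function getMessageTokenCount(text: string): number {
  return countTokens(text)
}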

same here, and what seems even weirder is that it made me run out of memory just by importing it, without even using it

hi @aminsol and @olboghgc, can you provide a little more context about the platform?
is this under Node, Bun, or the browser?
are you using a bundler of any sort?
which GPT encoding are you trying to use?

@niieani Node 18 / TypeScript / Babel 7.0.0 in a Firebase Cloud Function, with GPT-3.5 and GPT-4o (but that shouldn't really matter, since a simple import already causes a spike in memory usage)

btw I was able to more or less work around this issue by doing a conditional require, so I don't always pay the memory cost when I don't need to call it
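Roughly like this (a sketch of the idea assuming a CommonJS build; the helper and variable names are just for illustration):

type IsWithinTokenLimit = (text: string, tokenLimit: number) => number | false

// cache the function after the first require, so gpt-tokenizer's encoder data
// is only loaded into memory once a token count is actually needed
let cached: IsWithinTokenLimit | undefined

function lazyIsWithinTokenLimit(text: string, tokenLimit: number): number | false {
  const isWithinTokenLimit: IsWithinTokenLimit =
    cached ?? (cached = require('gpt-tokenizer').isWithinTokenLimit)
  return isWithinTokenLimit(text, tokenLimit)
}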