Mintplex-Labs/anything-llm

[BUG]: Token limit should not be exceeded (Azure OpenAI, GPT-4o, Chat Model Token Limit 128,000), but LLM is generating response forever

Closed this issue · 1 comment

How are you running AnythingLLM?

Docker (local)

What happened?

We use Whisper (Whisper-WebUI) for transcription. We do not use the integrated transcription feature because users cannot upload mp3, wav, or mp4 files, and the integrated transcription cannot be configured with Condition On Previous Text disabled or a Repetition Penalty of 2 or 3.

We are facing the following issue with AnythingLLM, Docker, Azure OpenAI, GPT-4o, Chat Model Token Limit 128,000:

If we paste a long transcription (text) directly into the chat window and send it, the chat stays at "generating response" forever, with no error whatsoever. We can stop the generation, yes, but the text should not exceed the token limit of 128,000.

Even if we split the text into multiple parts, the same thing happens.

We also tried removing all CR-LF/LF line breaks; same result.

Some examples:
- Full: characters 89219, words 16336, sentences 81, paragraphs 7240, spaces 9096
- Split part 1: characters 15604, words 2743, sentences 21, paragraphs 886, spaces 1858
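For reference, even a rough token estimate puts this text well under the limit. A minimal sketch using the common ~4-characters-per-token heuristic (the exact count depends on the model's tokenizer, e.g. GPT-4o's o200k_base, which this does not use):

```python
def estimate_tokens(char_count: int) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    The real count depends on the model's tokenizer."""
    return char_count // 4

# Character counts from the examples above
print(estimate_tokens(89219))  # full transcript -> 22304, far below 128,000
print(estimate_tokens(15604))  # split part 1    -> 3901
```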

Does anyone have a clue what's going on there?

Are there known steps to reproduce?

Paste long text directly into the chat window and send it to the LLM.

Are you saying the LLM never returns a response when pasting in long text? This is likely because you are hitting your Azure OpenAI rate limits, which have a pretty low tokens-per-minute rate, and you probably need to raise them. It's not unusual for more tokens to result in a longer time to first token, but it should return something in a reasonable time.
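If rate limiting is the cause, retrying with exponential backoff usually gets a response through. A minimal sketch, where the `RateLimitError` class and the wrapped call are placeholders for your actual Azure OpenAI SDK call and its rate-limit exception (e.g. `openai.RateLimitError`):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit exception."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Wait base_delay * 2^attempt seconds, plus a little jitter,
            # capped at 30 seconds per wait.
            time.sleep(min(base_delay * (2 ** attempt + random.random()), 30))
    raise RateLimitError("rate limit persisted after retries")
```

Wrapping the chat-completion request in `with_backoff` means a long prompt that trips the tokens-per-minute quota gets retried after a pause instead of appearing to hang forever.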