`meta/meta-llama-3-70b` ignores `max_tokens`
johny-b commented
I'm pretty sure I'm sending `max_tokens`, but:
- I get many more tokens than requested, and
- I don't see `max_tokens` listed when I look at my prediction in the browser.
When I use exactly the same code with e.g. `meta/llama-2-70b`, this does not happen, i.e. I really do get the requested number of tokens.
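For reference, a hypothetical minimal repro along the lines described above, assuming the Replicate Python client (`pip install replicate`) with `REPLICATE_API_TOKEN` set; the prompt and the `max_tokens` value of 16 are illustrative, not from the original report:

```python
import os


def rough_token_count(text: str) -> int:
    # Crude proxy for token count: whitespace-split word count. Real
    # tokenizers differ, but it is enough to show whether the output
    # wildly exceeds a small max_tokens limit.
    return len(text.split())


if __name__ == "__main__":
    import replicate

    # Request a deliberately tiny completion so a violation is obvious.
    output = replicate.run(
        "meta/meta-llama-3-70b",
        input={"prompt": "Write a long story.", "max_tokens": 16},
    )
    text = "".join(output)
    # If max_tokens were honored, this should be roughly <= 16;
    # the bug report says it comes back much larger.
    print(rough_token_count(text))
```

The same script with `"meta/llama-2-70b"` substituted in would show the expected, limited output length per the report.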