replicate/replicate-python

`meta/meta-llama-3-70b` ignores `max_tokens`


I'm fairly sure I'm sending `max_tokens`, and:

  • I get far more tokens than requested
  • I also don't see `max_tokens` in the input when I look at the prediction in the browser

When I use exactly the same code with e.g. `meta/llama-2-70b`, this does not happen, i.e. I really do get the requested number of tokens.
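
For reference, here is a minimal sketch of the kind of call I'm making (the prompt and the value of `max_tokens` are just placeholders; it assumes the standard `replicate.run()` API with the parameter passed in the `input` dict):

```python
import replicate

# Sketch of the call: max_tokens is passed in the input dict,
# but the prediction comes back with far more tokens than this cap.
output = replicate.run(
    "meta/meta-llama-3-70b",
    input={
        "prompt": "Write a one-sentence summary of the French Revolution.",
        "max_tokens": 64,  # expected to cap the output length
    },
)

# For language models the client yields the output as string chunks
print("".join(output))
```

Swapping the model string to `meta/llama-2-70b` in the same snippet respects the cap.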