replicate/replicate-python

!! Serious Bug!! Llama-2-70B base internally calls llama-2-13B

JeremyAlain opened this issue · 2 comments

I've been using the Llama-2 base and chat series on Replicate for the last two weeks. There is a serious bug somewhere in Replicate.

When calling llama-2-70B (the base model, not the chat model), Replicate somehow internally calls llama-2-13B. So users believe they are using llama-2-70B, and are ALSO PAYING the price for the better model, but get the worse model!!

We've evaluated both llama-2-13B and llama-2-70B on MMLU tasks and their performance is exactly the same. You can also reproduce this by going to the Replicate playground, making a query to llama-2-70B, and then looking at your past predictions: you will see that they call llama-2-13B!
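The spot check described above can also be done programmatically: fetch a prediction record and compare the model it was actually served by against the model you requested. The helper below is a minimal sketch, assuming the prediction record exposes a `model` field as a dict key (field names and the stub record are illustrative, not taken from the issue; adapt to the actual Replicate client response):

```python
# Hypothetical helper: verify that a prediction record was served by the
# model you requested. The "model" key is an assumption about the shape
# of a Replicate prediction record; adjust to your client version.

def served_expected_model(prediction: dict, expected_model: str) -> bool:
    """Return True if the recorded model matches the requested one."""
    return prediction.get("model") == expected_model

# Stubbed record illustrating the bug in this issue: llama-2-70B was
# requested, but the prediction shows llama-2-13B was actually called.
stub = {"id": "abc123", "model": "meta/llama-2-13b"}
print(served_expected_model(stub, "meta/llama-2-70b"))  # → False
```

Running a check like this over your recent predictions would surface any mismatch between the endpoint you called and the model that served it.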

This is an extremely serious issue!!! Why is this happening? People are expecting 70B performance, and are paying for it, but are getting a worse deal!

Hi, @JeremyAlain. Thank you so much for bringing this to our attention. You're absolutely correct that we were mistakenly serving the base 13b model from the base 70b endpoint. I updated the deployment earlier today to serve the correct model.

I apologize for any inconvenience this error caused you. I'm drafting an email now that we'll send out to affected customers this week with information about refunding the billed usage.

Hi @mattt, thank you so much for fixing it. Really appreciate it :) . Yeah, don't worry, it can happen; glad I can use it again now.