replicate/replicate-python

Llama 2 outputs disjointed text

michalkubizna opened this issue · 5 comments

mattt commented

Hi @michalkubizna. Can you please share some more information to help figure out what's happening? Which model / version of LLaMA are you using? Can you share the code that produced this? What's the ID of the prediction on Replicate?

monshizadeh commented

Same here. Here is the sample code:

import replicate

api_token = "my token"
client = replicate.Client(api_token=api_token)

while True:
    prompt = input("Enter your prompt (or 'exit' to quit): ")
    if prompt == "exit":
        break

    # client.run streams the model's output
    output = client.run(
        "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
        input={"prompt": prompt}
    )
    for item in output:
        print(item)
mattt commented

@monshizadeh @michalkubizna LLaMA and other LLMs generate text token by token, not word by word. You can get a better idea of how words split into tokens in this interactive demo. 1
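If you'd like to see the splitting locally, here's a minimal sketch using the Hugging Face transformers tokenizer (the checkpoint name is an assumption and the repo is gated; any Llama 2 tokenizer splits words the same way):

# Sketch: inspect how a Llama 2 (SentencePiece) tokenizer splits words.
# "meta-llama/Llama-2-70b-chat-hf" is illustrative; substitute any
# Llama 2 checkpoint you have access to.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
print(tokenizer.tokenize("factually coherent"))
# -> roughly ['▁fact', 'ually', '▁coh', 'er', 'ent'],
#    where '▁' marks a token that starts a new word (a leading space)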

The output of running LLaMA is an iterator (or list) of tokens. The reason you're seeing one fragment per line is that Python's print function appends a newline by default. You can disable that by passing end="":

for token in output:
    print(token, end="")
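(If you're printing tokens as they stream in, you may also want to pass flush=True so each token appears immediately instead of sitting in the output buffer.)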

Footnotes

  1. LLaMA uses a different tokenizer, but this demo illustrates the general concept.

michalkubizna commented

The problem is that it splits up individual words. For instance, instead of printing "factually", it prints "fact" and "ually" separately. Similarly, for "coherent", it displays "coh", "er", and "ent" as distinct segments. Please see the attached photo for reference.

mattt commented

@michalkubizna Yes, this is expected. LLaMA outputs by token, not by word. It's hard to tell from a screenshot rather than the text itself, but notice that "ually" doesn't have a leading space while " fact" does, so if you were to join the tokens with an empty string, or print them without a trailing newline, everything would read as expected.
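For example, a minimal sketch of the joining approach (the prompt here is just a placeholder):

import replicate

client = replicate.Client(api_token="my token")
output = client.run(
    "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
    input={"prompt": "Tell me about llamas."},
)
# client.run yields the output token by token; joining on "" reassembles
# the text, since word-initial tokens carry their own leading space.
print("".join(output))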