microsoft/aici

pyctrl backtracking is non-idempotent

matthai opened this issue · 2 comments

The program below backtracks to the same label six times, appending the identical token string each time, and then generates text with gen_text().

I expect it to generate the same text every time, but it does not. The responses are often quite different from each other.

The log from one run is attached as log.log.

I ran the script with aici run aici_sample.py.

Distinct runs seem to give the same sequence of outputs from gen_text(), so this probably isn't caused by a random value sneaking in per run.

This run was on mixtral.

import pyaici.server as aici

RULE_NAME = 'C123'

prompt_pfx = f'''
You are classifying whether an incoming message matches the {RULE_NAME} rule definition below.

Given a message description of the form "message: <message content>", please print "response: Matches {RULE_NAME}" if the rule matches the message and "response: Does not match {RULE_NAME}" otherwise. On the next line, print "reason: <reason for the response>". Finally, on the next line, print "Message End".

* {RULE_NAME} rule definition:

Message is addressed at a group (not just an individual) ("Group Targeting")

'''

messages = [
    'he deserves that',
    'he deserves that',
    'he deserves that',
    'he deserves that',
    'he deserves that',
    'he deserves that',
]


# aici pyctrl instructions
# https://github.com/microsoft/aici/tree/main/controllers/pyctrl
async def main():
    await aici.FixedTokens(prompt_pfx)
    plabel = aici.Label()

    aici.set_var('prompt', prompt_pfx)

    # generate a response for the same message N times, backtracking to the same label each time
    for msgidx, message in enumerate(messages):
        aici.set_var(f'message_{msgidx}', message)
        await aici.FixedTokens(f'message: {message}', following=plabel)
        await aici.gen_text(
            stop_at='Message End', store_var=f'res_{msgidx}')


result = aici.start(main())

It looks like it's only a problem with the llama.cpp backend (when using the Orca deployment it seems deterministic). Need to look into llama.cpp sampling, I guess...

Good to know that Orca works. Given how widely llama.cpp is used, it would be good to take care of it as well.