AutoLLM - Automatic Inference Parameter Tuning and example selection for LLM

Inference parameter tuning

This Research shows inference parameter is critical for the LLM generative capability. Providing a search space for inferencing and let AutoLLM finds the most suitable parameter.

Example selection

Fewshot can help LLM generate better response in most of situation. And AutoLLM can determine the most suitable examples combination for you.

Using Example

var context = new MLContext();
var autoLLM = context.Auto().CreateAutoLLM();
var examples = new[]{
    new
    {
        problem = "what answer is 1 + 1",
        answer = "2",
        reason = "1 + 1 is 2",
    },
    new
    {
        problem = "If $2^8=4^x$, what is the value of $x$?",
        answer = "4",
        reason = "Rewrite $4$ as $2^2$ to find $4^x=2^{2x}$.  Since $2^8=2^{2x}$, we have 2x=8$ which implies $x=\\boxed{4}$",
    },
    // other examples
};

// temperature: [0.3, 2]
// N: [3, 100]
// ... other search space
GPTSearchSpace searchSpace;

// train validation dataset
IDataView train, validation;

autoLLM.SetTrialRunner((context, examples, option, train, validation) =>{
        var pipeline = context.Transform.CreateFewshotPromptTemplate(
            promptTemplate: @"
            Solve the question carefully, Simplify your answer as much as possible.
            ${Example}

            question: ${problem}
            response(in json):",
            examplePromptTemplate: @"
            ### Example ###
            question: ${problem}
            response(in json):
            {
                ""answer"": ${answer},
                ""reason"": ${reason}
            }",
            outputColumnName: "prompt",
            exampleVariableName: "Example",
            examples)
            .Append(context.Transforms.GPT3_5(
                inputColumnName: "prompt",
                outputColumnName: "response",
                temperature: option.Temperature,
                N: option.N,
                // other options
                apiKey: Environment.GetVariable("api-key"),
            ));
        var model = pipeline.Fit(train);
        var eval = model.Transform(validation);
        // calculate score

        return new TrialResult{
            metric = score,
            loss = -score,
            model = model,
            parameter = option,
            examples = examples,
        };
    )
}, examples, searchSpace);

var bestModel = autoLLM.Fit(train, validation);

// use model
var input = new {
    problem = "what's 2 + 3",
};

var output = bestModel.Transform(context.Data.LoadFromEnumerable(input))
var response = output.GetColumn<string>("response").First();

response
/*
{
    "answer": "5",
    "reason": "because 2 + 3 = 5"
}
*/

How to run

Case study - Math

Prerequisit

  • access to GPT3.5 on Azure OpenAI service

Steps

  • Clone this project
  • Set up the key and endpoint in MathExperiment.cs. This will be used to make LLM calls.
  • Set up the difficult level of math problems you want to run. The support levels are level 1, level 2, level 3, level 4 and level 5
  • Start MathExperiment In Program.cs by
await MathExperiment.RunAsync()
  • Run this project

Case study

Math

Use LLM to resolve math problems. Input is problem and output is answer inside \box{}. Evaluate metric is accuracy. Code is here

NOTE: this study is enlighten by this original post: LLM-tuning-math

NOTE: the accuracy here is just for reference as we we use GPT3.5 to evaluate if the answer is correct and we notice that sometimes GPT3.5 mistakenly mark wrong answer as correct. So accuracy might be lower than what the table present.

Settings

  • LLM Model: GPT3.5 turbo
  • number of example candidates: 6
  • example selector: random
  • number of train dataset: 20
  • number of validation dataset: 10
  • number of test dataset: 200

Result

Level oneshot + default inference parameter oneshot + inference parameter tuning all examples + default inference parameter random example selector+ inference tuning random example selector + default inference parameter
2 0.885 0.94 0.985 0.985 0.885
3 0.78 0.9 0.935 0.77 0.78
4 0.685 0.77 0.94 0.93 0.7
5 0.465 0.495 0.91 0.94 0.455

Reference