mshumer/gpt-prompt-engineer

Is it possible to reduce the computational complexity?

HK-SHAO opened this issue · 4 comments

The current complexity is a bit high: $O(m \cdot n^2)$, where $m$ is the number of test cases and $n$ is the number of prompts.
And only a single best prompt can be selected from the $n$ candidates.

```python
total_rounds = len(test_cases) * len(prompts) * (len(prompts) - 1) // 2
```

Is there a way to iterate on the prompts faster to get them into good shape?
For example, using randomization, generative adversarial approaches, or genetic algorithms?
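
For illustration, here is a minimal sketch of what a genetic-algorithm loop over prompts might look like. The `score_prompt` and `mutate_prompt` helpers are hypothetical (e.g., an LLM call that rates or rewrites a prompt); nothing here exists in the repo:

```python
import random

def evolve_prompts(seed_prompts, score_prompt, mutate_prompt,
                   generations=5, population=20, survivors=5):
    """Toy genetic loop: score the pool, keep the best, mutate them back to size."""
    pool = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(pool, key=score_prompt, reverse=True)
        parents = ranked[:survivors]  # selection
        children = [mutate_prompt(random.choice(parents))
                    for _ in range(population - survivors)]  # variation
        pool = parents + children
    return max(pool, key=score_prompt)
```

Each generation keeps only the top scorers and mutates them back up to the population size, so the number of scoring calls grows linearly with the population rather than quadratically with the number of candidates.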

This is a very interesting point. I was running it yesterday with 30 candidates, and it took 4350 iterations, which was about 12 hours of computation and roughly 15k calls to the API.

If it's GPT-4, it consumes too much money and electricity :(

Nice idea for an automatic prompt engineer!

What you can do to reduce cost: you don't have to check every prompt against every prompt. Just like in chess, not all players play against each other all the time. You already know who's going to win when the best player competes against the worst.

For each test case:

  1. Sort all prompts by their rating. (Initially, that's just a random order, as they all start at rating 1200.)
  2. Compare the first with the second; the third with the fourth, and so on.
  3. Update the ratings of the prompts.

That will reduce an experiment with 30 candidates and 10 test cases from 4350 iterations to 150 (15 pairings per test case). Also, can you reduce the cost by only running `get_score` once instead of twice?
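
A minimal sketch of that pairing scheme, assuming a hypothetical `battle(prompt_a, prompt_b, test_case)` function that runs one comparison and returns the winner (in the repo this would presumably wrap the `get_score` call):

```python
K = 32               # Elo K-factor; the standard chess value, tune as needed
START_RATING = 1200

def expected(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings, winner, loser):
    """Shift both ratings toward the observed result (zero-sum)."""
    delta = K * (1 - expected(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

def swiss_round(prompts, ratings, test_case, battle):
    """Pair adjacent prompts in the standings: 1st vs 2nd, 3rd vs 4th, ..."""
    standings = sorted(prompts, key=lambda p: ratings[p], reverse=True)
    for a, b in zip(standings[::2], standings[1::2]):
        winner = battle(a, b, test_case)  # one model comparison per pair
        loser = b if winner == a else a
        update_elo(ratings, winner, loser)

def rank_prompts(prompts, test_cases, battle):
    ratings = {p: START_RATING for p in prompts}
    for case in test_cases:
        swiss_round(prompts, ratings, case, battle)
    return sorted(prompts, key=lambda p: ratings[p], reverse=True)
```

Each test case now costs $n/2$ comparisons instead of $n(n-1)/2$, which is where the drop from 4350 to 150 iterations comes from.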

What you can furthermore do is discard the worst candidates after some point, as you're not interested in the complete order anyway, only in the best. You will have to tune how many you can discard, and at what point, so that you still get good results.
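
As a hypothetical example of that pruning step, reusing the `ratings` dict from the sketch above; the keep fraction is exactly the knob you'd have to tune:

```python
def prune(prompts, ratings, keep_fraction=0.5):
    """Drop the lowest-rated prompts; keep_fraction is the knob to tune."""
    standings = sorted(prompts, key=lambda p: ratings[p], reverse=True)
    return standings[:max(1, int(len(standings) * keep_fraction))]
```

Calling this every few rounds shrinks the field as the ratings become more reliable, so later rounds spend their comparisons only on plausible winners.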

I'm curious why you chose the ELO-based competition. My understanding is that ELO is meant to help identify the relative skill of different players, so you can say Annie is 31% more skilled than John (or prompt A is x% better than prompt B), compared with a tournament-style competition where you essentially end up with a ranking but don't know the relative skill of first vs. second place, etc.

Given that the goal here is to find the best prompt, why not use a single-elimination tournament? This would significantly decrease the computational complexity and therefore allow many more prompts to be tested.
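
A minimal sketch of a single-elimination bracket, reusing the hypothetical `battle` function from the sketch above; picking one test case per match is an assumption for illustration, not something the repo does:

```python
import random

def single_elimination(prompts, test_cases, battle):
    """Knock-out bracket: every match eliminates one prompt, so n - 1 matches total."""
    field = list(prompts)
    random.shuffle(field)  # random seeding
    while len(field) > 1:
        case = random.choice(test_cases)  # assumption: one test case per match
        next_round = []
        if len(field) % 2 == 1:
            next_round.append(field.pop())  # odd prompt out gets a bye
        for a, b in zip(field[::2], field[1::2]):
            next_round.append(battle(a, b, case))
        field = next_round
    return field[0]
```

Since every match eliminates exactly one prompt, finding a winner among $n$ prompts takes only $n - 1$ matches, at the cost of losing any information about the relative strength of the non-winners.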

Personally, I love the concept of an ELO-based ranking system for the prompts, and it would likely reveal interesting details about their relative quality, but I'm not sure it's worth the computational cost.