What are "test_cases"?
KB-g opened this issue · 6 comments
Hi there
I think this tool could be very useful! Thanks for building it.
However, when starting to use it, I did not fully understand what test_cases
are and how exactly they relate to the task for which I want to create the optimal prompt.
Could someone elaborate on what to enter here?
To make it a bit more explicit, here are two examples for which I would love to try out this tool:
- I would like to find the optimal prompt to generate a catchy headline for a text-based social media app called Threads
- I would like to find the optimal prompt to summarize a document such that it retains a lot of the details of the original document and does not turn it into something very generic.
Thanks a lot for the help!
Could you record a screencast for us?
I think they are examples of the user input.
I was also hoping this would work for non-categorization tasks, but I believe the way it is built, it needs the answer to fall into one of several categories so the outputs can be validated automatically. So those tasks (catchy headline and summary) would not work. I hope the author can confirm this.
Some options moving forward, I think: validate/score answers manually for non-classification tasks, use ChatGPT as a self-supervisor, or some other approach I can't think of.
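To make the "ChatGPT as a self-supervisor" idea concrete, here is a rough sketch (not this repo's code; the model name, rubric, and 1-10 scale are my own assumptions) of how free-form outputs like headlines or summaries could be scored automatically:

# Rough sketch of LLM-as-judge scoring for free-form outputs.
# NOT this repo's code -- model choice, rubric, and 1-10 scale are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def judge_output(task_description: str, test_input: str, candidate_output: str) -> int:
    """Ask a chat model to rate a candidate output from 1 (poor) to 10 (excellent)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary choice; any chat model works
        messages=[
            {"role": "system",
             "content": "You grade outputs for this task: " + task_description
                        + " Reply with a single integer from 1 to 10."},
            {"role": "user",
             "content": f"Input:\n{test_input}\n\nCandidate output:\n{candidate_output}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())

Averaging such scores across all test cases would let you rank candidate prompts for generation tasks instead of relying on exact-match classification.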
I'm not 100% sure, but I assumed the test cases are the example inputs that the use case then receives.
Say my instruction is to generate a compelling headline for a text-based app; then the test cases would be a few example text-based apps paired with intriguing headlines.
At least that's how I understood it, and it also matches the example linked in the repo.
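If that reading is right, the test_cases for the headline task from the original question might look roughly like this (purely illustrative; the exact field names, and whether an expected output is included, depend on the notebook version you are running):

# Illustrative only -- field names and structure are assumptions; check the
# notebook you are using for the exact format it expects.
description = "Given a description of a post, write a catchy headline for the Threads app."

test_cases = [
    {"prompt": "A post announcing a weekend-long coding livestream"},
    {"prompt": "A post about switching from coffee to green tea for a month"},
    {"prompt": "A post sharing lessons learned from a failed startup"},
]

Each entry is one example input; the candidate prompts generated from the description are then run against every entry so their outputs can be compared.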
Dude, did you even try this? The example in the repo is literally the kind of task you're claiming wouldn't work?!
description = "Given a prompt, generate a landing page headline."
Interesting, I must have missed the other example. Tried it and it worked fine.