crucible: A Python repository from noah-art3mis

Crucible

Lightweight prompt evaluation package.

Use it online here. It can also be run locally through Streamlit, and Ollama can be used to run LLMs locally if necessary.

Cost estimation is very rough (input * 2).
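As a rough illustration of what "input * 2" can mean, here is a minimal sketch assuming the estimate prices only the input tokens and then doubles the result to stand in for the output. The ~4 characters per token heuristic, the price, and the function names are hypothetical, not Crucible's actual code.

```python
def count_tokens(text: str) -> int:
    """Crude token count (assumption: roughly 4 characters per token)."""
    return max(1, len(text) // 4)


def rough_cost(filled_prompts: list[str], price_per_input_token: float) -> float:
    """Price the input tokens of every run, then double the total
    (assumption: the output is taken to cost about as much as the input)."""
    input_tokens = sum(count_tokens(p) for p in filled_prompts)
    return input_tokens * price_per_input_token * 2


# Two runs of 400 characters each at a hypothetical price of $0.00001 per input token.
print(rough_cost(["a" * 400, "b" * 400], 0.00001))
```

Because the output length is never measured, prompts that trigger long answers (such as the draft-then-answer prompt below) will cost more than this estimate suggests.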

Instructions

- Set the models, prompts and variables.
- Set the grading style and temperature:
  - "EXACT": the answer is either right or wrong; line breaks and spaces in the answer are ignored.
  - "QUALITATIVE": asks GPT-4o for feedback. Be mindful of the extra token usage.
- Click compile, check the price estimation, then click run.
- Results are shown segmented by category.
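One plausible reading of the compile step is that it expands every combination of the configured models, prompts and variables into one run, which is also what the price estimate has to cover. A minimal sketch of that expansion, assuming one run per (model, prompt, variable) triple; `Run`, `compile_runs`, the default temperature and the "outro_caso" variable id are hypothetical, not Crucible's API:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    """One hypothetical unit of work produced by the compile step."""
    model: str
    prompt: str
    variable: str
    temperature: float
    grading: str  # "EXACT" or "QUALITATIVE"


def compile_runs(models, prompts, variables, temperature=0.0, grading="EXACT"):
    """Expand every (model, prompt, variable) combination into one Run."""
    return [
        Run(m, p, v, temperature, grading)
        for m, p, v in product(models, prompts, variables)
    ]


runs = compile_runs(["llama3", "gpt-4o"], ["test_3"], ["despesas_essenciais", "outro_caso"])
print(len(runs))  # 2 models * 1 prompt * 2 variables = 4 runs
```

The number of runs grows multiplicatively, so adding one model to a large set of prompts and variables can change the estimated price considerably.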

Parameters

Model
- id (str): model name as understood by Ollama. You might need to download the model first.
- source (str): "local", "openai" or "anthropic"
```
Model("llama3", "local")
```
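To make the two fields concrete, here is a hypothetical stand-in that mirrors the documented parameters and rejects any source outside the three listed values. It is only a sketch under that assumption; Crucible's own `Model` class may be defined differently.

```python
from dataclasses import dataclass

VALID_SOURCES = {"local", "openai", "anthropic"}


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in mirroring the documented fields, not Crucible's own class."""
    id: str      # model name, e.g. as understood by Ollama for local models
    source: str  # where the model runs: "local", "openai" or "anthropic"

    def __post_init__(self):
        # Reject sources outside the three documented values early.
        if self.source not in VALID_SOURCES:
            raise ValueError(
                f"unknown source {self.source!r}; expected one of {sorted(VALID_SOURCES)}"
            )


llama = Model("llama3", "local")
```

Validating at construction time means a typo such as "ollama" instead of "local" fails before any run is compiled.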

Prompt

- id (str): name of the test case
- slot (str): name of the slot that will be substituted by the variable in the prompt
- content (str): the actual prompt

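```
# Content (Portuguese): asks whether the text mentions the need to buy medicines or
# health items, tells the model to analyze it in a draft first, and then to answer
# "<<SIM>>" (yes) or "<<NÃO>>" (no).
Prompt(
    id="test_3",
    slot="{variable}",
    content="""Sua tarefa é analisar e responder se o texto a seguir menciona a necessidade de comprar remédios ou itens de saúde. Aqui está o texto:\n\n###\n\n{variable}\n\n###\n\n\nPrimeiro, analise cuidadosamente o texto em um rascunho. Depois, responda: a solicitação citada menciona a necessidade de comprar remédios ou itens de saúde? Responda "<<SIM>>" ou "<<NÃO>>".""",
)
```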
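A minimal sketch of how the slot could be filled, assuming plain literal string replacement of the slot marker by the variable's content. `fill_slot` is a hypothetical name, and Crucible's actual substitution code may differ.

```python
def fill_slot(content: str, slot: str, value: str) -> str:
    """Replace every occurrence of the slot marker with the variable's text.

    str.replace is used instead of str.format so that other braces in the
    prompt (e.g. JSON examples) are left untouched.
    """
    return content.replace(slot, value)


template = 'Here is the text:\n\n###\n\n{variable}\n\n###\nAnswer in JSON: {"answer": "..."}'
print(fill_slot(template, "{variable}", "Single-parent family..."))
```

With str.format, the literal `{"answer": "..."}` in the template would raise an error, which is why a literal replacement is the safer choice for free-form prompts.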

Variable

- id (str): name of the test case
- content (str): text snippet to be inserted in the prompt
- expected (str list): values that would be considered correct
- options (str list): all values the response could take; leave empty if this does not apply

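```
# Content (Portuguese): single-parent family of Josefa and 5 children aged 1 to 17;
# their only income comes from collecting recyclable material and they report
# difficulty covering essential expenses, so the vulnerability benefit is requested.
Variable(
    id="despesas_essenciais",
    content="Família monoparental composta por Josefa e 5 filhos com idades entre 1 e 17 anos. Contam apenas com a renda de coleta de material reciclável e relatam dificuldade para manter as despesas essenciais. Solicita-se, portanto, o auxílio vulnerabilidade.",
    expected=["<<NAO>>", "<<NÃO>>"],  # "no", accepted with or without the accent
)
```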
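Since the example prompt asks for a draft before the final "<<SIM>>" or "<<NÃO>>", an EXACT grader would need to compare only the final answer. A minimal sketch, assuming the answer is the last `<<...>>` marker in the response and that whitespace is stripped before comparing against `expected` and `options`; `extract_answer`, `grade_exact` and the "invalid" outcome are hypothetical, not Crucible's API.

```python
import re


def normalize(text: str) -> str:
    """Remove all whitespace, including line breaks, which EXACT grading ignores."""
    return re.sub(r"\s+", "", text)


def extract_answer(response: str) -> str:
    """Hypothetical helper: take the last <<...>> marker, or the whole response if none."""
    markers = re.findall(r"<<[^<>]*>>", response)
    return markers[-1] if markers else response


def grade_exact(response: str, expected: list[str], options: list[str]) -> str:
    answer = normalize(extract_answer(response))
    # An empty options list means "does not apply", so any answer is allowed.
    if options and answer not in {normalize(o) for o in options}:
        return "invalid"
    return "right" if answer in {normalize(e) for e in expected} else "wrong"


response = "Draft: the text is about essential expenses, not medicines.\n\n<< NÃO >>"
print(grade_exact(response, ["<<NAO>>", "<<NÃO>>"], []))  # right
```

Listing both "<<NAO>>" and "<<NÃO>>" in `expected` is what lets an answer written without the accent still count as correct under an exact comparison.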

TODO

- add tests
- add instructions
