
Develop better LLM apps by testing different models and prompts in bulk.

Primary language: Python. License: MIT.

Crucible

Lightweight prompt evaluation package.

Use it online here, or run it locally through Streamlit. Ollama can be used to run LLMs locally if necessary.

Cost estimation is very rough (input * 2).
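As an illustration of how rough this is, one possible reading of `input * 2` is that the cost of the input tokens is doubled to leave room for the output. The sketch below assumes that reading. The function name, the token count, and the price are hypothetical and are not taken from the package.

```python
def estimate_cost(input_tokens: int, price_per_1k_input: float) -> float:
    """Rough estimate: price the input tokens, then double the result.

    Assumption: the doubling stands in for the unknown output cost.
    """
    input_cost = input_tokens / 1000 * price_per_1k_input
    return input_cost * 2


# Hypothetical numbers: 1,500 input tokens at 0.50 per 1,000 tokens.
print(estimate_cost(1500, 0.5))
```

Because the output length is not known before the run, an estimate like this should be treated as an order-of-magnitude check only.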

Instructions

  1. Set the models, prompts and variables
  2. Set grading style and temperature
    • "EXACT": the answer is either right or wrong. Line breaks and spaces in the answer are ignored.
    • "QUALITATIVE": asks GPT-4o for feedback. Be mindful of the extra token usage.
  3. Click compile. Check the price estimation. Click run.
  4. Results are shown segmented by category.
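The "EXACT" grading style described in step 2 can be sketched as follows. This is a minimal illustration and not the package's actual implementation: the function name `grade_exact` is hypothetical, and it assumes the whole response is compared against each expected value after all whitespace is removed.

```python
def grade_exact(response: str, expected: list[str]) -> bool:
    """Return True if the response equals any expected value, ignoring whitespace."""

    def normalize(text: str) -> str:
        # "".join(text.split()) drops spaces, tabs, and line breaks.
        return "".join(text.split())

    return normalize(response) in {normalize(value) for value in expected}


print(grade_exact("<<NÃO>>\n", ["<<NAO>>", "<<NÃO>>"]))  # True
print(grade_exact("<<SIM>>", ["<<NAO>>", "<<NÃO>>"]))    # False
```

Listing several spellings in the expected values, as in the first call, lets an exact check tolerate accent differences without any fuzzy matching.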

Parameters

  • Model

    • id (str): name as understood by Ollama. You might need to download the model first
    • source (str): "local", "openai", or "anthropic"
    Model("llama3", "local")
  • Prompt

    • id (str): name of the test case
    • slot (str): name of the slot in the prompt that will be replaced by the variable
    • content (str): actual prompt
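    # The prompt below is in Portuguese. It asks whether the text mentions a need to buy
    # medicine or health items, and to answer "<<SIM>>" (yes) or "<<NÃO>>" (no).
    Prompt(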
        id="test_3",
        slot="{variable}",
        content="""Sua tarefa é analisar e responder se o texto a seguir menciona a necessidade de comprar remédios ou itens de saúde. Aqui está o texto:\n\n###\n\n{variable}\n\n###\n\n\nPrimeiro, analise cuidadosamente o texto em um rascunho. Depois, responda: a solicitação citada menciona a necessidade de comprar remédios ou itens de saúde? Responda "<<SIM>>" ou "<<NÃO>>".""",
    )
  • Variable

    • id (str): name of the test case
    • content (str): text of the snippet to be inserted into the prompt
    • expected (list of str): values that are considered correct
    • options (list of str): all values the response could take. Leave empty if not applicable
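    # Portuguese test case: a single-parent family living on income from collecting
    # recyclables requests vulnerability aid. Medicine is not mentioned, so the
    # expected answer is "<<NÃO>>" (no).
    Variable(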
        id="despesas_essenciais",
        content="Família monoparental composta por Josefa e 5 filhos com idades entre 1 e 17 anos. Contam apenas com a renda de coleta de material reciclável e relatam dificuldade para manter as despesas essenciais. Solicita-se, portanto, o auxílio vulnerabilidade.",
        expected=["<<NAO>>", "<<NÃO>>"],
    )
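To show how the three parameters fit together, the sketch below fills a prompt's slot with each variable's content and pairs the result with each model. The dataclasses are hypothetical stand-ins that only mirror the fields documented above, not the package's real classes. The `build_runs` helper, and the idea that every model, prompt, and variable combination is run, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Model:
    id: str
    source: str  # "local", "openai", or "anthropic"


@dataclass
class Prompt:
    id: str
    slot: str
    content: str


@dataclass
class Variable:
    id: str
    content: str
    expected: list[str]
    options: list[str] = field(default_factory=list)


def build_runs(models, prompts, variables):
    """Yield (model id, prompt id, variable id, filled prompt) for each combination."""
    for model, prompt, variable in product(models, prompts, variables):
        # Substitute the slot text with the variable's content.
        filled = prompt.content.replace(prompt.slot, variable.content)
        yield model.id, prompt.id, variable.id, filled


models = [Model("llama3", "local")]
prompts = [Prompt(id="demo", slot="{variable}", content="Text:\n{variable}\nAnswer yes or no.")]
variables = [Variable(id="case_1", content="I need to buy medicine.", expected=["yes"])]

for run in build_runs(models, prompts, variables):
    print(run)
```

Using plain string replacement on the slot, rather than `str.format`, means braces elsewhere in the prompt text do not need to be escaped.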

TODO

  • add tests
  • add instructions

Resources