stanfordnlp/dspy

Signatures rarely work out of the box

Closed this issue · 2 comments

I've been getting started with DSPy and have struggled with even the most basic signatures. Even the examples in the docs often fail across LLMs: with the default prompt, models frequently return extra, unwanted text in the output.

I wrote a small script that tests some of the more common LLMs; here's an example of the output:


Testing on gpt_4
LLM Test: ✅ Looking good

Testing on claude_35
LLM Test: ❌ LLM did not respond as expected: Expected "Positive", got

Here's the sentiment analysis for the given sentence:

Sentence: it's a charming and often affecting journey.
Sentiment: Positive

The sentence expresses a favorable opinion about a journey, describing it as "charming" and "affecting," which are both positive descriptors. This indicates an overall positive sentiment towards the subject.

Testing on llama70b
LLM Test: ✅ Looking good

Testing on llama8b
LLM Test: ❌ LLM did not respond as expected: Expected "Positive", got

The sentiment of the sentence is positive.

Testing on claude_3_haiku
LLM Test: ❌ LLM did not respond as expected: Expected "Positive", got

Sentence: it's a charming and often affecting journey.
Sentiment: Positive

Here's the script:

# -------------------------
# Setup Environment
# %pip install dspy-ai==2.4.9 boto3

# Check creds have been properly set
import os
def obfuscate(var):
    return f"{var[:4]}{'*' * (len(var) - 4)}"

def check_env_vars():
    required_vars = ['OPENAI_API_KEY', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN']
    for var in required_vars:
        value = os.environ.get(var)
        if value:
            print(f"{var} is set. Value: {obfuscate(value)}")
        else:
            print(f"{var} is not set.")

check_env_vars()

# Check the version of dspy running
from importlib.metadata import version
version("dspy-ai")



# -------------------------
# Define LLMs
import dspy

aws_provider_ue1 = dspy.Bedrock(region_name="us-east-1")
aws_provider_uw2 = dspy.Bedrock(region_name="us-west-2")

llms = {
    "gpt_4": dspy.OpenAI(
        model='gpt-4',
        max_tokens=1000,
        api_key=os.environ.get('OPENAI_API_KEY')
    ),
    "claude_35": dspy.AWSAnthropic(
        aws_provider=aws_provider_ue1,
        model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    ),
    "llama70b":  dspy.AWSMeta(
        aws_provider=aws_provider_uw2,
        model="meta.llama3-70b-instruct-v1:0",
        max_tokens=1000,
    ),
    "llama8b":  dspy.AWSMeta(
        aws_provider=aws_provider_uw2,
        model="meta.llama3-8b-instruct-v1:0",
        max_tokens=1000,
    ),
    "claude_3_haiku": dspy.AWSAnthropic(
        aws_provider=aws_provider_uw2,
        model="anthropic.claude-3-haiku-20240307-v1:0"
    )
}

def basic_test(lm):
    sentence = "it's a charming and often affecting journey."

    classify = dspy.Predict('sentence -> sentiment')
    sent = classify(sentence=sentence).sentiment

    if sent == "Positive":
        print('LLM Test: ✅ Looking good')
    else:
        print(f'LLM Test: ❌ LLM did not respond as expected: Expected "Positive", got \n```\n{sent}\n```')

for name, lm in llms.items():
    dspy.settings.configure(lm=lm)

    print("-------------------------")
    print("Testing on", name)
    basic_test(lm)

For better results with chat LLMs, use the new experimental prompts released in v2.4.11: dspy.configure(experimental=True).
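As a minimal sketch of how that could slot into the script above (assuming dspy-ai>=2.4.11 and the same llms dict defined earlier):

import dspy

classify = dspy.Predict('sentence -> sentiment')

for name, lm in llms.items():
    # experimental=True switches Predict to the newer chat-oriented prompts (v2.4.11+)
    dspy.settings.configure(lm=lm, experimental=True)
    sent = classify(sentence="it's a charming and often affecting journey.").sentiment
    print(f"{name}: {sent!r}")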

Yup, chat LMs have better support with dspy.configure(experimental=True).