truera/trulens

[BUG] TruChain - no values returned for Groundedness feedback function for Bedrock

mahesh-hv-appmod opened this issue · 3 comments

Bug Description
trulens_eval returns an empty array for Groundedness values.

To Reproduce
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# vectorstore, prompt_template, llm, and question are defined earlier
retriever = vectorstore.as_retriever()

retrieval_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

retrieval_chain.invoke(question)

from trulens_eval.app import App
from trulens_eval.feedback import Groundedness
from trulens_eval import TruChain, Feedback, Tru, Select, Bedrock

tru = Tru()
tru.reset_database()

context = App.select_context(retrieval_chain)

Initialize the Bedrock provider for the feedback function collection class:

bedrock = Bedrock(model_id = "amazon.titan-tg1-large", region_name="us-east-1")

Define a groundedness feedback function:

grounded = Groundedness(groundedness_provider=bedrock)

f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())  # collect context chunks into a list
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

f_relevance = Feedback(bedrock.relevance, name="Relevance").on_input_output()

tru_recorder = TruChain(
    retrieval_chain,
    app_id='Base LLM',
    selectors_check_warning=True,
    feedbacks=[f_groundedness, f_relevance],
)

questions = [
    "What is Acme Co.?",
    "Who are the competitors?",
    "What's the financial profile?"
]

with tru_recorder as recording:
    for question in questions:
        llm_response = retrieval_chain.invoke(question)

Expected behavior
I expect to see values for the Groundedness metric; Relevance values are returned as expected.

Environment:
Python 3.10.13
langchain: 0.1.16
trulens_eval: 0.27.0


The issue with the Groundedness feedback function returning an empty array could stem from several factors. Here's a structured approach to diagnose and resolve the problem:

  1. Check Provider Compatibility: Ensure the groundedness_provider passed to the Groundedness class matches the method you're using: groundedness_measure_with_cot_reasons expects an LLM-based provider (such as Bedrock), while groundedness_measure_with_nli expects a Huggingface provider. Using the wrong type can lead to errors or empty results.

  2. Verify NLTK 'punkt' Tokenizer Setup: The Groundedness class relies on NLTK's 'punkt' tokenizer to split the output into sentences. If the tokenizer failed to download or cannot be loaded, the feedback function can produce no results. Make sure the tokenizer is correctly installed and accessible (see the snippet after this list).

  3. Input Data Integrity: The methods tokenize the input statement into sentences. An incorrectly formatted or empty statement yields no sentences to analyze, and therefore no groundedness scores. Ensure your input data is correctly formatted and non-empty (the snippet after this list also demonstrates this).

  4. Review Groundedness Class Instantiation and Feedback Function Implementation: Misconfiguration or errors in the feedback function's implementation could lead to unexpected outcomes. Double-check the instantiation of the Groundedness class and the implementation of your feedback function for any discrepancies.

  5. Examine Groundedness Score Calculation Logic: Errors in calculating or interpreting groundedness scores could also be the culprit. Review the logic within the groundedness_measure_with_cot_reasons and groundedness_measure_with_nli methods to ensure scores are correctly calculated and aggregated.
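As a quick check for points 2 and 3, here is a minimal sketch (assuming a standard NLTK installation) that verifies the 'punkt' tokenizer is available and shows why an empty statement produces no scores:

import nltk
from nltk.tokenize import sent_tokenize

# Fetch the 'punkt' sentence tokenizer if it is not already cached
nltk.download('punkt')

# An empty or whitespace-only statement tokenizes to no sentences,
# leaving the groundedness measure nothing to score
print(sent_tokenize(""))                                # []
print(sent_tokenize("The sky is blue. Water is wet."))  # ['The sky is blue.', 'Water is wet.']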

Regarding the .on(context.collect()) and .aggregate(grounded.grounded_statements_aggregator) methods: the former collects the retrieved context chunks into a single list for the feedback function, while the latter aggregates the per-statement results into a single score. If the input is empty or lacks associated scores, the aggregation step ends up taking the mean of an empty array, which typically yields NaN (with a RuntimeWarning) rather than a score. This suggests the issue may lie in how the input data is collected and passed to the feedback function before aggregation.
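To illustrate, a minimal sketch of the empty-input case (assuming the aggregator reduces to a NumPy mean):

import numpy as np

# Aggregating an empty list of scores does not raise by default:
# NumPy emits "RuntimeWarning: Mean of empty slice" and returns nan
scores = []
print(np.mean(scores))  # nan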

By following these steps, you should be able to identify and resolve the issue with the Groundedness feedback function. If you encounter any specific errors or need further assistance, please provide more details for targeted support.


Small edit: the calls to the groundedness functions pass in empty arrays ('[]'); see below:

(screenshot: groundedness feedback calls receiving empty array inputs)
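To confirm this from code rather than the dashboard, a minimal sketch using trulens_eval's record inspection (the "Groundedness_calls" column name follows the feedback name and may differ by version):

records_df, feedback_cols = tru.get_records_and_feedback(app_ids=["Base LLM"])

# Each feedback has a result column and a "<name>_calls" column holding the
# arguments each call received; empty inputs show up here as []
print(records_df[feedback_cols])
print(records_df["Groundedness_calls"].iloc[0])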

Hi! I have exactly the same issue, with both context relevance and groundedness. I am using Hugging Face as the provider.
Any update on this?