[BUG] TruChain - no values returned for Groundedness feedback function for Bedrock
mahesh-hv-appmod opened this issue · 3 comments
Bug Description
trulens_eval returns an empty array for Groundedness values
To Reproduce
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# vectorstore, prompt_template, and llm are defined earlier in the notebook
retriever = vectorstore.as_retriever()

retrieval_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

retrieval_chain.invoke(question)
from trulens_eval.app import App
from trulens_eval.feedback import Groundedness
from trulens_eval import TruChain, Feedback, Tru, Select, Bedrock
tru = Tru()
tru.reset_database()
context = App.select_context(retrieval_chain)
# Initialize the Bedrock provider for the feedback functions
bedrock = Bedrock(model_id="amazon.titan-tg1-large", region_name="us-east-1")
# Define a groundedness feedback function
grounded = Groundedness(groundedness_provider=bedrock)
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())  # collect context chunks into a list
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)
f_relevance = Feedback(bedrock.relevance, name="Relevance").on_input_output()
tru_recorder = TruChain(
    retrieval_chain,
    app_id="Base LLM",
    selectors_check_warning=True,
    feedbacks=[f_groundedness, f_relevance],
)
questions = [
"What is Acme Co.?",
"Who are the competitors?",
"What’s the financial profile?"
]
with tru_recorder as recording:
    for question in questions:
        llm_response = retrieval_chain.invoke(question)
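One way to confirm whether any Groundedness results were recorded at all is to pull the records back out of the Tru database after the run. This is a sketch: `get_records_and_feedback` is the trulens_eval accessor for recorded feedback, and "Base LLM" matches the app_id used above.

```python
def groundedness_column(tru, app_id="Base LLM"):
    """Return the recorded Groundedness feedback values for one app, if any.

    Returns None when the feedback never completed, in which case no
    "Groundedness" column is written to the records dataframe.
    """
    records_df, feedback_names = tru.get_records_and_feedback(app_ids=[app_id])
    if "Groundedness" not in records_df.columns:
        return None
    return records_df["Groundedness"]
```

If the column is missing entirely (rather than present but full of NaNs), the feedback function failed before producing a result, which points at the provider or tokenizer setup rather than the aggregation step.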
Expected behavior
I expect to see values for the Groundedness metric; the Relevance metric is being returned correctly.
Environment:
Python 3.10.13
langchain: 0.1.16
trulens_eval: 0.27.0
The issue with the Groundedness feedback function returning an empty array could stem from several factors. Here's a structured approach to diagnose and resolve the problem:

- Check provider compatibility: Ensure the `groundedness_provider` passed to the `Groundedness` class is compatible with the method you're using (`groundedness_measure_with_cot_reasons` or `groundedness_measure_with_nli`). Each method expects a specific type of provider (LLM-based or Huggingface), and using the wrong type can lead to errors.
- Verify the NLTK 'punkt' tokenizer setup: The `Groundedness` class relies on NLTK's 'punkt' tokenizer. If there is an issue with its download or usage, it could affect the function. Make sure the tokenizer is correctly installed and accessible.
- Check input data integrity: The methods tokenize the input statement into sentences. Incorrectly formatted input or empty statements will result in no sentences for analysis, and thus no groundedness scores. Ensure your input data is correctly formatted and non-empty.
- Review the `Groundedness` instantiation and the feedback function implementation: Misconfiguration or errors in the feedback function's implementation could lead to unexpected outcomes. Double-check the instantiation of the `Groundedness` class and the definition of your feedback function for any discrepancies.
- Examine the groundedness score calculation logic: Errors in calculating or interpreting groundedness scores could also be the culprit. Review the logic within the `groundedness_measure_with_cot_reasons` and `groundedness_measure_with_nli` methods to ensure scores are correctly calculated and aggregated.

Regarding the `.on(context.collect())` and `.aggregate(grounded.grounded_statements_aggregator)` methods: the former selects and collects the input data (the retrieved context chunks), while the latter aggregates the per-statement results into a single score. If the input is empty or lacks associated scores, the aggregation method might attempt to compute the mean of an empty array, which typically leads to an error rather than a populated result. This suggests that the issue might also involve how input data is collected and processed before aggregation.
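As a quick way to act on the 'punkt' point above, the following sketch checks whether the tokenizer data is present without assuming it is installed (nltk itself is an assumed dependency here, as the Groundedness class uses it):

```python
def punkt_available() -> bool:
    """Return True if NLTK's 'punkt' tokenizer data can be located."""
    try:
        import nltk  # assumed dependency of the Groundedness class
        nltk.data.find("tokenizers/punkt")
        return True
    except (ImportError, LookupError):
        return False

if __name__ == "__main__":
    if punkt_available():
        print("punkt tokenizer is installed")
    else:
        # Downloading is left as an explicit step rather than done implicitly:
        print('punkt missing; run: python -c "import nltk; nltk.download(\'punkt\')"')
```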
By following these steps, you should be able to identify and resolve the issue with the Groundedness feedback function. If you encounter any specific errors or need further assistance, please provide more details for targeted support.
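To illustrate the empty-input failure mode described above, here is a minimal stand-in (plain Python, not the actual TruLens implementation): if the statement tokenizes to zero sentences, there are zero per-sentence scores, and taking the mean of an empty list fails.

```python
import statistics

def naive_sentences(text: str) -> list[str]:
    # Stand-in for NLTK sentence tokenization: naive split on periods.
    return [s.strip() for s in text.split(".") if s.strip()]

def aggregate(scores: list[float]) -> float:
    # Stand-in for grounded_statements_aggregator (a mean over scores).
    return statistics.mean(scores)

sentences = naive_sentences("")        # empty input -> no sentences
scores = [0.5 for _ in sentences]      # -> no scores either
print(sentences)                       # []
try:
    aggregate(scores)
except statistics.StatisticsError as err:
    print("aggregation failed:", err)  # mean of an empty list raises
```

This mirrors the chain of causes: an empty or malformed context yields no sentences, no sentences yield no scores, and the aggregator then has nothing to average.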
Sources
- trulens_eval/trulens_eval/tests/groundedness_benchmark.ipynb
- trulens_eval/trulens_eval/feedback/groundedness.py
- trulens_eval/trulens_eval/feedback/__init__.py
- trulens_eval/generated_files/all_tools.py
Hi! I have exactly the same issue, with both context relevance and groundedness. I am using Hugging Face as the provider.
Any update on this?