truera/trulens

Feedback functions returning None: API langchain request failed 4 time(s)

Closed this issue · 9 comments

We are using LangChain and OpenAI (no access to embeddings).
We have successfully implemented RAG chains in this environment. However, we are facing issues when integrating with TruLens.

I am following the code base here:
https://www.trulens.org/trulens_eval/langchain_quickstart/#explore-in-a-dashboard

and even though the RAG execution is successful and results are fetched,

with tru_recorder as recording:
    llm_response = rag_chain.invoke(query)

display(llm_response)

Feedback functions are returning None:

display(feedback.name, feedback_result.result)

'groundedness_measure_with_cot_reasons'
None
'qs_relevance'
None
'relevance'
None

Originally posted by @arkacisco in #833 (comment)

On further debugging, here is the final error from the records:

API langchain request failed 4 time(s)..

Please note we have an AzureChatOpenAI endpoint, used it as the chain argument of the Langchain provider, and defined the feedback callbacks with that provider.

llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com/openai',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
    model_kwargs=dict(
        user=f'{{"appkey": "{app_key}"}}'
    )
)
from trulens_eval.feedback.provider.langchain import Langchain

langchain_provider = Langchain(chain=llm)

from trulens_eval.feedback import Groundedness
grounded = Groundedness(groundedness_provider=langchain_provider)

# Define a groundedness feedback function

f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons)
    .on(context.collect())  # collect context chunks into a list
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)
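
For completeness, the other feedback functions and the recorder used above follow the quickstart; roughly like this (a sketch only; the app_id, the context selector, and the numpy aggregation are assumptions based on the linked quickstart):

import numpy as np
from trulens_eval import Feedback, TruChain
from trulens_eval.app import App

# Selector for the retrieved context chunks (as in the quickstart).
context = App.select_context(rag_chain)

# Relevance of the final answer to the question.
f_relevance = Feedback(langchain_provider.relevance).on_input_output()

# Relevance of each retrieved chunk to the question, averaged.
f_qs_relevance = (
    Feedback(langchain_provider.qs_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

tru_recorder = TruChain(
    rag_chain,
    app_id="RAG_v1",  # assumed name
    feedbacks=[f_groundedness, f_relevance, f_qs_relevance],
)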

Here is the complete error:

FeedbackResult(feedback_result_id='feedback_result_hash_78971a0a21dd7ff9f4284477eb48f957', record_id='record_hash_f2af4562df7046022be19265ed889b11', feedback_definition_id='feedback_definition_hash_aa08be1911c9532808078c267ce61b3d', last_ts=datetime.datetime(2024, 2, 1, 19, 16, 12, 821320), status=<FeedbackResultStatus.FAILED: 'failed'>, cost=Cost(n_requests=0, n_successful_requests=0, n_classes=0, n_tokens=0, n_stream_chunks=0, n_prompt_tokens=0, n_completion_tokens=0, cost=0.0), name='relevance', calls=[], result=None, error='Traceback (most recent call last):\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 542, in run\n result_and_meta, part_cost = sync(\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/utils/asynchro.py", line 114, in sync\n return loop.run_until_complete(awaitable)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete\n return f.result()\n File "/opt/conda/lib/python3.10/asyncio/futures.py", line 201, in result\n raise self._exception.with_traceback(self._exception_tb)\n File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 234, in __step\n result = coro.throw(exc)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 436, in atrack_all_costs_tally\n result, cbs = await Endpoint.atrack_all_costs(\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 421, in atrack_all_costs\n return await Endpoint._atrack_costs(thunk, with_endpoints=endpoints)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 510, in _atrack_costs\n result: T = await desync(thunk)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/utils/asynchro.py", line 82, in desync\n res = await asyncio.to_thread(func, *args, **kwargs)\n File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread\n return await loop.run_in_executor(None, func_call)\n File "/opt/conda/lib/python3.10/asyncio/futures.py", line 285, in await\n yield self # This tells Task to wait for completion.\n File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup\n future.result()\n File "/opt/conda/lib/python3.10/asyncio/futures.py", line 201, in result\n raise self._exception.with_traceback(self._exception_tb)\n File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 248, in _future_target_wrapper\n return func(*args, **kwargs)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 544, in \n lambda: self.imp(**ins)\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/base.py", line 286, in relevance\n self.endpoint.run_me(\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 272, in run_me\n raise RuntimeError(\nRuntimeError: API langchain request failed 4 
time(s).\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 548, in run\n raise RuntimeError(\nRuntimeError: Evaluation of relevance failed on inputs: \n{'prompt': '',\n 'response': 'To overcome the bug< redacted>, you can fix it by u\nAPI langchain request failed 4 time(s)..\n', multi_result=None))

This issue was closed by mistake; reopened.

Hey @arkacisco - can you try testing your langchain provider on its own?

You can use the following:

langchain_provider.relevance("What's the capital of Japan?", "Tokyo")

Please ignore my previous comment, it's not working. Here are the details:

Code executed:

from trulens_eval.feedback.provider.langchain import Langchain
from langchain.llms import OpenAI, AzureOpenAI

gpt3_llm = AzureOpenAI(
    model="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
    model_kwargs=dict(
        user=f'{{"appkey": "{app_key}"}}'
    )
)
langchain_provider = Langchain(chain=gpt3_llm)

llm.predict("tell me a joke")

"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

langchain_provider.relevance("What's the capital of Japan?", "Tokyo")

Error log:

langchain request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=3.
langchain request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=2.
langchain request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=1.
langchain request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=0.

RuntimeError Traceback (most recent call last)
Cell In[171], line 1
----> 1 langchain_provider.relevance('What's the capital of Japan?','Tokyo',)

File /mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/base.py:286, in LLMProvider.relevance(self, prompt, response)
253 def relevance(self, prompt: str, response: str) -> float:
254 """
255 Uses chat completion model. A function that completes a
256 template to check the relevance of the response to a prompt.
(...)
283 "relevant".
284 """
285 return re_0_10_rating(
--> 286 self.endpoint.run_me(
287 lambda: self._create_chat_completion(
288 prompt=str.format(
289 prompts.PR_RELEVANCE, prompt=prompt, response=response
290 )
291 )
292 )
293 ) / 10.0

File /mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py:272, in Endpoint.run_me(self, thunk)
269 sleep(retry_delay)
270 retry_delay *= 2
--> 272 raise RuntimeError(
273 f"API {self.name} request failed {self.retries+1} time(s)."
274 )

RuntimeError: API langchain request failed 4 time(s).
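
The 401s above say the gateway JWT had already expired ("The Token has expired: policy(JWT-validateToken)"). Since the access token is read from token_response once when the client is constructed, one thing worth trying is re-fetching the token and rebuilding the LLM and provider just before evaluating. A rough sketch, where get_fresh_token, TOKEN_URL, CLIENT_ID, and CLIENT_SECRET are hypothetical placeholders standing in for however token_response was originally obtained:

import requests
from langchain.chat_models import AzureChatOpenAI
from trulens_eval.feedback.provider.langchain import Langchain

TOKEN_URL = "https://<token-endpoint>"              # placeholder for the auth endpoint
CLIENT_ID, CLIENT_SECRET = "<client-id>", "<client-secret>"  # placeholders

def get_fresh_token() -> str:
    # Hypothetical: re-run whatever request produced token_response above.
    token_response = requests.post(TOKEN_URL, auth=(CLIENT_ID, CLIENT_SECRET))
    return token_response.json()["access_token"]

llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com/openai',
    api_key=get_fresh_token(),  # fresh JWT instead of a possibly stale one
    api_version="2023-08-01-preview",
    model_kwargs=dict(user=f'{{"appkey": "{app_key}"}}'),
)
langchain_provider = Langchain(chain=llm)
langchain_provider.relevance("What's the capital of Japan?", "Tokyo")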

Does the LLM you have here work outside of the TruLens Langchain provider?

Yes @joshreini1, it does. We normally use one of the following options to connect to and use the LLM endpoint:

-- AzureChatOpenAI:
from langchain.chat_models import AzureChatOpenAI
app_key =
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com/openai',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
    model_kwargs=dict(
        user=f'{{"appkey": "{app_key}"}}'
    )
)

llm.predict("Tell me a joke")

"Why don't scientists trust atoms?\n\nBecause they make up everything!"

------------AzureOpenAI---------------

from openai import AzureOpenAI

azure_openai_llm = AzureOpenAI(
    # deployment_name="gpt-35-turbo",
    azure_endpoint='https://chat-ai.cisco.com',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
)

from openai.types.chat import ChatCompletionMessageParam

message_with_history: list[ChatCompletionMessageParam] = [
    {"role": "system", "content": "You are a chatbot"},
    {
        "role": "user",
        "content": "Tell me a joke",
    },
]
response = azure_openai_llm.chat.completions.create(
    model="gpt-35-turbo",
    messages=message_with_history,
    temperature=0,
    # stop=["<|im_end|>"],
    user='{"appkey": }',
    max_tokens=1000,
    # stream=True,
)
print("Reponse:", response)
for chunk in response:
    print("Chunk:", chunk)

Reponse: ChatCompletion(id='chatcmpl-8pLUvV3agEQuGTFymuKMWqNWr3fb9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Sure, here's a joke for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!", role='assistant', function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1707248057, model='gpt-35-turbo', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=22, prompt_tokens=20, total_tokens=42), prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], prompt_sent_to_chatgpt=[{'role': 'system', 'content': 'You are a chatbot'}, {'role': 'user', 'content': 'Tell me a joke'}], user='{"appkey": "redacted", "session_id": "1a709f11-ae05-4ed4-98d6-6c2662663169", "user": "", "prompt_truncate": "yes"}')
Chunk: ('id', 'chatcmpl-8pLUvV3agEQuGTFymuKMWqNWr3fb9')
Chunk: ('choices', [Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Sure, here's a joke for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!", role='assistant', function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})])
Chunk: ('created', 1707248057)
Chunk: ('model', 'gpt-35-turbo')
Chunk: ('object', 'chat.completion')
Chunk: ('system_fingerprint', None)
Chunk: ('usage', CompletionUsage(completion_tokens=22, prompt_tokens=20, total_tokens=42))
Chunk: ('prompt_filter_results', [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}])
Chunk: ('prompt_sent_to_chatgpt', [{'role': 'system', 'content': 'You are a chatbot'}, {'role': 'user', 'content': 'Tell me a joke'}])
Chunk: ('user', '{"appkey": "redacted", "session_id": "1a709f11-ae05-4ed4-98d6-6c2662663169", "user": "", "prompt_truncate": "yes"}')

Can you use our AzureOpenAI provider directly? https://www.trulens.org/trulens_eval/api/azureopenai_provider/

Here's a usage example notebook: https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/models/azure_openai.ipynb

The AzureOpenAI provider is a more robust and more widely used feedback provider than the Langchain provider.
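
The pattern in that notebook is roughly the following (a sketch; the environment-variable wiring and the placeholder values come from the linked example, not from this thread's gateway setup):

import os
from trulens_eval import Feedback
from trulens_eval.feedback.provider import AzureOpenAI

# Standard Azure OpenAI settings picked up by the underlying client.
os.environ["AZURE_OPENAI_API_KEY"] = "..."  # placeholder
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2023-08-01-preview"

azure_provider = AzureOpenAI(deployment_name="gpt-35-turbo")

# Example feedback built directly on the provider.
f_relevance = Feedback(azure_provider.relevance).on_input_output()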

I tried this before using the Langchain provider and it did not work.

First of all, it does not accept the model_kwargs argument we use to pass the app_key:

from trulens_eval.feedback.provider import OpenAI, AzureOpenAI

tru_openai = AzureOpenAI(
    deployment_name="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
    model_kwargs=dict(
        user=f'{{"appkey": "{app_key}"}}'
    )
)


TypeError Traceback (most recent call last)
Cell In[176], line 3
1 # Initialize provider class
2 # openai = OpenAI()
----> 3 tru_openai = AzureOpenAI(deployment_name="gpt-35-turbo",
4 base_url = 'https://chat-ai.cisco.com/',
5 api_key=token_response.json()["access_token"],
6 api_version="2023-08-01-preview",
7 model_kwargs=dict(
8 user=f'{{"appkey": "{app_key}"}}'
9 )
10 )

File /mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/openai.py:429, in AzureOpenAI.__init__(self, deployment_name, endpoint, **kwargs)
425 else:
426 # but include in provider args
427 kwargs['model_engine'] = deployment_name
--> 429 kwargs["client"] = OpenAIClient(client=oai.AzureOpenAI(**client_kwargs))
431 super().__init__(
432 endpoint=None, **kwargs
433 )

TypeError: AzureOpenAI.__init__() got an unexpected keyword argument 'model_kwargs'

If I use it without the app_key:

# Initialize provider class

tru_openai = AzureOpenAI(
    deployment_name="gpt-35-turbo",
    base_url='https://chat-ai.cisco.com',
    api_key=token_response.json()["access_token"],
    api_version="2023-08-01-preview",
)

I get the same error:

tru_openai.relevance("What's the capital of Japan?", "Tokyo")

openai request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=3.
openai request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=2.
openai request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=1.
openai request failed <class 'openai.AuthenticationError'>=Error code: 401 - {'fault': {'faultstring': 'The Token has expired: policy(JWT-validateToken)', 'detail': {'errorcode': 'steps.jwt.TokenExpired'}}}. Retries remaining=0.

RuntimeError Traceback (most recent call last)
Cell In[187], line 1
----> 1 tru_openai.relevance('What's the capital of Japan?','Tokyo',)

File /mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/base.py:286, in LLMProvider.relevance(self, prompt, response)
253 def relevance(self, prompt: str, response: str) -> float:
254 """
255 Uses chat completion model. A function that completes a
256 template to check the relevance of the response to a prompt.
(...)
283 "relevant".
284 """
285 return re_0_10_rating(
--> 286 self.endpoint.run_me(
287 lambda: self._create_chat_completion(
288 prompt=str.format(
289 prompts.PR_RELEVANCE, prompt=prompt, response=response
290 )
291 )
292 )
293 ) / 10.0

File /mnt/nfs/shared/arkghosh/LLM/BCS KB/notebooks/venv/lib/python3.10/site-packages/trulens_eval/feedback/provider/endpoint/base.py:272, in Endpoint.run_me(self, thunk)
269 sleep(retry_delay)
270 retry_delay *= 2
--> 272 raise RuntimeError(
273 f"API {self.name} request failed {self.retries+1} time(s)."
274 )

RuntimeError: API openai request failed 4 time(s).

Working on a fix for this @arkacisco

Thanks!