truera/trulens

[BUG] TruLens throws ValidationError: Input should be an instance of llama_index.core.base_query_engine.BaseQueryEngine

lucifermorningstar1305 opened this issue · 5 comments

Bug Description

I am trying to use TruLens with some custom LlamaIndex code that I am working on, and I am getting the following error:

ValidationError: 2 validation errors for TruLlama
app.`is-instance[llama_index.core.base_query_engine.BaseQueryEngine]`
  Input should be an instance of llama_index.core.base_query_engine.BaseQueryEngine [type=is_instance_of, input_value=<llama_index.core.query_e...bject at 0x7b3036edfa70>, input_type=RetrieverQueryEngine]
    For further information visit https://errors.pydantic.dev/2.6/v/is_instance_of
app.`is-instance[llama_index.chat_engine.types.BaseChatEngine]`
  Input should be an instance of llama_index.chat_engine.types.BaseChatEngine [type=is_instance_of, input_value=<llama_index.core.query_e...bject at 0x7b3036edfa70>, input_type=RetrieverQueryEngine]
    For further information visit https://errors.pydantic.dev/2.6/v/is_instance_of
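For reference, you can inspect the rejected object's class hierarchy and compare it against the path TruLlama validates. A small diagnostic sketch (using naive_index from the reproduction below):

# Diagnostic sketch: print the module path of every class in the engine's MRO.
# If llama_index.core.base_query_engine.BaseQueryEngine (the class TruLlama
# checks against) does not appear, pydantic's isinstance check will fail.
query_engine = naive_index.as_query_engine()
for cls in type(query_engine).__mro__:
    print(f"{cls.__module__}.{cls.__qualname__}")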

To Reproduce

from typing import Any, Callable

import numpy as np
import pandas as pd
import nest_asyncio
import os
import warnings

# Qdrant
from qdrant_client import QdrantClient

# LlamaIndex Imports
from llama_index.core.settings import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, Document, StorageContext, SimpleDirectoryReader, ServiceContext, load_index_from_storage
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.llms.gemini import Gemini
from llama_index.llms.ollama import Ollama

# Langchain Imports
from langchain_community.llms import Ollama as LangOllama
from langchain_google_genai import ChatGoogleGenerativeAI

# TruLens Imports
from trulens_eval.feedback.provider.langchain import Langchain
from trulens_eval import Feedback, TruLlama, Tru
from trulens_eval.feedback import Groundedness

# General
from pathlib import Path
from collections import defaultdict


documents = SimpleDirectoryReader(input_files=["../data/attn.txt"]).load_data()
document = Document(text="\n\n".join([doc.text for doc in documents]))
client = QdrantClient(host="localhost")
collection_name = "test"

def build_naive_index(document: Document, persist_dir: str = "./indices/myIndices", llm: str = "ollama") -> VectorStoreIndex:
    """Builds a naive RAG index for a given document using the specified LLM."""

    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    if llm == "ollama":
        Settings.llm = Ollama(model="llama2", request_timeout=60.0)

    vector_store = QdrantVectorStore(collection_name=collection_name, client=client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    if not os.path.exists(persist_dir):
        index = VectorStoreIndex.from_documents(documents=[document], storage_context=storage_context, show_progress=True)
        index.storage_context.persist(persist_dir=persist_dir)
    else:
        index = load_index_from_storage(StorageContext.from_defaults(vector_store=vector_store, persist_dir=persist_dir))

    return index

naive_index = build_naive_index(document=document, persist_dir="../indices/1706.03762", llm="ollama")
query_engine = naive_index.as_query_engine()

def get_prebuilt_trulens_recorder(query_engine: Callable, llm: str = "ollama", app_id: str = "App1") -> Callable:
    """Creates a TruLens recorder with feedback functions for evaluating an LLM."""
    provider = None
    if llm == "ollama":
        ollama = LangOllama(model="llama2")
        provider = Langchain(chain=ollama)

    # Answer Relevance
    ans_relevance = Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance").on_input_output()

    # Context Relevance
    ctx_selection = TruLlama.select_source_nodes().node.text
    ctx_relevance = Feedback(provider.qs_relevance_with_cot_reasons, name="Context Relevance").on_input().on(ctx_selection).aggregate(np.mean)

    # Groundedness
    grounded = Groundedness(groundedness_provider=provider)
    gr_relevance = Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness").on(ctx_selection).on_output().aggregate(grounded.grounded_statements_aggregator)

    tru_recorder = TruLlama(query_engine, 
                            app_id=app_id, 
                            feedbacks=[ans_relevance, ctx_relevance, gr_relevance])
    return tru_recorder
    
tru = Tru()
tru.reset_database()

tru_recorder = get_prebuilt_trulens_recorder(query_engine, app_id="Naive Rag Engine")

Expected behavior
TruLlama should accept the RetrieverQueryEngine returned by as_query_engine() without raising a validation error.

Environment:

  • OS: Linux (Ubuntu)
  • Python Version: 3.12
  • TruLens version: 0.27.0
  • Versions of other relevant installed libraries: llama-index-core==0.10.21.post1, ollama==0.1.7

Hey @lucifermorningstar1305! Great to see you back with us. Hope you're doing well!

Based on the error message you provided, it seems that the query_engine object you're passing to the TruLlama constructor is not an instance of llama_index.core.base_query_engine.BaseQueryEngine or llama_index.chat_engine.types.BaseChatEngine, which is what TruLlama expects.

In your code, you're creating the query_engine object with the following line:

query_engine = naive_index.as_query_engine()

The as_query_engine() method should return an object that is an instance of BaseQueryEngine or a subclass thereof. If it doesn't, you'll encounter the ValidationError you're seeing.

To resolve this issue, ensure that the as_query_engine() method of your index (here, a VectorStoreIndex) returns an object whose class actually inherits from the BaseQueryEngine class that TruLlama imports. Note that a mismatch in import paths between llama-index versions can make this isinstance check fail even when the class names look identical.
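As a quick sanity check, you can run the same isinstance test that the validator performs. This is a sketch that assumes a recent llama-index-core 0.10.x, where the class lives under llama_index.core.base.base_query_engine (older releases exposed it at the llama_index.core.base_query_engine path named in the error):

# Sanity check: replicate TruLlama's pydantic isinstance validation.
from llama_index.core.base.base_query_engine import BaseQueryEngine

print(isinstance(query_engine, BaseQueryEngine))  # expected: True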

Here's the relevant part of the TruLlama class definition for reference:

class TruLlama(App):
    ...
    app: Union[BaseQueryEngine, BaseChatEngine]
    ...
    def __init__(
        self, app: Union[BaseQueryEngine, BaseChatEngine], **kwargs: dict
    ):
        ...

As you can see, the app parameter (which corresponds to your query_engine object) is expected to be an instance of BaseQueryEngine or BaseChatEngine.
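For illustration, either engine type returned by a llama-index index satisfies this union. A minimal sketch reusing naive_index from your reproduction:

# Either of these constructs a valid recorder once the types line up:
tru_recorder = TruLlama(naive_index.as_query_engine(), app_id="Query Engine App")
# or, for a conversational app:
tru_recorder = TruLlama(naive_index.as_chat_engine(), app_id="Chat Engine App")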

Unfortunately, without the actual implementation details of your VectorStoreIndex class and its as_query_engine method, it's hard to provide a more specific solution. However, the error message suggests a type mismatch between what TruLlama expects and what is being passed to it.

You can find more information about the TruLlama class in the TruLens codebase.

I hope this helps! If you have any more questions or if something is unclear, please let me know.


Try updating the llama-index version.

@lucifermorningstar1305 Did you try the update?

I have the same error, and the query_engine type is:
<class 'llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine'>

And version info:
llama-index-core==0.10.30
trulens-eval==0.28.1

When I updated to llama-index-core==0.10.31, it worked.
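For anyone else hitting this, a quick way to confirm the installed versions before and after the upgrade (a minimal sketch using only the standard library):

# Print the installed versions of the two packages involved.
from importlib.metadata import version

for pkg in ("llama-index-core", "trulens-eval"):
    print(pkg, version(pkg))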

Thanks @metaimagine - it seems this was an issue with the llama-index version that was patched on their end. Closing this issue.