ValueError: Error performing search in AstraDBVectorStore: 'content'
Closed this issue · 5 comments
Bug Description
Hello,
When trying to set up the RAG template with AstraDB, we get the following error:
ValueError: Error performing search in AstraDBVectorStore: 'content'
It looks like it's trying to get a key that doesn't exist.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python311\Scripts\langflow.exe\__main__.py", line 7, in <module>
File "C:\Python311\Lib\site-packages\langflow\__main__.py", line 528, in main
app()
-> <typer.main.Typer object at 0x000002684A448DD0>
File "C:\Python311\Lib\site-packages\typer\main.py", line 321, in __call__
return get_command(self)(*args, **kwargs)
| | | -> {}
| | -> ()
| -> <typer.main.Typer object at 0x000002684A448DD0>
-> <function get_command at 0x000002682A6B8680>
File "C:\Python311\Lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
| | | -> {}
| | -> ()
| -> <function TyperGroup.main at 0x000002682A69E480>
-> <TyperGroup >
File "C:\Python311\Lib\site-packages\typer\core.py", line 728, in main
return _main(
-> <function _main at 0x000002682A69D620>
File "C:\Python311\Lib\site-packages\typer\core.py", line 197, in _main
rv = self.invoke(ctx)
| | -> <click.core.Context object at 0x000002684A408F90>
| -> <function MultiCommand.invoke at 0x00000268270756C0>
-> <TyperGroup >
File "C:\Python311\Lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
| | | | -> <click.core.Context object
at 0x000002684A965F50>
| | | -> <function Command.invoke at
0x0000026827075080>
| | -> <TyperCommand run>
| -> <click.core.Context object at 0x000002684A965F50>
-> <function MultiCommand.invoke.<locals>._process_result at
0x000002684ABD6160>
File "C:\Python311\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
| | | | | -> {'host': '127.0.0.1', 'workers':
1, 'timeout': 300, 'port': 7860, 'components_path':
WindowsPath('C:/Python311/Lib/site-packa...
| | | | -> <click.core.Context object at
0x000002684A965F50>
| | | -> <function run at 0x000002684ABD5E40>
| | -> <TyperCommand run>
| -> <function Context.invoke at 0x00000268270679C0>
-> <click.core.Context object at 0x000002684A965F50>
File "C:\Python311\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
| -> {'host': '127.0.0.1', 'workers': 1,
'timeout': 300, 'port': 7860, 'components_path':
WindowsPath('C:/Python311/Lib/site-packa...
-> ()
File "C:\Python311\Lib\site-packages\typer\main.py", line 703, in wrapper
return callback(**use_params)
| -> {'host': '127.0.0.1', 'workers': 1, 'timeout': 300,
'port': 7860, 'components_path': WindowsPath('C:/Python311/Lib/site-packa...
-> <function run at 0x000002684ABD5300>
File "C:\Python311\Lib\site-packages\langflow\__main__.py", line 189, in run
process = run_on_windows(host, port, log_level, options, app)
| | | | | ->
<fastapi.applications.FastAPI object at 0x000002682A45CF90>
| | | | -> {'bind':
'127.0.0.1:7860', 'workers': 1, 'timeout': 300}
| | | -> 'debug'
| | -> 7860
| -> '127.0.0.1'
-> <function run_on_windows at 0x000002684ABD5440>
File "C:\Python311\Lib\site-packages\langflow\__main__.py", line 232, in
run_on_windows
run_langflow(host, port, log_level, options, app)
| | | | | ->
<fastapi.applications.FastAPI object at 0x000002682A45CF90>
| | | | -> {'bind': '127.0.0.1:7860',
'workers': 1, 'timeout': 300}
| | | -> 'debug'
| | -> 7860
| -> '127.0.0.1'
-> <function run_langflow at 0x000002684ABD5940>
File "C:\Python311\Lib\site-packages\langflow\__main__.py", line 354, in
run_langflow
uvicorn.run(
| -> <function run at 0x000002684B677420>
-> <module 'uvicorn' from
'C:\\Python311\\Lib\\site-packages\\uvicorn\\__init__.py'>
File "C:\Python311\Lib\site-packages\uvicorn\main.py", line 577, in run
server.run()
| -> <function Server.run at 0x000002684B6777E0>
-> <uvicorn.server.Server object at 0x000002684B5B7290>
File "C:\Python311\Lib\site-packages\uvicorn\server.py", line 65, in run
return asyncio.run(self.serve(sockets=sockets))
| | | | -> None
| | | -> <function Server.serve at 0x000002684B677880>
| | -> <uvicorn.server.Server object at 0x000002684B5B7290>
| -> <function _patch_asyncio.<locals>.run at
0x000002684C6E7560>
-> <module 'asyncio' from
'C:\\Python311\\Lib\\asyncio\\__init__.py'>
File "C:\Python311\Lib\asyncio\runners.py", line 190, in run
return runner.run(main)
| | -> <coroutine object Server.serve at 0x000002684AB79B70>
| -> <function Runner.run at 0x000002682949AA20>
-> <asyncio.runners.Runner object at 0x000002684AC14410>
File "C:\Python311\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
| | | -> <Task pending name='Task-1'
coro=<Server.serve() running at
C:\Python311\Lib\site-packages\uvicorn\server.py:69> wait_for=<Fu...
| | -> <function _patch_loop.<locals>.run_until_complete at
0x000002684C6E7920>
| -> <ProactorEventLoop running=True closed=False debug=False>
-> <asyncio.runners.Runner object at 0x000002684AC14410>
File "C:\Python311\Lib\asyncio\base_events.py", line 637, in
run_until_complete
self.run_forever()
| -> <function _patch_loop.<locals>.run_forever at 0x000002684C6E7880>
-> <ProactorEventLoop running=True closed=False debug=False>
File "C:\Python311\Lib\asyncio\windows_events.py", line 321, in run_forever
super().run_forever()
File "C:\Python311\Lib\asyncio\base_events.py", line 604, in run_forever
self._run_once()
| -> <function _patch_loop.<locals>._run_once at 0x000002684C6E79C0>
-> <ProactorEventLoop running=True closed=False debug=False>
File "C:\Python311\Lib\site-packages\nest_asyncio.py", line 133, in _run_once
handle._run()
| -> <function Handle._run at 0x000002682940B7E0>
-> <Handle Task.__wakeup(<Future finis...026801F806D0>>)>
File "C:\Python311\Lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
| | | | | -> <member '_args' of 'Handle'
objects>
| | | | -> <Handle Task.__wakeup(<Future
finis...026801F806D0>>)>
| | | -> <member '_callback' of 'Handle' objects>
| | -> <Handle Task.__wakeup(<Future finis...026801F806D0>>)>
| -> <member '_context' of 'Handle' objects>
-> <Handle Task.__wakeup(<Future finis...026801F806D0>>)>
File "C:\Python311\Lib\asyncio\tasks.py", line 350, in __wakeup
self.__step()
-> <Task pending name='Task-937' coro=<build_flow.<locals>._build_vertex()
running at C:\Python311\Lib\site-packages\langflow\ap...
File "C:\Python311\Lib\asyncio\tasks.py", line 267, in __step
result = coro.send(None)
| -> <method 'send' of 'coroutine' objects>
-> <coroutine object build_flow.<locals>._build_vertex at
0x0000026861F390A0>
File "C:\Python311\Lib\site-packages\langflow\api\v1\chat.py", line 219, in
_build_vertex
vertex_build_result = await graph.build_vertex(
| -> <function Graph.build_vertex at
0x000002684A39C680>
-> Graph Representation:
----------------------
Vertices (11):
ChatInput-3HVe5,
AstraVectorStoreComponent-ySu3U, ParseData-rYl...
File "C:\Python311\Lib\site-packages\langflow\graph\graph\base.py", line
1332, in build_vertex
await vertex.build(
| -> <function Vertex.build at 0x000002684A38FE20>
-> Vertex(display_name=Astra DB, id=AstraVectorStoreComponent-ySu3U,
data={'description': 'Implementation of Vector Store using ...
File "C:\Python311\Lib\site-packages\langflow\graph\vertex\base.py", line
797, in build
await step(user_id=user_id, event_manager=event_manager, **kwargs)
| | | ->
{'fallback_to_env_vars': False}
| | ->
<langflow.events.event_manager.EventManager object at 0x0000026860C44910>
| -> UUID('145754d6-7d55-480c-bac8-5a5e233c06ca')
-> <bound method Vertex._build of Vertex(display_name=Astra DB,
id=AstraVectorStoreComponent-ySu3U, data={'description': 'Implem...
File "C:\Python311\Lib\site-packages\langflow\graph\vertex\base.py", line
475, in _build
await self._build_results(
| -> <function Vertex._build_results at 0x000002684A38FA60>
-> Vertex(display_name=Astra DB, id=AstraVectorStoreComponent-ySu3U,
data={'description': 'Implementation of Vector Store using ...
> File "C:\Python311\Lib\site-packages\langflow\graph\vertex\base.py", line
694, in _build_results
result = await initialize.loading.get_instance_results(
| | -> <function get_instance_results at
0x0000026848918F40>
| -> <module
'langflow.interface.initialize.loading' from
'C:\\Python311\\Lib\\site-packages\\langflow\\interface\\initialize\\loa...
-> <module 'langflow.interface.initialize' from
'C:\\Python311\\Lib\\site-packages\\langflow\\interface\\initialize\\__init__.p
y'>
File
"C:\Python311\Lib\site-packages\langflow\interface\initialize\loading.py", line
64, in get_instance_results
return await build_component(params=custom_params,
custom_component=custom_component)
| | ->
<langflow.utils.validate.AstraVectorStoreComponent object at
0x0000026864CC6510>
| -> {'embedding':
OllamaEmbeddings(base_url='http://localhost:11434',
model='jina/jina-embeddings-v2-base-en', embed_instruction=...
-> <function build_component at 0x000002684A38DE40>
File
"C:\Python311\Lib\site-packages\langflow\interface\initialize\loading.py", line
151, in build_component
build_results, artifacts = await custom_component.build_results()
| -> <function
Component.build_results at 0x000002684A38D760>
->
<langflow.utils.validate.AstraVectorStoreComponent object at
0x0000026864CC6510>
File
"C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.py",
line 617, in build_results
return await self._build_with_tracing()
| -> <function Component._build_with_tracing at
0x000002684A38D620>
-> <langflow.utils.validate.AstraVectorStoreComponent object
at 0x0000026864CC6510>
File "C:\Python311\Lib\contextlib.py", line 222, in __aexit__
await self.gen.athrow(typ, value, traceback)
| | | | | -> <traceback object at
0x00000268011631C0>
| | | | -> ValueError("Error performing search in
AstraDBVectorStore: 'content'")
| | | -> <class 'ValueError'>
| | -> <method 'athrow' of 'async_generator' objects>
| -> <async_generator object TracingService.trace_context at
0x0000026864918A40>
-> <contextlib._AsyncGeneratorContextManager object at
0x0000026801363F50>
File "C:\Python311\Lib\site-packages\langflow\services\tracing\service.py",
line 229, in trace_context
raise e
File "C:\Python311\Lib\site-packages\langflow\services\tracing\service.py",
line 226, in trace_context
yield self
-> <langflow.services.tracing.service.TracingService object at
0x0000026864D6ED50>
File
"C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.py",
line 605, in _build_with_tracing
_results, _artifacts = await self._build_results()
| -> <function Component._build_results at
0x000002684A38D800>
->
<langflow.utils.validate.AstraVectorStoreComponent object at
0x0000026864CC6510>
File
"C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.py",
line 640, in _build_results
result = method()
-> <bound method AstraVectorStoreComponent.search_documents of
<langflow.utils.validate.AstraVectorStoreComponent object at 0x00...
File "<string>", line 279, in search_documents
ValueError: Error performing search in AstraDBVectorStore: 'content'
╭───────────────────── Traceback (most recent call last) ─────────────────────╮
│ in search_documents:277 │
│ │
│ C:\Python311\Lib\site-packages\langchain_core\vectorstores\base.py:337 in │
│ search │
│ │
│ 334 │ │ │ │ "mmr", or "similarity_score_threshold". │
│ 335 │ │ """ │
│ 336 │ │ if search_type == "similarity": │
│ ❱ 337 │ │ │ return self.similarity_search(query, **kwargs) │
│ 338 │ │ elif search_type == "similarity_score_threshold": │
│ 339 │ │ │ docs_and_similarities = self.similarity_search_with_rele │
│ 340 │ │ │ │ query, **kwargs │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\vectorstores.py:967 in │
│ similarity_search │
│ │
│ 964 │ │ """ │
│ 965 │ │ return [ │
│ 966 │ │ │ doc │
│ ❱ 967 │ │ │ for (doc, _, _) in self.similarity_search_with_score_id( │
│ 968 │ │ │ │ query=query, │
│ 969 │ │ │ │ k=k, │
│ 970 │ │ │ │ filter=filter, │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\vectorstores.py:1025 in │
│ similarity_search_with_score_id │
│ │
│ 1022 │ │ │ ) │
│ 1023 │ │ │
│ 1024 │ │ embedding_vector = self._get_safe_embedding().embed_query(qu │
│ ❱ 1025 │ │ return self.similarity_search_with_score_id_by_vector( │
│ 1026 │ │ │ embedding=embedding_vector, │
│ 1027 │ │ │ k=k, │
│ 1028 │ │ │ filter=filter, │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\vectorstores.py:1107 in │
│ similarity_search_with_score_id_by_vector │
│ │
│ 1104 │ │ │ ) │
│ 1105 │ │ │ raise ValueError(msg) │
│ 1106 │ │ sort = {"$vector": embedding} │
│ ❱ 1107 │ │ return self._similarity_search_with_score_id_by_sort( │
│ 1108 │ │ │ sort=sort, │
│ 1109 │ │ │ k=k, │
│ 1110 │ │ │ filter=filter, │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\vectorstores.py:1129 in │
│ _similarity_search_with_score_id_by_sort │
│ │
│ 1126 │ │ │ include_similarity=True, │
│ 1127 │ │ │ sort=sort, │
│ 1128 │ │ ) │
│ ❱ 1129 │ │ return [ │
│ 1130 │ │ │ ( │
│ 1131 │ │ │ │ self.document_encoder.decode(hit), │
│ 1132 │ │ │ │ hit["$similarity"], │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\vectorstores.py:1131 in │
│ <listcomp> │
│ │
│ 1128 │ │ ) │
│ 1129 │ │ return [ │
│ 1130 │ │ │ ( │
│ ❱ 1131 │ │ │ │ self.document_encoder.decode(hit), │
│ 1132 │ │ │ │ hit["$similarity"], │
│ 1133 │ │ │ │ hit["_id"], │
│ 1134 │ │ │ ) │
│ │
│ C:\Python311\Lib\site-packages\langchain_astradb\utils\encoders.py:144 in │
│ decode │
│ │
│ 141 │ @override │
│ 142 │ def decode(self, astra_document: dict[str, Any]) -> Document: │
│ 143 │ │ return Document( │
│ ❱ 144 │ │ │ page_content=astra_document["content"], │
│ 145 │ │ │ metadata=astra_document["metadata"], │
│ 146 │ │ ) │
│ 147 │
╰─────────────────────────────────────────────────────────────────────────────╯
KeyError: 'content'
The above exception was the direct cause of the following exception:
╭───────────────────── Traceback (most recent call last) ─────────────────────╮
│ C:\Python311\Lib\site-packages\langflow\graph\vertex\base.py:694 in │
│ _build_results │
│ │
│ 691 │ │
│ 692 │ async def _build_results(self, custom_component, custom_params, │
│ fallback_to_env_vars=False): │
│ 693 │ │ try: │
│ ❱ 694 │ │ │ result = await initialize.loading.get_instance_results( │
│ 695 │ │ │ │ custom_component=custom_component, │
│ 696 │ │ │ │ custom_params=custom_params, │
│ 697 │ │ │ │ vertex=self, │
│ │
│ C:\Python311\Lib\site-packages\langflow\interface\initialize\loading.py:64 │
│ in get_instance_results │
│ │
│ 61 │ │ if base_type == "custom_components": │
│ 62 │ │ │ return await build_custom_component(params=custom_params, │
│ custom_component=custom_component) │
│ 63 │ │ elif base_type == "component": │
│ ❱ 64 │ │ │ return await build_component(params=custom_params, │
│ custom_component=custom_component) │
│ 65 │ │ else: │
│ 66 │ │ │ raise ValueError(f"Base type {base_type} not found.") │
│ 67 │
│ │
│ C:\Python311\Lib\site-packages\langflow\interface\initialize\loading.py:151 │
│ in build_component │
│ │
│ 148 ): │
│ 149 │ # Now set the params as attributes of the custom_component │
│ 150 │ custom_component.set_attributes(params) │
│ ❱ 151 │ build_results, artifacts = await custom_component.build_results() │
│ 152 │ │
│ 153 │ return custom_component, build_results, artifacts │
│ 154 │
│ │
│ C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.p │
│ y:617 in build_results │
│ │
│ 614 │ │ self, │
│ 615 │ ): │
│ 616 │ │ if self._tracing_service: │
│ ❱ 617 │ │ │ return await self._build_with_tracing() │
│ 618 │ │ return await self._build_without_tracing() │
│ 619 │ │
│ 620 │ async def _build_results(self): │
│ │
│ C:\Python311\Lib\contextlib.py:222 in __aexit__ │
│ │
│ 219 │ │ │ │ # tell if we get the same exception back │
│ 220 │ │ │ │ value = typ() │
│ 221 │ │ │ try: │
│ ❱ 222 │ │ │ │ await self.gen.athrow(typ, value, traceback) │
│ 223 │ │ │ except StopAsyncIteration as exc: │
│ 224 │ │ │ │ # Suppress StopIteration *unless* it's the same excep │
│ 225 │ │ │ │ # was passed to throw(). This prevents a StopIterati │
│ │
│ C:\Python311\Lib\site-packages\langflow\services\tracing\service.py:229 in │
│ trace_context │
│ │
│ 226 │ │ │ yield self │
│ 227 │ │ except Exception as e: │
│ 228 │ │ │ self._end_traces(trace_id, trace_name, e) │
│ ❱ 229 │ │ │ raise e │
│ 230 │ │ finally: │
│ 231 │ │ │ asyncio.create_task(await asyncio.to_thread(self._end_and │
│ trace_name, None)) │
│ 232 │
│ │
│ C:\Python311\Lib\site-packages\langflow\services\tracing\service.py:226 in │
│ trace_context │
│ │
│ 223 │ │ │ component._vertex, │
│ 224 │ │ ) │
│ 225 │ │ try: │
│ ❱ 226 │ │ │ yield self │
│ 227 │ │ except Exception as e: │
│ 228 │ │ │ self._end_traces(trace_id, trace_name, e) │
│ 229 │ │ │ raise e │
│ │
│ C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.p │
│ y:605 in _build_with_tracing │
│ │
│ 602 │ │ inputs = self.get_trace_as_inputs() │
│ 603 │ │ metadata = self.get_trace_as_metadata() │
│ 604 │ │ async with self._tracing_service.trace_context(self, self.tra │
│ metadata): │
│ ❱ 605 │ │ │ _results, _artifacts = await self._build_results() │
│ 606 │ │ │ self._tracing_service.set_outputs(self.trace_name, _resul │
│ 607 │ │ │
│ 608 │ │ return _results, _artifacts │
│ │
│ C:\Python311\Lib\site-packages\langflow\custom\custom_component\component.p │
│ y:640 in _build_results │
│ │
│ 637 │ │ │ │ │ │ _results[output.name] = output.value │
│ 638 │ │ │ │ │ │ result = output.value │
│ 639 │ │ │ │ │ else: │
│ ❱ 640 │ │ │ │ │ │ result = method() │
│ 641 │ │ │ │ │ │ # If the method is asynchronous, we need to a │
│ 642 │ │ │ │ │ │ if inspect.iscoroutinefunction(method): │
│ 643 │ │ │ │ │ │ │ result = await result │
│ in search_documents:279 │
╰─────────────────────────────────────────────────────────────────────────────╯
ValueError: Error performing search in AstraDBVectorStore: 'content'
Reproduction
- Set up a database collection of type vector with the Jina integration (model: jina-embeddings-v2-base-en).
- Load data into the collection.
- Create a Langflow RAG AstraDB template project with Ollama Embeddings (same model as in the first step).
- The AstraDB step crashes.
Expected behavior
Data should be returned from AstraDB.
Who can help?
No response
Operating System
Windows
Langflow Version
Latest
Python Version
3.11
Screenshot
No response
Flow File
No response
Hello, @sabatale! I'm here to help you with any bugs, questions, or contributions you have. Let's get to work!
To resolve the "ValueError: Error performing search in AstraDBVectorStore: 'content'" error, you need to ensure that the documents retrieved from the AstraDB collection contain the "content" field. The error suggests that the code is trying to access the "content" field in the documents, but it is not present.
Here's the relevant part of the code that processes the documents:
def _astradb_collection_to_data(self, collection):
    data = []
    data_dict = collection.find()
    if data_dict and "data" in data_dict:
        data_dict = data_dict["data"].get("documents", [])
        for item in data_dict:
            data.append(Data(content=item["content"]))
    return data
To fix this, you should verify that the documents in your AstraDB collection have the "content" field. If they do not, you need to modify the documents to include this field or adjust the code to handle documents without the "content" field.
Here is a modified version of the _astradb_collection_to_data method that includes a check for the "content" field:
def _astradb_collection_to_data(self, collection):
    data = []
    data_dict = collection.find()
    if data_dict and "data" in data_dict:
        data_dict = data_dict["data"].get("documents", [])
        for item in data_dict:
            if "content" in item:
                data.append(Data(content=item["content"]))
            else:
                logger.warning(f"Document missing 'content' field: {item}")
    return data
This modification will log a warning if a document is missing the "content" field and skip adding it to the data list. This should prevent the "ValueError" from being raised due to missing "content" fields [1].
Hi @sabatale,
Thank you for your feedback! It would be great if you could share the Flow.json file for debugging.
Regarding the issue, I tested with Ollama using JINA embeddings and AstraDB VectorStore, and it works as expected. Could you please provide more details about your data ingestion pipeline? Is the data ingested through Langflow?
For reference, I have attached a screenshot of the working flow.
Also, if the ingestion is not done through Langflow, please ensure that the Vector DB has the required fields. A sample data point from AstraDB ingested using Langflow will have the following structure:
Top-Level Keys:
- _id: A string representing the unique identifier.
- content: A string containing textual content.
- $vector: An array of numerical values.
- metadata: An object containing additional information.
Keys within metadata:
- source: A string indicating the source URL.
- title: A string representing the title.
- language: A string specifying the language code.
Overall JSON Structure:
{
  "_id": "string",
  "content": "string",
  "$vector": [number, number, ...],
  "metadata": {
    "source": "string",
    "title": "string",
    "language": "string"
  }
}
Please ensure your data matches this structure so that it can be processed correctly. Let me know if you have any questions or need further assistance.
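As a quick sanity check before pointing Langflow at an existing collection, something like the sketch below could flag documents that lack the keys listed above. It works on plain dictionaries only. The helper name `missing_keys` and both sample documents are made up for illustration, and fetching the documents from Astra DB is left out. The traceback in this issue only confirms that "content" and "metadata" are read during decoding; treating "_id" and "$vector" as required is an assumption based on the structure above.

```python
from typing import Any

# Top-level keys taken from the structure described above (assumed to be required).
REQUIRED_TOP_LEVEL_KEYS = ("_id", "content", "$vector", "metadata")


def missing_keys(doc: dict[str, Any]) -> list[str]:
    """Return the required top-level keys that are absent from doc."""
    return [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in doc]


# Hypothetical document in the shape Langflow writes.
langflow_doc = {
    "_id": "1",
    "content": "Some text.",
    "$vector": [0.1, 0.2, 0.3],
    "metadata": {"source": "https://example.com", "title": "Example", "language": "en"},
}

# Hypothetical document ingested elsewhere, with the text under another key.
external_doc = {"_id": "2", "text": "Some text.", "$vector": [0.1, 0.2, 0.3]}

print(missing_keys(langflow_doc))
print(missing_keys(external_doc))
```

An empty list means the document has every key from the structure above. Anything else names the fields that would need to be added before a search can decode the document.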
Could this information about the required DataStax schema for LangFlow please be added to the documentation?
This situation can be managed if the ingest pipeline utilizes the Langflow AstraDB component. I will create a separate issue to add this to the documentation. @cystema Thank you for the feedback.
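For collections that were already populated outside Langflow, another option might be to reshape the documents into the expected structure. The sketch below is only a rough illustration on plain dictionaries. The function `to_expected_shape` is hypothetical, not part of Langflow or langchain_astradb. It assumes the external pipeline stored the text under a key named `text`, and writing the converted documents back to Astra DB is not shown.

```python
from typing import Any


def to_expected_shape(doc: dict[str, Any], text_key: str = "text") -> dict[str, Any]:
    """Return a copy of doc with the text under 'content' and extra fields under 'metadata'.

    text_key is an assumption about where the external pipeline stored the text.
    """
    reserved = {"_id", "$vector", "content", "metadata", text_key}
    # Start from any existing metadata, then fold in the remaining top-level fields.
    metadata = dict(doc.get("metadata") or {})
    for key, value in doc.items():
        if key not in reserved:
            metadata[key] = value
    converted: dict[str, Any] = {
        "_id": doc["_id"],
        # Prefer an existing 'content' value, otherwise fall back to text_key.
        "content": doc.get("content", doc.get(text_key, "")),
        "metadata": metadata,
    }
    if "$vector" in doc:
        converted["$vector"] = doc["$vector"]
    return converted


# Hypothetical document written by an external ingestion pipeline.
external_doc = {
    "_id": "abc",
    "type": "CompositeElement",
    "text": "Some chunk of text.",
    "$vector": [0.1, 0.2, 0.3],
}

converted = to_expected_shape(external_doc)
print(converted)
```

Folding the leftover fields into "metadata" keeps them available for filtering, but whether that suits a given collection depends on how the data is queried.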
@edwinjosechittilappilly Thanks for the template! It differs from the one we get when loading data outside of Langflow, which causes the error:
"_id": ""
"type": "CompositeElement"
"text": ""
"$vector": ""