jina-ai/vectordb

Missing `__validators__` property with `pydantic>=2`

gabe-l-hart opened this issue · 6 comments

Description

When instantiating an InMemoryExactNNVectorDB instance using pydantic>=2, an exception is raised about the __validators__ attribute being missing. This does not happen with pydantic<2.

Environment

  • M1 MacBookPro (M1 Max arm chip)
  • Python 3.11.5 (conda)
  • Packages:
    • vectordb==0.0.20
    • docarray==0.40.0
    • pydantic==2.3.0
    • pydantic-extra-types==2.3.0
    • pydantic-settings==2.1.0
    • pydantic_core==2.6.3

Workaround

When I add the __validators__ property (even with a value of None), the bug is avoided and the database works as expected. I'm not clear if it's functioning with less validation than desired since this line will result in __validators__=None, but the DB is able to store and retrieve documents just fine.

Repro

from docarray import BaseDoc
from vectordb import InMemoryExactNNVectorDB

class BrokenDoc(BaseDoc):
    text: str
    # Uncomment this to work around the bug!
    # __validators__ = None

db = InMemoryExactNNVectorDB[BrokenDoc]()

Error

Traceback (most recent call last):
  File "vdb_repro.py", line 8, in <module>
    db = InMemoryExactNNVectorDB[BrokenDoc]()
         ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/opt/miniconda3/envs/env/lib/python3.11/site-packages/vectordb/db/base.py", line 40, in __class_getitem__
    class VectorDBTyped(cls):  # type: ignore
  File "/opt/miniconda3/envs/env/lib/python3.11/site-packages/vectordb/db/base.py", line 42, in VectorDBTyped
    _executor_cls: Type[TypedExecutor] = cls._executor_type[item]
                                         ~~~~~~~~~~~~~~~~~~^^^^^^
  File "/opt/miniconda3/envs/env/lib/python3.11/site-packages/vectordb/db/executors/typed_executor.py", line 71, in __class_getitem__
    output_schema = create_output_doc_type(input_schema)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/env/lib/python3.11/site-packages/vectordb/utils/create_doc_type.py", line 18, in create_output_doc_type
    __validators__=input_doc_type.__validators__,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/env/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 210, in __getattr__
    raise AttributeError(item)
AttributeError: __validators__

Hey @gabe-l-hart,

Does it also happen if you use a Document with an embedding field?

I see now.

Compatibility with pydantic>=2 is not guaranteed because the Jina dependency does not guarantee it yet.

Hi @JoanFM! Totally makes sense that pydantic>=2 is not supported. The dependency is coming in as a transitive dep from docarray (here) with no upper bound, but I see it also comes in from jina here. I'm not clear how difficult it would be to fix the jina requirement pinning, but in vectordb, the fix may be as simple as changing this line to __validators__=getattr(input_doc_type, "__validators__", None) (default to None if the __validators__ attr isn't present). It's a pretty simple solution, but I haven't plumbed the depths of vectordb, docarray, or pydantic to understand what (if any) unintended consequences it would have.
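To illustrate the getattr fallback suggested above, here is a minimal, standard-library-only sketch. It does not use pydantic or vectordb: `StrictMeta`, `DocWithoutValidators`, `DocWithValidators`, and `get_validators` are hypothetical names, and the metaclass only imitates the behavior seen in the traceback (a `__getattr__` that raises `AttributeError` for a missing attribute). Whether passing None on to docarray or pydantic has side effects is not covered here.

```python
class StrictMeta(type):
    """Hypothetical stand-in for a metaclass whose __getattr__ raises
    AttributeError for any attribute not found by normal lookup."""

    def __getattr__(cls, item):
        raise AttributeError(item)


class DocWithoutValidators(metaclass=StrictMeta):
    # Mimics a document class that has no __validators__ attribute.
    text: str


class DocWithValidators(metaclass=StrictMeta):
    # Mimics a document class that does expose __validators__.
    text: str
    __validators__ = {"text": []}


def get_validators(doc_type):
    # getattr with a default swallows the AttributeError and returns None,
    # instead of letting it propagate as in the traceback.
    return getattr(doc_type, "__validators__", None)


print(get_validators(DocWithoutValidators))
print(get_validators(DocWithValidators))
```

Direct attribute access (`DocWithoutValidators.__validators__`) would still raise `AttributeError` in this sketch; only the lookup through `getattr` with a default is tolerant of the missing attribute.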

For context on why I submitted the issue, I've really enjoyed vectordb in a sample project that uses caikit (one of my primary projects), but we require pydantic>=2 for its Annotation Validators feature. I'd love it if the two projects could play nicely together! Currently, I'm forced to do a two-step install: first install all of the caikit requirements, then install vectordb to downgrade pydantic, which isn't ideal.

You can try to do this change and create a PR to the repo.

As for Jina, it is a little more complex.

Here you go! #68