Josephrp/DataTonic

MongoDB for Vector Database / Database provider choice

Closed this issue · 8 comments

I would go for MongoDB

  • Used in AgentCloud (tested)
  • I'm familiar with it
  • Also handles vectors through Atlas Vector Search, as well as memory
  • Is compatible, or at least complementary, with our stack
  • Cost-effective and scalable with vCore, and I got some credits
  • Easy to use
  • Connectivity with Retool
  • Is compatible "natively" with Azure solutions
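
For anyone who hasn't used Atlas Vector Search, here is a minimal sketch of what a query could look like. It only builds the aggregation pipeline around the `$vectorSearch` stage and does not connect to anything. The index name `vector_index`, the field names `embedding` and `text`, and the helper name are assumptions for illustration, not something from our codebase.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=5, num_candidates=100):
    """Build an Atlas Vector Search aggregation pipeline (hypothetical helper).

    index_name and path are assumed names; they must match the vector
    index actually defined on the collection.
    """
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,              # assumed index name
                "path": path,                     # assumed embedding field
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,  # candidates considered before ranking
                "limit": limit,                   # number of documents returned
            }
        },
        # Keep only the (assumed) text field and the similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
```

With pymongo, the pipeline would then be passed to `collection.aggregate(pipeline)`. As far as I know, the vCore offering on Azure uses a different stage syntax, so this sketch applies to Atlas only.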

Postgres with pgvector extension
Free, open source
I have used it to store LLM embeddings and build cosine similarity search
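
To make "cosine similarity search" concrete, here is a minimal pure-Python sketch of the ranking that such a search performs. In pgvector the same ordering is normally done in SQL with the `<=>` cosine distance operator (`ORDER BY embedding <=> query LIMIT k`); the rows, ids, and function names below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the k (id, similarity) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings: (id, vector) pairs standing in for table rows.
ROWS = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

results = top_k([1.0, 0.0, 0.0], ROWS, k=2)
```

This brute-force scan is only meant to show the logic; a real deployment would rely on the database's index rather than scoring every row in application code.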

Me too, through Supabase. That said, pgvector, as cool as it can be, seems less relevant in this case.
MongoDB Atlas has better synergy with AutoGen or Semantic Kernel (or any related AI tooling from Azure/Microsoft).
Also, Atlas Vector Search (besides its core feature) literally has a whole MongoDB integration for Semantic Kernel as memory.
The Microsoft documentation mentions it, and it is far more cost-efficient with vCore.
As I said, I also got MongoDB credits offered by Microsoft themselves. You might not have seen it, but pgvector still needs to be hosted, even if it's open source!

Just an example:
[image]
(Yes, on 1M vectors.) But since we are probably talking about huge ingestion, etc., I can't see the benefit of using pgvector, especially since the various benchmarks show that it is not the most cost-efficient option even though it's open source.

Whereas I was able to use MongoDB with ease, simply through AgentCloud.

It's also literally much cheaper, more efficient, and faster.

I almost forgot, but if I'm not wrong, pgvector cannot ingest multimodal data (images, etc.), whereas Atlas can.

  • MongoDB can act as a safety net: if we don't end up with AutoGen, TaskWeaver, and Semantic Kernel all using Gemini and truegens, it will still serve as memory (best case, with Semantic Kernel) or, at worst, as our database and vector DB

> Postgres with pgvector extension. Free, open source. I have used it to store LLM embeddings and build cosine similarity search.

The stable Postgres connector in C# for Semantic Kernel was published yesterday and is pending review for the next release, so we can use that, but I don't really know how :-)

@jsaluja

As you wish, guys. I'm still interested in why this decision was made. I might have a blind spot here, and I'm 100% open to changing my mind, but honestly I can't see it.
https://learn.microsoft.com/en-us/azure/cosmos-db/introduction

My experience with pgvector is based on AWS Postgres.
Good to learn about the MongoDB support on Azure.
Thank you for sharing the perf benchmarks.

@Zochory
Question: are you referring to the MongoDB credits from Microsoft startup hub?

Please reopen this issue with a corresponding pull request if you refactor the code accordingly. Currently we're using SQLite in TaskWeaver.