neuml/txtai

Fix resource issues with embeddings indexing components backed by databases


Currently, there are scenarios where embeddings indexing components backed by a database (e.g. pgvector) run into issues with upserts that delete all existing data.
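
For illustration, the failure surfaces on the upsert path of a database-backed index. The snippet below is a rough reproduction sketch only; the model path and pgvector connection settings are assumed, not taken from this issue.

```python
from txtai import Embeddings

# Illustrative configuration: pgvector ANN backend (connection settings such as
# the database URL are assumed and omitted here)
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", backend="pgvector")

# Build the initial index
embeddings.index([(0, "first document", None), (1, "second document", None)])

# Upsert a single row; with the issues listed below, this call could end up
# deleting the existing rows instead of updating in place
embeddings.upsert([(0, "updated document", None)])

embeddings.save("index")
```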

The following issues have been identified.

  • Passing the SQLAlchemy engine to table DDL statements. This wraps the operation in an additional, nested transaction.
  • Passing the SQLAlchemy engine to the database session. This causes locking behavior within the same database component. See the first sketch after this list.
  • For ANNs backed by databases, the close method must run before a new ANN is recreated. Logic should be added to ensure this. See the second sketch after this list.
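
A minimal sketch of the first two items, assuming SQLAlchemy 2.x and a hypothetical sections table: running the DDL and the ORM session against a single shared connection keeps all work in one transaction, instead of letting the engine check out separate connections whose transactions nest and block each other.

```python
from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine
from sqlalchemy.orm import Session

engine = create_engine("postgresql+psycopg2://localhost/txtai")  # assumed URL

metadata = MetaData()
sections = Table("sections", metadata,
                 Column("indexid", Integer, primary_key=True),
                 Column("text", Text))

# Problematic pattern: passing the engine binds the DDL and the session to
# separate connections, each with its own transaction
#   sections.create(engine, checkfirst=True)
#   session = Session(engine)

# Sketch of the fix: share one connection so DDL and data changes run in a
# single transaction until commit
with engine.connect() as connection:
    sections.create(connection, checkfirst=True)

    session = Session(connection)
    session.execute(sections.insert().values(indexid=0, text="first document"))

    # Commit the session's work, then the outer connection transaction
    session.commit()
    connection.commit()
```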
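
For the third item, a sketch of the guard with hypothetical class and method names (not txtai's actual implementation): a database-backed ANN releases its existing connection before the index is rebuilt.

```python
class DatabaseANN:
    """Hypothetical database-backed ANN illustrating the close-before-recreate guard."""

    def __init__(self, connect):
        # connect is a callable that returns a new database connection
        self.connect = connect
        self.connection = None

    def index(self, vectors):
        # Release any existing connection before rebuilding, otherwise the old
        # connection can hold locks that block recreating the backing table
        if self.connection:
            self.close()

        self.connection = self.connect()
        # ... drop/create the vector table and load vectors on self.connection ...

    def close(self):
        if self.connection:
            self.connection.close()
            self.connection = None
```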

This work will address these issues and ensure that database-backed indexing components run all of their actions through a single transaction until save is called, keeping their behavior consistent with file-based components.
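
As a sketch of that transaction model (class and table names are hypothetical): the component holds one connection and one open transaction for its lifetime and only commits when save is called, so partial failures leave existing rows untouched, mirroring how a file-based index only persists changes on save.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

class DatabaseIndexComponent:
    """Hypothetical component showing the single-transaction-until-save pattern."""

    def __init__(self, url):
        # One connection and one open transaction for the component's lifetime
        self.engine = create_engine(url)
        self.connection = self.engine.connect()
        self.transaction = self.connection.begin()
        self.session = Session(self.connection)

    def upsert(self, rows):
        # Mutations accumulate in the shared transaction; nothing is visible to
        # other connections until save() commits. Assumes a sections table with
        # indexid as its primary key.
        for uid, value in rows:
            self.session.execute(
                text("INSERT INTO sections(indexid, text) VALUES (:uid, :value) "
                     "ON CONFLICT (indexid) DO UPDATE SET text = EXCLUDED.text"),
                {"uid": uid, "value": value},
            )

    def save(self):
        # Commit all pending work in one step, then start the next unit of work
        self.session.commit()
        self.transaction.commit()
        self.transaction = self.connection.begin()

    def close(self):
        self.session.close()
        if self.transaction.is_active:
            self.transaction.rollback()
        self.connection.close()
        self.engine.dispose()
```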