BUG: sqlalchemy conflict between airflow and llama-index
While updating the llama-index library to use its newest features (pipelines, docstore, etc.), we hit the following error:
`Error!!!: Too old airflow version.`
The error is raised because Docker cannot run the `gosu` command that reads the airflow version. According to the logs, the root cause is a sqlalchemy version conflict: apache airflow requires sqlalchemy <=1.4.49, while the newest llama-index requires a version greater than 2.0. As a result, the airflow service cannot start and reports the error above.
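The conflict can be illustrated with a minimal sketch. It assumes the two bounds exactly as stated in this issue (the real pins in each package's metadata may differ), and the helper names `parse_version`, `ok_for_airflow` and `ok_for_llama_index` are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted release string such as '1.4.49' into a 3-part tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.0' to (2, 0, 0)
    return tuple(parts[:3])


# Bounds as described in this issue (assumption: taken at face value).
AIRFLOW_MAX = parse_version("1.4.49")   # airflow: sqlalchemy <= 1.4.49
LLAMA_INDEX_MIN = parse_version("2.0")  # newest llama-index: sqlalchemy > 2.0


def ok_for_airflow(version: str) -> bool:
    return parse_version(version) <= AIRFLOW_MAX


def ok_for_llama_index(version: str) -> bool:
    return parse_version(version) > LLAMA_INDEX_MIN


# Example candidate releases; none can satisfy both bounds at once.
candidates = ["1.4.49", "1.4.52", "2.0.25"]
compatible = [v for v in candidates if ok_for_airflow(v) and ok_for_llama_index(v)]
print(compatible)  # prints []
```

Because the upper bound sits below the lower bound, no single sqlalchemy release can serve both packages in the same environment.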
To resolve this, we need to migrate to another vector database that is not so dependent on the sqlalchemy version.
After researching the options, we found that our best alternative is the Qdrant database, which supports async operations and metadata filtering (ref: Qdrant features).
To update our systems to use the Qdrant database, we have the following tasks:
- Update `docker-compose.yaml` and `docker-compose.test.yaml` to use a stable version of the Qdrant database instead of pgvector
- Update CustomIngestionPipeline to use the Qdrant database for vector-stores and docstore
- Update discord-etl to assign the message id to each llama-index Document
- Update discord-etl to use the CustomIngestionPipeline
- Update discord-summary-etl to assign a unique value to each summary document *
- Update discord-summary-etl to use the CustomIngestionPipeline
- Update `discourse_vector_store` ETL to assign a unique value to each document *
- Update `discourse_summary_vector_store` ETL to assign a unique value to each document *
- Update `discourse_vector_store` ETL to use the CustomIngestionPipeline
- Update `discourse_summary_vector_store` ETL to use the CustomIngestionPipeline
- Update `github_vector_store` ETL to assign a unique id to each document *
- Update `github_vector_store` ETL to use the CustomIngestionPipeline
- Update GDrive ETL to use the CustomIngestionPipeline
Note *: IDs must be the same across multiple runs, so that the docstore can detect duplicated or updated nodes.
For now, we'll keep the old code that uses pgvector and gradually migrate from pgvector to Qdrant.
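One possible way to get IDs that stay the same across runs is to derive them from values that already identify the document, for example with a name-based UUID. This is only a sketch: `stable_doc_id`, the namespace string, and the example key parts are hypothetical and not taken from the existing ETL code.

```python
import json
import uuid

# Hypothetical project namespace; any fixed value works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/etl-documents")


def stable_doc_id(source: str, *key_parts: str) -> str:
    """Return the same UUID string on every run for the same identifying values."""
    # JSON-encode the parts so ("a", "bc") and ("ab", "c") cannot produce the same name.
    name = json.dumps([source, *key_parts])
    return str(uuid.uuid5(NAMESPACE, name))


# Example with made-up keys: a summary identified by channel and date.
first_run = stable_doc_id("discord-summary", "general", "2024-01-15")
second_run = stable_doc_id("discord-summary", "general", "2024-01-15")
other_day = stable_doc_id("discord-summary", "general", "2024-01-16")
print(first_run == second_run, first_run == other_day)  # prints True False
```

The resulting string could then be set as the document ID before ingestion, so a rerun over the same data maps to the same entries instead of creating new ones.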