Review every-ai dependency
tomusher opened this issue · 6 comments
We're currently relying on https://github.com/tomusher/every-ai/ as a unified interface between the two AI packages (wagtail-ai and wagtail-vector-index) and LLM providers.
This ensures we can use the same approach to configuring/using LLM APIs everywhere.
We should review whether it is worth continuing to develop/use every-ai, or whether another library already exists that would suit the same purpose (https://github.com/simonw/llm may be a good option).
Does every-ai support streaming responses? e.g. https://llm.datasette.io/en/stable/python-api.html#streaming-responses
No, and I think having to add things like that is one of the reasons why it's not worth reinventing the wheel here - if llm is fit for purpose, I see no reason why we shouldn't just switch to it.
Having had a quick look, it seems the API surface is very similar and suitable for what we need right now.
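For reference, here's roughly what the streaming API looks like in llm (per the docs linked above) - the model name and prompt are just illustrative, and an API key needs to be configured first:

```python
import llm

# Illustrative model/prompt; an OpenAI key would need to be set up first
# (e.g. via `llm keys set openai`) for this to actually run.
model = llm.get_model("gpt-3.5-turbo")
response = model.prompt("Summarise this page for me.")

# Streaming: chunks are yielded as they arrive from the provider.
for chunk in response:
    print(chunk, end="")

# Non-streaming: block until the full response is available.
# print(response.text())
```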
What's missing at the moment is a way to get the token length for the selected 'chat' model (useful so we know where to split), and the size of the output vector for embedding models (something we need to know when creating indexes in some vector databases). This may be something that would be accepted into the upstream library, or we can maintain our own mapping if not.
@tomusher Is it the tiktoken dependency that provides us with this ability?
As mentioned in #25, we're currently hardcoding the token length because gpt-3.5-turbo is (for the most part) the lowest common denominator for accepted token lengths.
tiktoken tokenises a string in the same/similar way that the OpenAI APIs do. This way we can check if our string is over the model limit and then decide to split as appropriate. However, tiktoken only works for OpenAI models.
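For context, a minimal sketch of the kind of check tiktoken makes possible today (the 4,096 limit is just illustrative of gpt-3.5-turbo's original context window):

```python
import tiktoken

# tiktoken ships the same tokenisers the OpenAI API uses, so this count
# matches what the model will actually see.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "Rich text content pulled from a Wagtail page..."
token_count = len(encoding.encode(text))

# 4,096 is hardcoded here purely for illustration; the real limit depends
# on the configured model.
needs_splitting = token_count > 4096
```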
A full resolution to this problem would be a library that:
- Gives us the token length for the configured model
- Given a string, let us know the estimated number of tokens in that string (for the configured model)
That way we can always be sure we're splitting/erroring on the appropriate token boundaries.
However, this is likely complicated - whatever library we rely on here would need to source tokenisers for every model it supports.
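If such a library existed, the interface we'd want from it is pretty small - a hypothetical sketch (these names aren't real, in llm or anywhere else):

```python
from typing import Protocol


class TokenCounter(Protocol):
    """Hypothetical interface - not something llm provides today."""

    def max_tokens(self, model_id: str) -> int:
        """Return the token limit (context window) for the given model."""
        ...

    def count_tokens(self, model_id: str, text: str) -> int:
        """Return the estimated number of tokens in `text` for that model."""
        ...
```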
A simpler solution for now might be to make this configurable in Wagtail AI using a WAGTAIL_AI_SPLIT_LENGTH setting (or something...), and allow developers to set a split length that is suitable for the model they've chosen.
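A rough sketch of how that could work (the setting name matches the proposal above; the default is just illustrative):

```python
from django.conf import settings

# Conservative default that fits gpt-3.5-turbo; developers override it to
# match whichever model they've configured.
DEFAULT_SPLIT_LENGTH = 4096


def get_split_length() -> int:
    # Setting name as proposed above - subject to bikeshedding.
    return getattr(settings, "WAGTAIL_AI_SPLIT_LENGTH", DEFAULT_SPLIT_LENGTH)
```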
If we took this approach, we wouldn't need to know the token length at all (for what we're doing at the moment), so the only missing part in llm would be the output vector size for embeddings for use in wagtail-vector-index.
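If llm can't tell us, the embedding output size could be covered by a small hand-maintained mapping in wagtail-vector-index - a sketch, with only the ada-002 value filled in:

```python
# Output vector sizes per embedding model - needed when creating indexes in
# some vector databases. Maintained by hand if llm can't tell us.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
}


def embedding_output_size(model_id: str) -> int:
    try:
        return EMBEDDING_DIMENSIONS[model_id]
    except KeyError:
        raise ValueError(
            f"Unknown embedding model {model_id!r}; add it to "
            "EMBEDDING_DIMENSIONS or configure the vector size explicitly."
        )
```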
#26 now contains an idea of what it could look like.
I think we do need a wagtail-ai specific interface, given that different models want different parameters anyway, each of them has different text-splitting and max token length requirements, and we may want to use different parameters for different models and maybe even for different prompts...
It very much feels like we need a wagtail-ai specific interface for this; it would be impossible to devise a common interface for both wagtail-ai and wagtail-vector-index given their specific use cases. It just wouldn't feel right.
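To illustrate (all names here are hypothetical), such a wagtail-ai specific backend could carry the per-model configuration while still delegating the actual API calls to llm:

```python
from dataclasses import dataclass, field
from typing import Any

import llm


@dataclass
class ChatBackend:
    """Hypothetical wagtail-ai wrapper pairing an llm model with the
    per-model settings (token limit, prompt parameters) llm doesn't know about."""

    model_id: str
    token_limit: int
    prompt_kwargs: dict[str, Any] = field(default_factory=dict)

    def prompt(self, text: str) -> str:
        model = llm.get_model(self.model_id)
        # llm accepts model options (e.g. temperature) as keyword arguments.
        return model.prompt(text, **self.prompt_kwargs).text()
```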