asg017/sqlite-rembed

Batch support

asg017 opened this issue · 2 comments

Currently the rembed() function makes a new HTTP request for every item. For example, for this query:

select rembed('myModel', field)
from my_table;

If my_table has 100,000 rows, then 100,000 sequential HTTP requests would be sent.

This isn't ideal. Most of these providers support multiple inputs in a single request, which should help with both rate limits and speed. But finding a good SQL API that works with SQLite can be tricky.
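
To make the difference concrete, here is a rough sketch of client-side batching in Python. The `embed_batch` callable, the `fake_embed_batch` stand-in, and the batch size of 100 are all assumptions for illustration and are not part of sqlite-rembed; a real provider request would replace the fake one.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Embed every text, issuing one call to `embed_batch` per chunk."""
    embeddings: List[List[float]] = []
    for chunk in chunked(texts, batch_size):
        embeddings.extend(embed_batch(chunk))
    return embeddings


# Hypothetical stand-in for a provider request; it only counts calls
# and returns a one-dimensional "embedding" (the text length).
calls = 0


def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    global calls
    calls += 1
    return [[float(len(text))] for text in batch]


rows = [f"row {i}" for i in range(100_000)]
vectors = embed_all(rows, fake_embed_batch, batch_size=100)
print(calls)  # 1000 calls for 100,000 rows at 100 inputs per call
```

With 100 inputs per request, the 100,000-row example drops from 100,000 requests to 1,000. The right batch size depends on each provider's input limits, which this sketch does not model.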

A few different options:

Option 1: Table function with JSON array input

with subset as (
  select json_group_array(
    json_object(
      'id', rowid,
      'contents', my_table.field
    )
  ) as value
  from my_table
)
select 
  rowid, 
  embedding
from subset
join rembed_each('myModel', json(subset.value));
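
As a rough illustration of the payload Option 1 would hand to the table function, the sketch below builds the JSON array with Python's built-in `sqlite3` module. It assumes each row is wrapped in `json_object()`, since `json_group_array()` takes a single argument, and it uses the built-in `json_each()` purely as a stand-in for the proposed `rembed_each()`, which does not exist yet. The sample rows are made up, and the SQLite build is assumed to include the JSON functions.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table my_table(field text)")
db.executemany(
    "insert into my_table(field) values (?)",
    [("alpha",), ("beta",), ("gamma",)],
)

# Collapse the whole table into one JSON array of {id, contents} objects.
payload = db.execute(
    """
    select json_group_array(
      json_object('id', rowid, 'contents', field)
    )
    from my_table
    """
).fetchone()[0]
items = json.loads(payload)

# json_each() stands in for the proposed rembed_each(): it expands the
# array back into one row per element, which is the shape a table
# function would need in order to return (id, embedding) pairs.
pairs = db.execute(
    """
    select
      json_extract(item.value, '$.id'),
      json_extract(item.value, '$.contents')
    from json_each(?) as item
    """,
    (payload,),
).fetchall()

print(len(items), len(pairs))
```

One trade-off of this shape is that the entire input is materialized as a single JSON string before the table function sees it, which may matter for very large tables.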

Option 2: input in (...) with serialized rembed_item()

select 
  rowid,
  embedding
from rembed('myModel')
where inputs in (select rembed_item(id, field) from my_table);
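
For Option 2, the issue only says that `rembed_item()` would serialize its arguments. One possible reading, sketched below, is a scalar function that packs the id and text into a single JSON string. The JSON encoding and the key names `id` and `contents` are assumptions borrowed from Option 1, and the function is registered from Python only for illustration; the table-valued `rembed('myModel')` side is not emulated here.

```python
import json
import sqlite3


def rembed_item(item_id, contents):
    """Hypothetical serializer: pack an id and its text into one JSON string."""
    return json.dumps({"id": item_id, "contents": contents})


db = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL scalar function.
db.create_function("rembed_item", 2, rembed_item)

db.execute("create table my_table(id integer primary key, field text)")
db.executemany(
    "insert into my_table(id, field) values (?, ?)",
    [(1, "alpha"), (2, "beta")],
)

# This is the subquery from Option 2; each row becomes one serialized item
# that a batch-aware table function could collect before sending a request.
serialized = [
    row[0]
    for row in db.execute(
        "select rembed_item(id, field) from my_table order by id"
    )
]
print(serialized)
```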

I'd like to see batching supported, since I'm trying in-memory SQLite search.

re: my bad I'm using ts rn

+n to this please!