danielberkompas/elasticsearch-elixir

Elasticsearch.StreamingStore behaviour or something alike

imranismail opened this issue · 4 comments

The reason: a streaming store could use Repo.stream inside a Repo.transaction with a timeout of :infinity.

LIMIT + OFFSET is linear in the offset: to get the last 100 rows of a 1 million row table, the database has to go through the first 999,900 rows. Using a cursor or a stream with a timeout of :infinity can help in this case.

Right now I avoid long queries (waiting for the offset to reach 999,900) by doing something like this:

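        import Ecto.Query

        # Repo.stream/2 has to be enumerated inside a transaction.
        {:ok, users} =
          Repo.transaction(fn ->
            User
            |> select([:name, :email, :phone, :id])
            |> Repo.stream()
            |> Stream.drop(offset)
            |> Enum.take(limit)
          end)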

But streaming to the end in one shot would be much preferred.
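To make the "one shot" idea concrete, here is a minimal sketch, not the library's API. The module name `StreamingExport` and the function `each_row/4` are hypothetical; the repo is passed in as an argument and is assumed to be an Ecto repo providing `transaction/2` and `stream/2`.

```elixir
defmodule StreamingExport do
  @moduledoc """
  Hypothetical helper: walks every row of a query in a single transaction.
  """

  # `repo` is assumed to be an Ecto repo module (transaction/2 and stream/2).
  # `fun` is called once per row; `max_rows` is how many rows the adapter
  # fetches per round trip.
  def each_row(repo, queryable, fun, max_rows \\ 500) do
    repo.transaction(
      fn ->
        queryable
        |> repo.stream(max_rows: max_rows)
        |> Stream.each(fun)
        |> Stream.run()
      end,
      # Disable the transaction timeout so a long export is not cut off.
      timeout: :infinity
    )
  end
end
```

Something like `StreamingExport.each_row(Repo, User, &IO.inspect/1)` would then visit each row exactly once, without re-scanning earlier rows the way a growing OFFSET does.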

This would also work well when data is ingested from a GenStage producer.

@imranismail I have a PR open to do this: #36. Do you have any feedback?

cdunn commented

@danielberkompas @imranismail The switch to streams impacts the ability to preload relationships:

warning: passing a query with preloads to Repo.stream/2 leads to erratic behaviour and will raise in future Ecto versions

See elixir-ecto/ecto#2424.

I'm still looking at how to handle this appropriately, but I thought I'd mention it.
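One possible workaround, sketched here as an assumption rather than what the PR does: stream the query without preloads, then preload associations one batch at a time. The module `BatchedPreload` is hypothetical; `repo` is assumed to be an Ecto repo providing `stream/2` and `preload/2`.

```elixir
defmodule BatchedPreload do
  @moduledoc """
  Hypothetical helper: streams rows without preloads, then preloads per batch.
  """

  # The returned stream is lazy. With a real Ecto repo it has to be
  # enumerated inside repo.transaction/2, like any Repo.stream/2 result.
  def stream(repo, queryable, preloads, batch_size \\ 500) do
    queryable
    |> repo.stream(max_rows: batch_size)
    |> Stream.chunk_every(batch_size)
    # Preload each batch as a list, then flatten back to single rows.
    |> Stream.flat_map(fn batch -> repo.preload(batch, preloads) end)
  end
end
```

The query handed to the stream carries no preloads, so the warning above should not apply; the cost is one extra preload query per association per batch.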

I think the solution might be to use a database cursor instead of Repo.stream. I did this in Cloak and it seems to work well.
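One way to read "database cursor" here is keyset pagination: remember the last id seen and ask for the next page after it. The sketch below makes that assumption and is not necessarily what Cloak does. `KeysetStream` and the `fetch_page` callback are hypothetical, and ids are assumed to be positive integers.

```elixir
defmodule KeysetStream do
  @moduledoc """
  Hypothetical helper: lazily pages through rows ordered by id.
  """

  # `fetch_page` is a callback (last_id, limit) -> rows, where rows are maps
  # or structs with an :id key, sorted by id ascending, all with id > last_id.
  # With Ecto it could be something like:
  #
  #   fn last_id, limit ->
  #     Repo.all(from u in User, where: u.id > ^last_id, order_by: u.id, limit: ^limit)
  #   end
  def stream(fetch_page, page_size \\ 100) do
    Stream.resource(
      # Start before the smallest id (assumes ids are positive integers).
      fn -> 0 end,
      fn last_id ->
        case fetch_page.(last_id, page_size) do
          [] ->
            {:halt, last_id}

          rows ->
            # The last row's id becomes the cursor for the next page.
            {rows, rows |> List.last() |> Map.fetch!(:id)}
        end
      end,
      fn _last_id -> :ok end
    )
  end
end
```

Each page is an ordinary query that seeks by id rather than skipping an offset, so no long-running transaction is needed, and because pages are loaded with a regular query, preloads can be added to it as usual.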