marqo-ai/marqo

Aggregations [ENHANCEMENT]

pandu-k opened this issue · 3 comments

Is your feature request related to a problem? Please describe.
There are limited aggregation options in Marqo.

Describe the solution you'd like
Min, max, sum, mean of a field. Count of unique values taken on by a field.

For example: the sum of a field across all docs in the index (perhaps with filtering).
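To make the request concrete, here is a minimal client-side sketch of the aggregations above, assuming the documents have already been fetched into a list of dicts. The `aggregate` helper, the `price` and `category` fields, and the sample documents are all hypothetical; none of this is an existing Marqo API.

```python
from statistics import mean

# Hypothetical documents, as if they had already been fetched from an index.
docs = [
    {"_id": "a", "price": 10.0, "category": "book"},
    {"_id": "b", "price": 30.0, "category": "toy"},
    {"_id": "c", "price": 20.0, "category": "book"},
]


def aggregate(docs, field, predicate=None):
    """Return min, max, sum, mean and unique count of `field`.

    `predicate` is an optional filter applied to each document first.
    Documents that lack the field are skipped.
    """
    values = [
        d[field]
        for d in docs
        if field in d and (predicate is None or predicate(d))
    ]
    if not values:
        return None
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": mean(values),
        "unique_count": len(set(values)),
    }


# Aggregate over every document, then over a filtered subset.
all_stats = aggregate(docs, "price")
book_stats = aggregate(docs, "price", lambda d: d.get("category") == "book")
```

Doing this server-side would avoid pulling every document to the client, which is the point of the request.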

Describe alternatives you've considered
Doing the analysis in a different database. The downside is that this increases application complexity.

I think a unique set of values from a field would be useful too. For example:
doc1 tags: [red, blue]
doc2 tags: [blue]
doc3 tags: [yellow, blue]

mq.index.docs.tags().unique() -> [red, yellow, blue]
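Until something like that exists, the same result can be approximated client-side. This is only a sketch, assuming the docs are already available as a list of dicts; `unique_values` is a hypothetical helper, not part of Marqo. It returns the values sorted so the output is deterministic.

```python
# The three example documents from above, as plain dicts.
docs = [
    {"_id": "doc1", "tags": ["red", "blue"]},
    {"_id": "doc2", "tags": ["blue"]},
    {"_id": "doc3", "tags": ["yellow", "blue"]},
]


def unique_values(docs, field):
    """Collect the distinct values of `field` across all documents.

    List-valued fields are flattened; scalar fields are added as-is.
    Documents that lack the field are skipped.
    """
    seen = set()
    for doc in docs:
        if field not in doc:
            continue
        value = doc[field]
        if isinstance(value, (list, tuple, set)):
            seen.update(value)
        else:
            seen.add(value)
    return sorted(seen)


unique_tags = unique_values(docs, "tags")
```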

I am bumping into this requirement again, and I think I will have to start putting a special metadata/aggregation record into each of the Marqo indexes as a workaround. Now that I think about it, I will probably need to introduce another persistence layer altogether.

It's a little more complex than the above example because I need to group by a field first, e.g.
source_pdf1 -> docs -> tags: [red, blue]
source_pdf2 -> docs -> tags: [blue]
source_pdf3 -> docs -> tags: [yellow, blue]

The goal is to count the number of pdfs that have docs with certain tags. Pdfs don't exist anymore, they are just another piece of metadata on the docs, but I hope the use case is clear.
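A rough sketch of that group-by count, done client-side over docs that have already been fetched. The `source_pdf` field name, the sample data, and the `pdfs_per_tag` helper are assumptions for illustration, not anything Marqo provides.

```python
from collections import defaultdict

# Hypothetical docs: each one records which PDF it came from.
docs = [
    {"_id": "d1", "source_pdf": "source_pdf1", "tags": ["red", "blue"]},
    {"_id": "d2", "source_pdf": "source_pdf1", "tags": ["blue"]},
    {"_id": "d3", "source_pdf": "source_pdf2", "tags": ["blue"]},
    {"_id": "d4", "source_pdf": "source_pdf3", "tags": ["yellow", "blue"]},
]


def pdfs_per_tag(docs, group_field="source_pdf", tag_field="tags"):
    """Count how many distinct PDFs have at least one doc with each tag."""
    pdfs_by_tag = defaultdict(set)
    for doc in docs:
        if group_field not in doc:
            continue  # cannot attribute this doc to a PDF
        for tag in doc.get(tag_field, []):
            # A set ensures a PDF is counted once per tag,
            # however many of its docs carry that tag.
            pdfs_by_tag[tag].add(doc[group_field])
    return {tag: len(pdfs) for tag, pdfs in pdfs_by_tag.items()}


counts = pdfs_per_tag(docs)
```

Note that `source_pdf1` contributes two docs tagged `blue` but is counted only once for that tag.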

@pandu-k Couldn't we integrate a separate package, for example pandas or polars (for larger data), to handle these aggregation calls? These tools are designed specifically for that, so we could send or stream the data from Marqo to them and perform the aggregations there. Is this feasible?