qdrant/qdrant-spark

[DOC] pyspark additional examples

gulldan opened this issue · 4 comments

Hello, thanks for the interesting project.

Could you please show in the documentation how to do the equivalent in PySpark? I don't understand how to translate this:

import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# collection_name, pubs, and embedding_sz are defined elsewhere
qdrant_client = QdrantClient(url="http://localhost:6333")

qdrant_client.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[float(x) for x in emb],
            payload={"pubid": pubid},
        )
        for pubid, emb in pubs
        if len(emb) == embedding_sz
    ],
)

Upd: will there be a way in the future to create collections via the connector, instead of requiring pre-created collections?

Interested in this too

Also, how would one do a search query?

Hi. The connector is intended for loading distributed data, so there are no plans for it to support creating collections or running searches.
It is more convenient to do those separately with the necessary configs.
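
For completeness, both steps can be done with qdrant-client around the Spark job. A minimal sketch, where the collection name, vector size, and distance are assumptions:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create the collection up front with the config you need.
client.create_collection(
    collection_name="pubs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Search it after the Spark job has loaded the data.
hits = client.search(
    collection_name="pubs",
    query_vector=[0.0] * 384,  # replace with a real query embedding
    limit=5,
)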

For upserting data, you'll have to create a dataframe with your payload and embeddings to add to Qdrant. Here's a reference.
Your payload dataframe columns can be of any type, and the supported datatypes for the upload options are mentioned here.
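
For reference, a minimal PySpark sketch of the upsert above, assuming the connector option names from the README (qdrant_url, collection_name, embedding_field, schema); the sample data, URL, and JAR path are illustrative placeholders:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("qdrant-upsert")
    # the qdrant-spark connector JAR must be on the classpath (illustrative path)
    .config("spark.jars", "/path/to/qdrant-spark.jar")
    .getOrCreate()
)

# Stand-in data for the `pubs` iterable from the snippet above.
embedding_sz = 4
pubs = [("pub-1", [0.1, 0.2, 0.3, 0.4]), ("pub-2", [0.5, 0.6, 0.7, 0.8])]

# One row per point: payload fields as ordinary columns, the vector as array<float>.
rows = [
    (pubid, [float(x) for x in emb])
    for pubid, emb in pubs
    if len(emb) == embedding_sz
]
df = spark.createDataFrame(rows, schema="pubid string, embedding array<float>")

(
    df.write.format("io.qdrant.spark.Qdrant")
    .option("qdrant_url", "http://localhost:6334")  # URL of the Qdrant instance
    .option("collection_name", "pubs")              # the pre-created collection
    .option("embedding_field", "embedding")         # column holding the vectors
    .option("schema", df.schema.json())
    .mode("append")
    .save()
)

Per the README, omitting the id_field option makes the connector generate random UUIDs for point IDs, which matches the uuid.uuid4() usage in the snippet above.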

FYI, we now have Pytest tests, which can also be used as examples.
https://github.com/qdrant/qdrant-spark/blob/master/src/test/python/test_qdrant_ingest.py