qdrant/qdrant-spark

[DOC] pyspark additional examples

gulldan opened this issue · 4 comments

Hello, thanks for the interesting project.

Could you please show in the documentation how to do the equivalent in PySpark? I don't understand how to translate this:

import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# collection_name, pubs, and embedding_sz are defined elsewhere
qdrant_client = QdrantClient(url="http://localhost:6333")

qdrant_client.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[float(x) for x in emb],
            payload={"pubid": pubid},
        )
        for pubid, emb in pubs
        if len(emb) == embedding_sz
    ],
)

Upd: will there be a way in the future to create collections via the connector, instead of requiring pre-created collections?

Interested in this too

Also, how would one do a search query?

Hi. The connector is intended for loading distributed data, so there are no plans for it to support creating collections or running searches.
It is more convenient to do those separately with the necessary configs.
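
For completeness, both steps can be done with qdrant-client around the Spark job. A minimal sketch, where the collection name, vector size, and distance are assumptions:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create the collection up front with the config you need.
client.create_collection(
    collection_name="pubs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Search it after the Spark job has loaded the data.
hits = client.search(
    collection_name="pubs",
    query_vector=[0.0] * 384,  # replace with a real query embedding
    limit=5,
)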

For upserting data, you'll have to create a dataframe with your payload and embeddings to add to Qdrant. Here's a reference.
Your payload dataframe columns can be of any type, and the supported datatypes for the upload options are mentioned here.
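
For reference, a minimal PySpark sketch of the upsert above, assuming the connector option names from the README (qdrant_url, collection_name, embedding_field, schema); the sample data, URL, and JAR path are illustrative placeholders:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("qdrant-upsert")
    # the qdrant-spark connector JAR must be on the classpath (illustrative path)
    .config("spark.jars", "/path/to/qdrant-spark.jar")
    .getOrCreate()
)

# Stand-in data for the `pubs` iterable from the snippet above.
embedding_sz = 4
pubs = [("pub-1", [0.1, 0.2, 0.3, 0.4]), ("pub-2", [0.5, 0.6, 0.7, 0.8])]

# One row per point: payload fields as ordinary columns, the vector as array<float>.
rows = [
    (pubid, [float(x) for x in emb])
    for pubid, emb in pubs
    if len(emb) == embedding_sz
]
df = spark.createDataFrame(rows, schema="pubid string, embedding array<float>")

(
    df.write.format("io.qdrant.spark.Qdrant")
    .option("qdrant_url", "http://localhost:6334")  # URL of the Qdrant instance
    .option("collection_name", "pubs")              # the pre-created collection
    .option("embedding_field", "embedding")         # column holding the vectors
    .option("schema", df.schema.json())
    .mode("append")
    .save()
)

Per the README, omitting the id_field option makes the connector generate random UUIDs for point IDs, which matches the uuid.uuid4() usage in the snippet above.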

FYI, we now have Pytest tests, which can also be used as examples.
https://github.com/qdrant/qdrant-spark/blob/master/src/test/python/test_qdrant_ingest.py