NDJSON/CSV methods to add and update documents
curquiza opened this issue · 6 comments
Adapt the method names below to this package's conventions (e.g. `add_documents_json` instead of `addDocumentsJson`). Keep the already existing way of naming in this package to stay idiomatic with the language and this repository.
📣 We strongly recommend doing multiple PRs to solve all the points of this issue
MeiliSearch v0.23.0 introduces two changes:
- new valid formats to push data files, in addition to the JSON format: CSV and NDJSON;
- it enforces the `Content-Type` header for every route requiring a payload (`POST` and `PUT` routes).
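The per-method header rule could be sketched as a small helper; this is illustrative only, and the function name is hypothetical, not part of any SDK:

```rust
// Hypothetical sketch: pick a Content-Type header only for methods that
// carry a payload, per MeiliSearch v0.23.0's requirement.
fn content_type_for(method: &str, format: &str) -> Option<&'static str> {
    match method {
        // Only POST and PUT requests carry a body, so only they get the header.
        "POST" | "PUT" => match format {
            "json" => Some("application/json"),
            "ndjson" => Some("application/x-ndjson"),
            "csv" => Some("text/csv"),
            _ => None,
        },
        // GET and DELETE requests must not send a Content-Type header.
        _ => None,
    }
}

fn main() {
    assert_eq!(content_type_for("POST", "ndjson"), Some("application/x-ndjson"));
    assert_eq!(content_type_for("GET", "json"), None);
    println!("ok");
}
```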
Here are the expected changes to completely close the issue:

- Currently, the SDKs send `Content-Type: application/json` with every request. Only the `POST` and `PUT` requests should send `Content-Type: application/json`, not the `DELETE` and `GET` ones.
- Add the following methods, and 🔥 the associated tests 🔥, to ADD the documents. Depending on the format type (`csv` or `ndjson`), the SDK should send `Content-Type: application/x-ndjson` or `Content-Type: text/csv`:
  - `addDocumentsJson(string docs, string primaryKey)`
  - `addDocumentsCsv(string docs, string primaryKey)`
  - `addDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)`
  - `addDocumentsNdjson(string docs, string primaryKey)`
  - `addDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)`
- Add the following methods, and 🔥 the associated tests 🔥, to UPDATE the documents. Depending on the format type (`csv` or `ndjson`), the SDK should send `Content-Type: application/x-ndjson` or `Content-Type: text/csv`:
  - `updateDocumentsJson(string docs, string primaryKey)`
  - `updateDocumentsCsv(string docs, string primaryKey)`
  - `updateDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)`
  - `updateDocumentsNdjson(string docs, string primaryKey)`
  - `updateDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)`
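In this Rust repository these names would become snake_case. A minimal sketch of how the design could map onto Rust, building the request pieces instead of performing HTTP calls; the `DocRequest` struct and both function names are hypothetical, not the actual meilisearch-rust API:

```rust
// Hypothetical request description: which route verb and Content-Type
// each document method would use. ADD maps to POST, UPDATE maps to PUT.
struct DocRequest {
    http_method: &'static str,
    content_type: &'static str,
    body: String,
    primary_key: Option<String>,
}

fn add_documents_ndjson(docs: &str, primary_key: Option<&str>) -> DocRequest {
    DocRequest {
        http_method: "POST", // ADD: add or replace documents
        content_type: "application/x-ndjson",
        body: docs.to_string(),
        primary_key: primary_key.map(String::from),
    }
}

fn update_documents_csv(docs: &str, primary_key: Option<&str>) -> DocRequest {
    DocRequest {
        http_method: "PUT", // UPDATE: add or update documents
        content_type: "text/csv",
        body: docs.to_string(),
        primary_key: primary_key.map(String::from),
    }
}

fn main() {
    let req = add_documents_ndjson("{\"id\":1}\n{\"id\":2}", Some("id"));
    assert_eq!(req.http_method, "POST");
    assert_eq!(req.content_type, "application/x-ndjson");
    assert_eq!(req.primary_key.as_deref(), Some("id"));
    let upd = update_documents_csv("id,body\n1,doggo", None);
    assert_eq!(upd.http_method, "PUT");
    assert_eq!(upd.body, "id,body\n1,doggo");
    println!("ok");
}
```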
Where:
- `docs` are the documents, sent as a String;
- `primaryKey` is the primary key of the index;
- `batchSize` is the size of each batch. Example: you can send 2000 documents as a raw String in `docs` and ask for a `batchSize` of 1000, so your documents will be sent to MeiliSearch in two batches.
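For the NDJSON case, the batching semantics described above could look like the following sketch, assuming one document per line (the function name is illustrative, not the SDK's):

```rust
// Split a raw NDJSON string into payloads of at most `batch_size`
// documents each, assuming one document per line.
fn split_ndjson_in_batches(docs: &str, batch_size: usize) -> Vec<String> {
    let lines: Vec<&str> = docs.lines().filter(|l| !l.trim().is_empty()).collect();
    lines
        .chunks(batch_size)           // group documents batch_size at a time
        .map(|chunk| chunk.join("\n")) // each batch is itself valid NDJSON
        .collect()
}

fn main() {
    // 5 documents with batch_size = 2 -> 3 batches (2, 2, 1).
    let docs = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n{\"id\":4}\n{\"id\":5}";
    let batches = split_ndjson_in_batches(docs, 2);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2], "{\"id\":5}");
    println!("ok");
}
```

Note this only works because NDJSON has an unambiguous document boundary (the newline); as discussed later in this thread, the same trick does not apply to an arbitrary string.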
Example of PRs:
- in PHP SDK: meilisearch/meilisearch-php#235
- in Python SDK: meilisearch/meilisearch-python#329
Related to: meilisearch/integration-guides#146
If this issue is partially/completely implemented, feel free to let us know.
Is the idea the encapsulation of the functions for ndjson / json / csv? It seems `add_or_replace_unchecked_payload` already permits these formats:
```rust
/// let task = movie_index.add_or_replace_unchecked_payload(
///     r#"{ "id": 1, "body": "doggo" }
/// { "id": 2, "body": "catto" }"#.as_bytes(),
///     "application/x-ndjson",
///     Some("id"),
/// ).await.unwrap();
```
Also, about the naming: shouldn't it be snake_case, not camelCase?
Hey @carlosb1
Indeed, `add_or_replace_unchecked_payload` does the trick. The PR is kept open in case someone wants to specifically implement these functions.
The issue @curquiza opened was created across our different SDKs, which is why the API design might not exactly follow Rust conventions. We expect the contributor to adapt this design to be more in line with Rust :)
Well, I did a first PR with the first functions; I think it can be a good kickoff. Furthermore, I think there is an issue with the tests: I saw some random errors.
Not all features are done, so I am reopening this.
Checking the code for the implementation of the batch functions (`addDocumentsNdjsonInBatches`, `addDocumentsCsvInBatches`, etc.), I am not sure they make sense. How should the input be split into batches? The input parameter of each function is a string, and you cannot decide how to split an arbitrary string. The current implementation works because it uses `&[T]`, where `T` is an independently serialized object, so each batch can be sent as a separate request, in parallel.
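The point about `&[T]` can be illustrated with a sketch: with a typed slice, the split points are unambiguous because each element serializes independently. A plain closure stands in for serde serialization here, and `batch_payloads` is a hypothetical name:

```rust
// Split a typed slice into JSON-array payloads of at most `batch_size`
// elements each. Each element is serialized on its own, so the batch
// boundaries are well defined regardless of the payload contents.
fn batch_payloads<T>(
    docs: &[T],
    batch_size: usize,
    to_json: impl Fn(&T) -> String,
) -> Vec<String> {
    docs.chunks(batch_size)
        .map(|chunk| {
            let items: Vec<String> = chunk.iter().map(&to_json).collect();
            format!("[{}]", items.join(","))
        })
        .collect()
}

fn main() {
    let ids = [1, 2, 3];
    let payloads = batch_payloads(&ids, 2, |n| format!("{{\"id\":{}}}", n));
    assert_eq!(payloads.len(), 2);
    assert_eq!(payloads[1], "[{\"id\":3}]");
    println!("ok");
}
```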
Indeed, you are right. If it makes no sense for the community, let's close it. This package is for the community, so let's not add useless maintenance 😄