[ENHANCEMENT] Add methods for `BulkCreateOperation`/`BulkUpdateOperation` without `_id`
dodomorandi opened this issue · 3 comments
Is your feature request related to a problem? Please describe.
As already mentioned in #154, the fact that _id
is a mandatory field for the API
, but since Elasticsearch 7.6 these parameters are not required anymore for create
and update
bulk operations. If someone wants to create a new document with a generated _id
, the API cannot be used.
Describe the solution you'd like
There are some alternatives available, I thing that all of them have pros and cons.
- Add methods to create
BulkCreateOperation
andBulkUpdateOperation
withoutid
. Something like:This can be seen as _strange, because you can find builder methods prefixed withimpl<B> BulkCreateOperation<B> { pub fn without_id<S>(source: S) -> Self { /* ... */} }
with_
, but not withwithout_
. - The same as before, but the methods are only available if a feature is available. This avoids the usage of the API with old versions of Elasticsearch server without noticing. On the other hand, this makes the code a bit harder to maintain.
- Use a feature to change the signature of
new
function forBulkCreateOperation
andBulkUpdateOperation
(and related fns onBulkOperation
). This is probably the most controversial solution, because, even if this cannot be considered a breaking change (a user must add a feature to break API compatibility), it is obviously not the best idea in term of semantic versioning. Moreover, this could be even more tricky than point 2 to maintain.
As you can see, I did not express a preference for a solution: I understand that the versioning policy in this case makes things difficult, and my desire is just a way to use the API. AFAIK, the only way in which I can rely on id
auto-generation is manually creating the body for create
bulk operations, making the current API partially unusable.
If you think my criticisms are valid, and we are able to choose a solution for a possible implementation (maybe even something different from my proposal), I will be happy to help with a PR.
Hi @dodomorandi,
Firstly, apologies that it's taken some time to respond to you.
Updating the BulkCreateOperation
struct to not require an id, and to make it an optional value that can be set via an associated function, similar to BulkIndexOperation
makes sense, especially now that data stream usage is popular. So
elasticsearch-rs/elasticsearch/src/root/bulk.rs
Lines 164 to 170 in d52932e
would end up as
/// Creates a new instance of a [bulk index operation](BulkIndexOperation)
pub fn create(source: B) -> BulkCreateOperation<B> {
BulkCreateOperation::new(source)
}
impl<B> BulkCreateOperation<B> {
/// Creates a new instance of [BulkCreateOperation]
pub fn new(source: B) -> Self {
Self {
operation: BulkOperation {
header: BulkHeader {
action: BulkAction::Create,
metadata: BulkMetadata::default(),
},
source: Some(source),
},
}
}
/// Specify the id for the document
///
/// If an id is not specified, Elasticsearch will generate an id for the document
/// which will be returned in the response.
pub fn id<S>(mut self, id: S) -> Self
where
S: Into<String>,
{
self.operation.header.metadata._id = Some(id.into());
self
}
// rest of code
}
Bulk update operations always need an id; It was an error in documentation that specified it as optional, which has now been fixed.
I'm not sure how much time I have in the immediate future to address this- is it something that you would be interested in submitting a PR for? The tests in bulk.rs would require an update too, but it's a small change.
Thank you @russcam for your reply!
I would be glad to submit a PR, you already proposed a nice implementation.
Just to be sure: is it ok to make a breaking change with the signature of create
? I know that the project is using the -alpha
suffix, but I am asking anyway before submitting the PR.
I think a breaking change in this instance is ok; the API for bulk create will not need an id going forward 👍