elastic/elasticsearch-rs

[ENHANCEMENT] Add methods for `BulkCreateOperation`/`BulkUpdateOperation` without `_id`

dodomorandi opened this issue · 3 comments

Is your feature request related to a problem? Please describe.
As already mentioned in #154, the fact that _id is a mandatory field for the API, but since Elasticsearch 7.6 these parameters are not required anymore for create and update bulk operations. If someone wants to create a new document with a generated _id, the API cannot be used.

Describe the solution you'd like
There are some alternatives available, I thing that all of them have pros and cons.

  1. Add methods to create BulkCreateOperation and BulkUpdateOperation without id. Something like:
    impl<B> BulkCreateOperation<B> {
        pub fn without_id<S>(source: S) -> Self { /* ... */}
    }
    This can be seen as _strange, because you can find builder methods prefixed with with_, but not with without_.
  2. The same as before, but the methods are only available if a feature is available. This avoids the usage of the API with old versions of Elasticsearch server without noticing. On the other hand, this makes the code a bit harder to maintain.
  3. Use a feature to change the signature of new function for BulkCreateOperation and BulkUpdateOperation (and related fns on BulkOperation). This is probably the most controversial solution, because, even if this cannot be considered a breaking change (a user must add a feature to break API compatibility), it is obviously not the best idea in term of semantic versioning. Moreover, this could be even more tricky than point 2 to maintain.

As you can see, I did not express a preference for a solution: I understand that the versioning policy in this case makes things difficult, and my desire is just a way to use the API. AFAIK, the only way in which I can rely on id auto-generation is manually creating the body for create bulk operations, making the current API partially unusable.

If you think my criticisms are valid, and we are able to choose a solution for a possible implementation (maybe even something different from my proposal), I will be happy to help with a PR.

Hi @dodomorandi,

Firstly, apologies that it's taken some time to respond to you.

Updating the BulkCreateOperation struct to not require an id, and to make it an optional value that can be set via an associated function, similar to BulkIndexOperation makes sense, especially now that data stream usage is popular. So

/// Creates a new instance of a [bulk create operation](BulkCreateOperation)
pub fn create<S>(id: S, source: B) -> BulkCreateOperation<B>
where
S: Into<String>,
{
BulkCreateOperation::new(id, source)
}

would end up as

/// Creates a new instance of a [bulk index operation](BulkIndexOperation)
pub fn create(source: B) -> BulkCreateOperation<B> {
    BulkCreateOperation::new(source)
}

impl<B> BulkCreateOperation<B> {
    /// Creates a new instance of [BulkCreateOperation]
    pub fn new(source: B) -> Self {
        Self {
            operation: BulkOperation {
                header: BulkHeader {
                    action: BulkAction::Create,
                    metadata: BulkMetadata::default(),
                },
                source: Some(source),
            },
        }
    }

    /// Specify the id for the document
    ///
    /// If an id is not specified, Elasticsearch will generate an id for the document
    /// which will be returned in the response.
    pub fn id<S>(mut self, id: S) -> Self
    where
        S: Into<String>,
    {
        self.operation.header.metadata._id = Some(id.into());
        self
    }

    // rest of code
}

Bulk update operations always need an id; It was an error in documentation that specified it as optional, which has now been fixed.

I'm not sure how much time I have in the immediate future to address this- is it something that you would be interested in submitting a PR for? The tests in bulk.rs would require an update too, but it's a small change.

Thank you @russcam for your reply!
I would be glad to submit a PR, you already proposed a nice implementation.
Just to be sure: is it ok to make a breaking change with the signature of create? I know that the project is using the -alpha suffix, but I am asking anyway before submitting the PR. ☺️

I think a breaking change in this instance is ok; the API for bulk create will not need an id going forward 👍