ent/ent

Feature Request: Upsert

nscimerical opened this issue ยท 20 comments

Is it possible to have atomic upsert functionality similar to postresql's INSERT ON CONFLICT or mysql's INSERT ON DUPLICATE KEY UPDATE statement? Use case is to save roundtrips on remote databases and to minimize lines of code.

a8m commented

This issue somehow depends on #127. Although, it's possible to define the conflict on any constraint.

I don't see any problem with adding this after #127 will be addressed.

@a8m do you have already a syntax for upsert in mind? Since upsert is different in Postgres/Sqlite and Mysql, Ent should probably abstract that common case.

Just a draft of my current thinking:

pedro, err := client.Pet.
    UpsertOneID(id).
    Create(
	pet.Create().SetName("pedro")
    ).
    Update(
        pet.Update().SetOwnerID(owner)
    ).
    Save(ctx)

This would allow us to reuse the Settters for Create and Update and ensure that Create and Update can include different properties.

This looks rather promising.

Any ETA on this? :)

I want one.

I need it.

tyuv commented

yes please

Sorry to bug but is this being worked on? ๐Ÿคฆ๐Ÿพโ€โ™‚๏ธ

Has anyone made a start on this? I'd like to contribute!

A few thoughts on this โ€” at least from a Postgres standpoint โ€” using ON CONFLICT requires some sort of annotation to indicate that a field is unique (already supported in the schema) or supply it with a named constraint (also generated by using index.Field(...).Unique(). Beyond that we need to update the graph inserter with the necessary pieces to write the conflict statement. Assuming something like @chris-rock suggested above, I've started to put this together:

// Pet holds the schema definition for the Pet entity.
type Pet struct {
	ent.Schema
}

// Fields of the Pet.
func (Pet) Fields() []ent.Field {
	return []ent.Field{
		field.String("name"),
	}
}

func (Pet) Indexes() []ent.Index {
	return []ent.Index{
		index.Fields("name").Unique(),
	}
}

// Usage:
pedro, err := client.Pet.
    Upsert().
    Create().SetName("pedro").
    Update().SetName("pedro-2").
    Save(ctx)

WIP #1121

a8m commented

Has anyone made a start on this? I'd like to contribute!

I started an experiment for handing this, the only open question about it is whether we want to add a new hook/policy type for UpsertOne and Upsert (bulk).

Regarding the API, I used something as follows:

client.Pet.
    Create().
    // Setters...
    OnConflict(...).
    Update()
    // Setters...
    Save(ctx)

The explicit OnConflict API is nice โ€” would it take a list of columns or options of some sort?

IIRC Postgres requires the conflict columns to be set to columns with a unique constraint and throws an error otherwise, would this be handled by Indexes/Field annotations?

Also interested in bulk upsert. By now inserting or updating multiple objects can be done only one by one and that's very slow on large sets.

@DanielTitkov I've been slowly chipping away at this. It's quite complex when introducing Bulk changes, but I'm pretty close to having a PoC with Postgres to shake out how the API feels.

a8m commented

Hey all, and thanks @ivanvanderbyl for working on #1325.

We really want to push this feature forward, but I have some concerns/questions regarding the API (as mentioned in our Slack channel). I'll appreciate getting feedback/help/suggestions. Thanks ๐Ÿ™

  • We prefer to start with UpsertOne and then add support for BulkUpsert (when we feel comfortable with the new API).
  • What about Hooks and Privacy, do we need to provide a new operation for it? If so, do we need to provide an API that indicates if an entity was created or updated (post hook/privacy-rule).
  • MySQL doesn't support the RETURNING clause, and SQLite added support for it 2-3 months ago. What should be the return value of the Upsert function? Do we need to provide a few alternatives like Exec/Save and fail on runtime if the driver doesn't support getting the rows back? Or, maybe we can handle this using SELECT FOR UPDATE?
  • @sneakywombat proposed a new API for it here - #1325 (comment). I actually like it, but using Where(...) is not possible with @ivanvanderbyl's implementation (right?). What about supporting both APIs?
    // #1
    client.User.Create().
        SetEmail("boring@entgo.io").
        OnConflict(user.FieldEmail).
        Update().
        SetEmail("boring2@entgo.io").
        Save(ctx)
    
    // #2 (Will execute 2 statements to the databases. 1 for query records, and another one for creating or updating records). 
    client.User.Upsert().
        Where(user.Email("boring@entgo.io")).
        Create(... /* ent.UserMutation */)
        Update(... /* ent.UserMutation */)
        Save(ctx).

cc @rotemtam, @alexsn

Hey @a8m ,
My few 0.02$ on the issue:

We prefer to start with UpsertOne

I agree, start with UpsertOne. From my (limited) perspective, this is the common use case.

What about Hooks and Privacy, do we need to provide a new operation for it?

In general, as discussed, there are two approaches to upsert like behavior:

  • INSERT .. ON DUPLICATE KEY UPDATE
  • SELECT .. FOR UPDATE (SfU henceforth)
    I think it will be much simpler to support upserts in ent using SfU since it fits very nicely with the current hooks/privacy definitions. In order for a mutation to succeed, it must pass both privacy checks instead of adding a new, somewhat weird "upsert" notion.
    It has the additional advantage that, as you mentioned,

MySQL doesn't support the RETURNING clause

Using SfU simplifies this because we can then return the final value for any dialect.

The downside of course is SfU requires 2 round-trips to the database and holding a lock on a resource which in some use cases my cause substantial performance issues. I think those can be solved using something which was discussed in another issue (can't find it now), which would provide a generic API for modifying SQL queries in place in those edge cases were specific tweaking is required.

Summarizing a discussion with @a8m, @ivanvanderbyl, and @arielitovsky from June 22nd

  • INSERT .. ON DUPLICATE do not map nicely to ent's current hooks/privacy infrastructure. to keep those in place, upsert APIs will be available out of the box for working with a single record and be implemented with SELECT .. FOR UPDATE.

  • INSERT .. ON DUPLICATE are useful for bulk upserts, mostly in data pipeline job style setting to improve performance by reducing the number of calls to the database.

    These will either be implemented by adding an extension/feature flag to generate bulk upsert methods or by adding a "query mutation" modifier to make low-level modifications to queries before they are executed. Either way, the API and documentation should make it clear to the user that this path bypasses hooks and privacy checks and should be used with caution.

a8m commented

Hey, the upsert and upsert-bulk APIs were added on #1793. Please, feel free to share your thoughts on the PR.

a8m commented

Closing, as it's been used in production for more than 3 months and work without any issues. Thanks all for the feedback โค๏ธ

Can we unpin this issue? :D