luckyframework/avram

Add upsert ability to SaveOperations

Closed this issue · 10 comments

Sometimes you want to save a record, sometimes you want to create a record. But there's also some cases when you're not sure which one you need, so an upsert.

SaveUser.upsert do |op, user|
  #...
end

The SQL would end up looking something like this using ON CONFLICT

INSERT INTO users (id, name) VALUES (1, 'Billy') ON CONFLICT DO UPDATE SET name='Billy';

I had an idea to get this 99% of the way there and maintain type safety! Here's the rough sketch for how I think it would work:

Add conflict_keys macro to SaveOperation

The conflict_keys macro would generate a upsert/upsert! and find_or_create/find_or_create!` methods.

Here is roughly how you'd use it:

class SaveUser < User::SaveOperation
  conflict_keys email
end

# If user is found with matching email then update, otherwise create
SaveUser.upsert!(params)
SaveUser.upsert(params) do |operation, user|
  # same thing but yields like the create/update does
end 

SaveUser.find_or_create!(params) # Finds (but does not update) or creates
SaveUser.find_or_create(params) do |operation, user|
  # 'user' will be nil if not found *or* if the params are invalid and therefore could not create a new one
end

Here's roughly how it would be implemented

# Not sure about naming. This is just a rough idea
module Avram::ConflictKeys
  macro conflict_keys(*attribute_names)
    def self.upsert!(*args, **named_args)
       operation = new(*args, **named_args)
       existing_record = BaseQuery.new(
       {% for attribute in attribute_names %}
         {{ attribute.id }}: operation.{{ attribute.id }}.value,
       {% end %}
       ).first?

       if existing_record
         operation.update!
       else
         operation.create!
       end
    end
      # And then generate a similar method for `find_or_create`
  end

  # Add a method for documentation and for a nice compile time error if used before calling `conflic_keys`
  def self.upsert!(*args, **named_args)
    {% raise "Please call 'conflict_keys' on your operation before using upsert. <link to docs> %}
  end
end

Some gotchas

This does have some race conditions :( The upside is that we can add this to Avram pretty easily without implementing ON CONFLICT quite yet. I think the race condition is extremely rare in most cases, and if you have a unique index on these your data won't ever get into a bad state anyway. The most you'll get is an error saying you had a non-unique column. So I think this is a great way to get started

Later on we could change the generated method to use ON CONFLICT but that can come at a later point

Also worth noting that this does lock you in to one set of conflict keys per operation. After much research and thought I think that's ok. Typically you only use upsert/find_or_create in a few spots and the keys are typically the same. Secondly you could/should create a new operation if you want something with different conflict keys. Something like ImportUserFromCsv would be a great candidate for using conflict_keys

I love this and would simplify some stuff I am working on. I like the interface and how it mimics other interfaces used.

Super small think is conflict_keys is a domain specific word there might be something more clear. I like the use of needs in other parts of the API.

@wontruefree Glad to hear that this would be helpful!

I am definitely not sold on conflict_keys but not sure what else... Here's some ideas, but I'm open to more!

  • unique_keys
  • target_keys

Honestly that's it. I'm having trouble thinking of anything else that better describes what this does

After reading a little it seems like there is more then a 'uniqueness' constraint you can put on a key. Which is why I am guessing why postrgres used the term conflict.

I like the idea of using conflicting but that seems pedantic. Also I think the the other APIs in SaveOperation use the terms columns and keys

Yeah good point on columns. We tend to use that more so conflict_columns sounds a bit clearer

I think conflicting could be strange if you have multiple columns: conflicting_columns organization_id, email. It almost looks as if you're saying those two columns conflict with each other. Thoughts?

I think the interface has the potential to be confusing if you leave the word conflict in it.
conflicting_columns organization_id, email
conflicting_keys organization_id, email

Also I like columns better now that I see them.

Hmm maybe upsert_keys? Technically it'd generate the find_or_create as well, but I don't think that is a big deal. Thoughts?

The more I think about it and the reasons behind on conflict I feel like the term conflict makes sense.

But for the base case the one that people will most likely use is uniqueness. So having unique_columns makes sense.

Probably create_or_find instead of find_or_create like in Rails 6 would be better?