zdennis/activerecord-import

Importing an array of hashes ignores columns not in the first hash

ahmad-sanad opened this issue · 7 comments

Title says it all. Say we have table Foo with columns id, bar, baz, and want to import the following:

arr = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' }
]

Foo.import arr

This creates the rows:

+----+------+------+
| id | bar  | baz  |
+----+------+------+
| 1  | abc  | null |
+----+------+------+
| 2  | null | null |
+----+------+------+
| 3  | 123  | null |
+----+------+------+

instead of the expected:

+----+------+------+
| id | bar  | baz  |
+----+------+------+
| 1  | abc  | null |
+----+------+------+
| 2  | null | xyz  |
+----+------+------+
| 3  | 123  | 456  |
+----+------+------+

It's possible to get around this by using ActiveRecord objects, for example:

arr.map! { |args| Foo.new(args) }
Foo.import arr

but this is still a rather pernicious bug.

points for use of pernicious

Are you using a version older than 0.21.0? I believe this issue was addressed in that release.

Thanks for the response, looks like I'm off by a version. Can this be documented somewhere? Since this use case isn't supported (it raises an exception), I feel like it's a worthy addition to the docs. Maybe here? I realize that isn't what the example is doing, but it doesn't explicitly say that it isn't possible either.

Also, looking through the discussion of how the decision to raise an exception came to be, I'm not sure I understand the concerns with setting the values to column defaults. That is, after all, the default behavior for ActiveRecord (or for any insert query where you don't specify a column's value, for that matter), and I think a valid expectation for this gem is that it would do the same. It would be nice to have that as an option at least.

I'm happy to create PRs for any of my suggestions if we are in agreement.

Absolutely, this should definitely be documented at that wiki page. For now, an example of how to group hashes by keys as suggested would be nice as well:

One suggestion that doesn't require a change: The caller could group their hashes by keys and then run multiple import calls, one for each set of keys/columns. This is slightly more work for the caller, but it puts the caller in a situation where it's clear what their code is intending to do and that the data is not expected to conform to a homogeneous set of keys/columns.

See: #465 (comment)
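A minimal sketch of that grouping workaround, using the hashes from the original report (the `Foo.import` call is commented out since it assumes the gem and a database are set up):

```ruby
records = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' }
]

# Group the hashes by their key set so every group has a homogeneous
# column list, then run one import per group.
groups = records.group_by(&:keys)

groups.each do |keys, rows|
  # Foo.import rows  # one call per consistent set of columns
end
```

Each import call then sees rows with identical keys, so no column is silently dropped and no exception is raised.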

I'll have to give more thought to your suggestion of using the column default. One concern I possibly have is that when using on_duplicate_key_update someone might unintentionally overwrite existing column values.

To make sure I'm understanding correctly - this is a bug that is solved in the latest version and you still want it documented in the README?

This bug was addressed in 0.21.0 by raising an exception instead of silently doing the counter-intuitive behavior I described. My proposal was to document the fact that this use case isn't supported, that it will raise an exception, and that the work-around I talked about should be used instead.

9mm commented

@jkowens is there any possibility of just allowing a default value? (nil)

Part of the entire reason for using this gem is so I DON'T have to instantiate tons of unnecessary objects; I'm bulk inserting hundreds of millions of records. It would be cool to just have it default to nil if a key is missing, so the expected value is null like in the OP's issue.
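Until something like that is built in, one caller-side sketch is to normalize the hashes to a shared key set with nil fill-ins before importing, which avoids instantiating model objects (again, the `Foo.import` call is commented out as it assumes the gem is installed):

```ruby
records = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' }
]

# Collect every key that appears anywhere, then rebuild each hash with
# that full key set, filling missing columns with nil.
all_keys = records.flat_map(&:keys).uniq

normalized = records.map do |h|
  all_keys.map { |k| [k, h[k]] }.to_h
end

# Foo.import normalized  # every row now has the same columns
```

Missing columns become explicit nils, which produces the nulls the OP expected, at the cost of overriding any non-null database column defaults.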