seamusabshere/data_miner

Is there a way to run a specific step?


I'm trying to use data_miner for my routine import jobs. In my case, I need to upload an xls file to my system and import the data from that file.

I have a lot of xls files with different header-to-column mappings, and I have defined an import step for each type of mapping. So I need to run one specific import step after I upload an xls file. Is there a way to do that?

hi @towerhe you may be able to hack it with:

Car.data_miner_script.steps[9].start

it's a known problem with data_miner that this is hard to do - please let me know if you have suggestions!
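One way to make the index hack a little less brittle is to look the step up by some attribute instead of by position. The sketch below uses stand-in objects only: `FakeStep`, its `description` field, and `run_step` are hypothetical names and not part of data_miner. The only thing assumed about real steps is that they respond to `start`, as in the line above.

```ruby
# Stand-in for a data_miner step; hypothetical, for illustration only.
FakeStep = Struct.new(:description, :log) do
  def start
    log << description # record that this step ran
    nil
  end
end

# Find the first step for which the block returns true and start it.
def run_step(steps)
  step = steps.find { |s| yield s }
  raise ArgumentError, "no matching step" unless step
  step.start
  step
end

log = []
steps = [
  FakeStep.new("import layout A", log),
  FakeStep.new("import layout B", log)
]

# Only the step for layout B is started.
run_step(steps) { |s| s.description == "import layout B" }
```

Whether real data_miner steps expose anything suitable to match on is not confirmed here, so the lookup criterion would have to be adapted.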

An import step needs a static url that points to a resource. In my case, the url is dynamic. So to solve my problem, I need to add new features to data_miner. But I have problems running the specs.

I have downgraded earth to 0.11.7, minitest to 3.5.0, and minitest-reporters to 0.9.0, but the specs still fail.

Could you please help me get the specs passing?

i hate to say it, but the tests have been neglected for years - they need to be cleaned up.

yeah, I got it. I will try to improve them, but I don't have any experience with minitest.

BTW, IMO an import step should not require a key. If a key is defined, only the records with the provided keys should be updated. When no key is defined, data_miner should create new records instead.

If a key is defined, only the records with the provided keys should be updated. When no key is defined, data_miner should create new records instead.

that should happen already - data_miner uses upsert internally - is that what you needed?

But I found the following code:

def start
  if not validate? and (storing_primary_key? or table_has_autoincrementing_primary_key?)
    c = ActiveRecord::Base.connection_pool.checkout
    Upsert.stream(c, model.table_name) do |upsert|
      table.each do |row|
        selector = { @key => attributes[@key].read(row) }
        document = attributes.except(@key).inject({}) do |memo, (_, attr)|
          memo.merge! attr.updates(row)
          memo
        end
        upsert.row selector, document
      end
    end
    ActiveRecord::Base.connection_pool.checkin c
  else
    table.each do |row|
      record = model.send "find_or_initialize_by_#{@key}", attributes[@key].read(row)
      attributes.each { |_, attr| attr.set_from_row record, row }
      record.save!
    end
  end
  refresh
  nil
end

Both the if branch and the else branch need a @key, which means we have to define a key for our models.

ok, i see what you mean - correct, data_miner assumes that it is always in upsert mode.

would your problem be solved if you could just leave out key and have it always insert?
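The behaviour under discussion could be sketched as follows, with an in-memory array standing in for the database table. `import_rows` and its `key:` parameter are hypothetical names for this sketch; it does not use data_miner or the upsert gem, and only illustrates "upsert when a key is given, always insert when it is left out".

```ruby
# Hypothetical sketch: upsert when a key is given, plain insert otherwise.
# `table` is an array of hashes standing in for a database table.
def import_rows(table, rows, key: nil)
  rows.each do |row|
    # Only look for an existing record when a key was supplied.
    existing = key && table.find { |rec| rec[key] == row[key] }
    if existing
      existing.merge!(row) # key matched: update the record in place
    else
      table << row.dup     # no key, or no match: insert a new record
    end
  end
  table
end

cars = [{ name: "civic", mpg: 30 }]
import_rows(cars, [{ name: "civic", mpg: 35 }], key: :name) # updates the existing row
import_rows(cars, [{ name: "civic", mpg: 40 }])             # no key: inserts a second row
```

Note that leaving out the key means repeated imports of the same file would create duplicate rows, which is the trade-off of an insert-only mode.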

I'm now working hard to fix the tests. Once all the tests pass, I will try to introduce a way to skip the key.

@towerhe do you need a gem release before you can close this?

I haven't found the right way to implement this yet.