embulk/embulk-input-command

drop GEM_HOME env


y-ken commented

When CRuby is executed from this plugin, the GEM_HOME env prevents CRuby from loading its gems.
In other words, this plugin keeps JRuby's GEM_HOME, which is unnecessary in the usual case.

GEM_HOME=/Users/y-ken/.embulk/jruby/2.3.0

how to find GEM_HOME env

$ cat test.yml
in:
  type: command
  command: env
  parser:
   type: none

$ embulk preview -G test.yml


workarounds

For now, I override the GEM_HOME env to execute CRuby:

in:
  type: command
  command: GEM_HOME=~/.rbenv/versions/2.1.10/lib/ruby/gems/2.1.0 ruby collect_foo_bar.rb
  parser:
   type: jsonl
   columns:
    ...snip...

cf. http://blog.shibayu36.org/entry/2014/01/15/230015
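The workaround relies on ordinary shell behavior: a `NAME=value` prefix changes the variable for that one command only. A minimal sketch of this, where both paths are hypothetical and `GEM_HOME` is exported by hand to stand in for the value inherited from Embulk:

```shell
# Hypothetical path, for illustration only.
demo_gem_home=/tmp/example-gems

# Stand-in for the GEM_HOME value that the child process inherits (hypothetical).
export GEM_HOME=/tmp/jruby-gems

# The prefix assignment applies to this single command's environment.
child_value=$(GEM_HOME="$demo_gem_home" sh -c 'printf %s "$GEM_HOME"')

# The surrounding shell keeps the inherited value.
parent_value=$GEM_HOME

echo "child:  $child_value"
echo "parent: $parent_value"
```

So the override stays local to the `ruby` invocation and does not affect anything else run from the same shell.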

I think the following option is useful.

in:
  type: command
  restore_gem_env: true
  command: 

However, this option requires modifying Embulk itself so that it stores the original GEM_HOME and GEM_PATH.
That is because Embulk removes these environment variables (GEM_HOME and GEM_PATH).

y-ken commented

It sounds nice to have a restore env option!
It would be better to restore the whole env, as a selectable option that defaults to true.

in:
  type: command
  restore_env: true
  command: 

@y-ken Thanks for filing the issue. I'm a bit concerned that changing the Embulk core side may have a big impact.

@hiroyuki-sato Do you think that adding an env (or some other name) configuration in embulk-input-command would work?

I mean:

in:
  type: command
  env:
    GEM_HOME: <brabrabra>
  command: <brabrabra>

@dmikurube Thank you for the comment.

IMO, it doesn't need an env option for now.

Here are the reasons.

  • First reason
    • The current workaround works correctly without an env parameter.
    • That means the following configuration would work almost the same way.
in:
  type: command
  command: GEM_HOME=~/.rbenv/versions/2.1.10/lib/ruby/gems/2.1.0 ruby collect_foo_bar.rb
  parser:
   type: jsonl
   columns:
    ...snip...
  • Second reason
    • In my environment, GEM_HOME and GEM_PATH are not defined before executing Embulk.
    • So restoring means unsetting those variables rather than overriding them.
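For the second case, where the variables were never set before Embulk started, a rough sketch of unsetting them for the child command only. The paths are hypothetical, and the two exports just stand in for what the command inherits:

```shell
# Stand-ins for the variables inherited by the command (hypothetical paths).
export GEM_HOME=/tmp/jruby-gems
export GEM_PATH=/tmp/jruby-gems

# $(...) runs in a subshell, so the unset is not visible to the parent shell.
# "|| true" keeps the exit status zero when grep finds no match.
child_gem_vars=$(unset GEM_HOME GEM_PATH; env | grep -E '^GEM_(HOME|PATH)=' || true)

echo "child sees: '${child_gem_vars}'"
echo "parent still has: ${GEM_HOME}"
```

In the plugin configuration this would presumably become `command: unset GEM_HOME GEM_PATH; ruby collect_foo_bar.rb`, assuming the plugin passes the command string to a shell, as the `GEM_HOME=...` prefix in the workaround suggests.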

Overall, this issue has a workaround, so it's not a critical issue.

The Embulk team is planning the next stage.
I recommend that the team refactor (redesign or carefully reconsider) this behavior.

y-ken commented

I expected the same behavior as the digdag sh operator.
There, executing the ruby command through rbenv works. (Also, GEM_HOME is not set in my environment.)

On the other hand, Embulk sets the GEM_HOME env.
That causes trouble for me.

The digdag code is below.

+test:
  sh>: ruby collect_foo_bar.rb

In digdag, the env output does not contain the GEM_HOME variable, but in Embulk it does.

+test:
  sh>: env

y-ken commented

Hi @frsyuki
Would you please check this issue for a moment?
It would be better if embulk cmd and the digdag sh operator behaved similarly.
If the difference comes from the Embulk core implementation, is there any chance of improving it?

@hiroyuki-sato Thanks for your suggestion!

@y-ken It would be very nice to have exactly the same configuration for both, but unfortunately they are different products built on different software. (Embulk uses JRuby in its core.) It is hard to guarantee that all configurations are 100% identical.

Don't your and @hiroyuki-sato's workarounds work for you? If they do, please use them for the time being.

We're planning better gem handling in the Embulk core over the longer term (~months), motivated not only by this issue. It may fix this issue at its root cause (though I'm not fully confident). My concern about a quick hack in the Embulk core was that it might conflict with the long-term improvement.

@y-ken Is that fine for you? I'm closing this issue since I've heard nothing for two weeks. Please reopen it if you still have a concern.

@y-ken JFYI, Embulk has not touched any gem-related environment variables at runtime since v0.8.32. If you still have this kind of issue, please try v0.8.32+.

y-ken commented

It works fine! I have confirmed that v0.8.32+ behaves as I expected.
Thank you very much!