postmodern/chruby

chruby uses the same GEM_HOME for different non-CRuby releases

eregon opened this issue · 11 comments

From oracle/truffleruby#1723, but posted here for clarity since it is a bug of chruby.

The bug is addressed with #419, but that is not yet released.

@postmodern This still happens regularly for people using truffleruby + chruby.
Do you think we could have a chruby 0.x.y release that fixes this issue if chruby 1.0.0 is not going to be released soon?
If that's deemed too incompatible, maybe #410 would be a better fix for the 0.x.y line, that only changes GEM_HOME for non-CRuby and still fixes this immediate bug?

Note that rubygems compartmentalizes C extensions based on the platform, Ruby's ABI version, and whether the target Ruby was compiled with --enable-shared or not.

Under CRuby 2.7.2 and chruby 0.3.9, the directory structure looks like this:

.gem/ruby/2.7.2/extensions/
└── x86_64-linux
    └── 2.7.0-static # 2.7.0 is the ABI version, and -static denotes that Ruby was not compiled with --enable-shared
        └── ffi-1.13.1
            ├── ffi_c.so
            ├── gem.build_complete
            ├── gem_make.out
            └── mkmf.log

For TruffleRuby and chruby 0.3.9, the directory structure looks like this:

.gem/truffleruby/2.6.5/extensions
└── x86_64-linux
    └── 19.3.1
        └── unf_ext-0.0.7.7
            ├── gem.build_complete
            ├── gem_make.out
            ├── mkmf.log
            └── unf_ext.so
.gem/truffleruby/2.6.6/extensions # Note: TruffleRuby 20.2.0 and 20.3.0 share the same RUBY_VERSION o 2.6.6.
└── x86_64-linux
    ├── 20.2.0 # Note the ABI version appears to be the TruffleRuby version, which implies an unstable ABI
    │   └── unf_ext-0.0.7.7
    │       ├── gem.build_complete
    │       ├── gem_make.out
    │       ├── mkmf.log
    │       └── unf_ext.so
    └── 20.3.0
        └── unf_ext-0.0.7.7
            ├── gem.build_complete
            ├── gem_make.out
            ├── mkmf.log
            └── unf_ext.so

Assuming that the TruffleRuby versions do not break/update the CRuby ABI and were not compiled with different configuration options, extensions should be able to be re-used. If the TruffleRuby versions have different ABI versions or different configuration options, then gem pristin --extensions can be ran to re-compile all C extensions, as RubyGems does not yet support compiling C extensions on-demand. If TruffleRuby plans on breaking the CRuby ABI or bumping the ABI version too often, than that sounds like an upstream issue that is not the responsibility of chruby to work-around; ideally the ABI version should be separate from the release version, and only change on Major version increments.

Also note that #419 is an API breaking change, as it changes where gems are installed/loaded, thus causing users to lose any previously installed gems. Thus #419 will be released in 1.0.0. While I had reconsidered #410, it does not help with the absolute #!/path/to/ruby issue caused by RubyGems not enabling --env-shebang behavior by default. Work on the 1.0.0 branch and new more-customizable API is ongoing.

This appears to be a duplicate of #419, which has already been discussed and merged. Please do not knowingly submit duplicates. Link to the original issue instead.

Thank you for looking at this in details!
You're right that recent RubyGems appear to at least account for RbConfig::CONFIG['version'] (the ABI version), but see below.

Here is the actual error I noticed from someone else, and rm -rf ~/.gem/truffleruby/2.7.2 did solve it:

bundler: failed to load command: puma (/home/user/.gem/truffleruby/2.7.2/bin/puma)
RuntimeError: Invalid native function pointer (LLVMNativePointerException)
define.c:26:in `rb_define_module_under'
	from define.c:22:in `rb_define_module'
	from /home/user/.gem/truffleruby/2.7.2/gems/puma-3.11.4/ext/puma_http11/puma_http11.c:474:in `Init_puma_http11'

2.7.2 there is confusing because it's neither TruffleRuby's version, nor the ABI version, but that's expected with chruby 0.3.9.

I'm not entirely sure yet what's the actual reason, but it seems very likely that an old .so is used for a different TruffleRuby build.

I found something:

$ ruby-build truffleruby-dev ~/.rubies/truffleruby-dev 
$ chruby truffleruby-dev
$ cd ~/.rubies/truffleruby-dev/lib/gems
$ find . -name '*.so'
# nothing
$ gem i json         
Fetching json-2.4.1.gem
Building native extensions. This could take a while...
Successfully installed json-2.4.1
1 gem installed
$ find . -name '*.so'
./extensions/x86_64-linux/21.0.0-dev-e020c89a/json-2.4.1/json/ext/generator.so
./extensions/x86_64-linux/21.0.0-dev-e020c89a/json-2.4.1/json/ext/parser.so
./gems/json-2.4.1/lib/json/ext/generator.so
./gems/json-2.4.1/lib/json/ext/parser.so
./gems/json-2.4.1/ext/json/ext/parser/parser.so
./gems/json-2.4.1/ext/json/ext/generator/generator.so

And it seems to be the same on CRuby: https://gist.github.com/eregon/c07308a479f3732d8b096014451aaf76
The output is long, the point is there are many .so not under extensions, i.e., all of them seems directly under ~/.rubies/ruby-2.7.2/lib/ruby/gems/2.7.0/gems too.

Also, require uses the .so not under extensions:

$ ruby -rffi -e 'puts $".grep(/ffi.+\.so/)'
/home/eregon/.rubies/ruby-2.7.2/lib/ruby/gems/2.7.0/gems/ffi-1.13.1/lib/ffi_c.so

So, the compartmentalization of extensions of RubyGems by ABI version doesn't seem to work at all.
And therefore I think this is still a issue.

Regardless of that, it seems far safer to actually have separate gem homes per Ruby installation.
RUBY_VERSION as used in chruby 0.3.9 is a not a good approximation of ABI version.

FWIW, the ABI version of TruffleRuby is equivalent to the RUBY_ENGINE_VERSION, and it's different for every commit, so it's always correct.

This appears to be a duplicate of #419, which has already been discussed and merged. Please do not knowingly submit duplicates. Link to the original issue instead.

This is an issue, an I linked the related PRs in the description.
I wanted to file an issue to make it clear this is still an unsolved issue in chruby (even though there is a PR solving it, but not yet released), and to also express more clearly what is the issue.

The --env-shebang issue is for me one more example that separate gem homes per Ruby installation is by far the safest and most reliable approach.
That is also what the default gem home guarantees, but chruby unfortunately sets GEM_HOME and breaks that guarantee.

That looks like an ABI breakage combined with a shared GEM_HOME. ext/puma_http11/puma_http11.c:474 from puma-3.11.4 is a call to rb_define_module, however the error mentions rb_define_module_under. Since puma_http11.so is a shared library, it does not contain hardcoded addresses for the CRuby C functions that it calls, instead it has empty place-holder symbols for them which are dynamically resolved once the shared library is loaded into the Ruby process. I am guessing this C function was added to a later version of TruffleRuby?

$  nm -D extensions/x86_64-linux/20.3.0/puma-3.11.4/puma/puma_http11.so  | grep rb_define_module_under
                 U rb_define_module_under

Would it be possible for TruffleRuby to define it's own CRuby ABI version and override RubyGems to use it, instead of RbConfig::CONFIG['version'] / RUBY_ENGINE_VERSION, and only incremented when CRuby C functions/types/structs are added or changed? While this would not protect against every corner-case of shared $GEM_HOMEs between TruffleRuby versions, but it would make the directory structure more correct and provide some protection against ABI breakages. Having a unique ABI version for every release, even if nothing in the CRuby C ABI has changed, would make it difficult to upgrade TruffleRuby in-place (like via a package manager) without running gem pristin --extensions on any pre-installed gems (regardless they were installed into the GEM_ROOT or a GEM_HOME); this is also why ABI versions were created, to provide binary compatibility and continuity between version releases.

So, the compartmentalization of extensions of RubyGems by ABI version doesn't seem to work at all.
And therefore I think this is still a issue.

Ah ha. So, ext/ is usually added to the gemspec's require_paths so that require can find it. I seem to remember older C extensions would copy the compiled .so into lib/. I am not entirely sure where or how the ffi gem installs it's .so or why it's missing from the usual extensions/ sub-directory. Since it's somehow getting installed into TruffleRuby's GEM_ROOT, there shouldn't be any risk of it accidentally being shared with another TruffleRuby.

I'm not interested in taking more risk about ABI breakage, the current solution works well, and it is always correct, proving this is an issue of chruby or some invalid gem sharing, and at least not of TruffleRuby's ABI version.

rb_define_module_under exists in TruffleRuby since a long time, and it's what rb_define_module() calls: https://github.com/oracle/truffleruby/blob/bfa9b9813f6f4602d24d9ccb897062a054b9cb39/src/main/c/cext/define.c#L21-L23

I'm not sure if the existing puma_http11.so was compiled against CRuby or an older TruffleRuby, but neither should happen anyway. I would think older TruffleRuby, and then correctly using chruby someruby before installing any gems could still lead to this bug due to the shared GEM_HOME based on RUBY_VERSION and not the ABI version.

Maybe TruffleRuby should define some global variable in the C extension and set it to the ABI version, then it could check before running Init_myext. It would make the error clearer.

I am not entirely sure where or how the ffi gem installs it's .so or why it's missing from the usual extensions/ sub-directory.

It's not just ffi, it's all gems it seems (see the gist).

Since it's somehow getting installed into TruffleRuby's GEM_ROOT, there shouldn't be any risk of it accidentally being shared with another TruffleRuby.

That's because I ran it with my fork of chruby which does not set GEM_HOME. If GEM_HOME is set, I'm pretty sure the same problem happens, and due to having GEM_HOME to the same value (by chruby) for different TruffleRuby builds/release, you'll get gem sharing issues.

I think the only reliable solution to this issue, in that it also works for head versions and when the given ruby basename can over time point to different Rubies, is not setting GEM_HOME (#431).
#419 actually doesn't fix it for e.g. ruby-head or mytruffleruby-dev-build, #410 and using the ABI version probably would but seems less convenient and could still lead to unwanted sharing (e.g. static vs non-static).

@postmodern It's been 18 months since this bug in chruby is known and the PR was merged, could you please do a release soon including that fix?
It is a very serious bug as it causes invalid gem sharing. TruffleRuby users are hitting this issue regularly.

If you don't have time to maintain chruby, I would be happy to become a maintainer, merge #431 and release 1.0.

Maybe the issue isn't so clear, so I'll make a quick summary:
https://twitter.com/eregontp/status/1401144053979361284

I'm using truffleruby-dev from ruby-build truffleruby-dev ~/.rubies/truffleruby-dev.

Paths from gem env with no GEM_HOME, both correct:
~/.rubies/truffleruby-dev/lib/gems
~/.gem/truffleruby/2.7.3.1

Paths with chruby 0.3.9, first is incorrect:
~/.gem/truffleruby/2.7.3
~/.rubies/truffleruby-dev/lib/gems

RbConfig::CONFIG["ruby_version"] is 2.7.3.1, RUBY_VERSION 2.7.3

It's very similar for a release, but then the ABI version is simply the release version:

Paths from gem env with no GEM_HOME, both correct:
~/.rubies/truffleruby-21.1.0/lib/gems
~/.gem/truffleruby/21.1.0

Paths with chruby 0.3.9, first is incorrect:
~/.gem/truffleruby/2.7.2
~/.rubies/truffleruby-21.1.0/lib/gems

RbConfig::CONFIG["ruby_version"] is 21.1.0, RUBY_VERSION 2.7.2

Paths from gem env with no GEM_HOME, both correct:
~/.rubies/truffleruby-dev/lib/gems
~/.gem/truffleruby/2.7.3.1

Paths with chruby 0.3.9, first is incorrect:
~/.gem/truffleruby/2.7.3
~/.rubies/truffleruby-dev/lib/gems

RbConfig::CONFIG["ruby_version"] is 2.7.3.1, RUBY_VERSION 2.7.3

It's very similar for a release, but then the ABI version is simply the release version:

Paths from gem env with no GEM_HOME, both correct:
~/.rubies/truffleruby-21.1.0/lib/gems
~/.gem/truffleruby/21.1.0

Paths with chruby 0.3.9, first is incorrect:
~/.gem/truffleruby/2.7.2
~/.rubies/truffleruby-21.1.0/lib/gems

RbConfig::CONFIG["ruby_version"] is 21.1.0, RUBY_VERSION 2.7.2

@eregon you seem to be confusing the GEM_HOME with the extensions/ directory structure that exists within the GEM_HOME. The way extensions/ is structured isolates compiled C extensions by whether the ruby is dynamically linked or statically linked and the RbConfig::CONFIG["ruby_version"]. This isolation should prevent any ABI conflicts.

extensions/ is useless and a lie, see #451 (comment).
Also .so files are still under GEM_HOME/gems/name-version/lib, even on CRuby.

extensions/ is useless and a lie, see #451 (comment).
Also .so files are still under GEM_HOME/gems/name-version/lib, even on CRuby.

Ah I had forgotten about the weird copying of the built .so/.dylib behavior in RubyGems. In that case, it really seems like this is a problem in RubyGem's handling of C extensions. RubyGems clearly needs a better C extension isolation story, since it's failing to properly isolate built C extensions by Ruby ABI version. If the user installs a nightly build of ruby or compiles ruby from HEAD, then I think it is the user's responsibility to run gem pristin --extensions to rebuild any C extensions which may have broken due to ABI changes; although I would prefer RubyGems handled this automatically for users. Also, the problem you describe could also happen if a user explicitly sets GEM_HOME, without using chruby, and upgrades a previously installed version of truffleruby, so I don't think this is really specific to chruby. It is chruby's explicit policy to not accept Ruby or OS specific workarounds, so chruby does not enable upstream to become lazy and to force them to take responsibility for their own bugs. I also think that you are best positioned to influence the development of RubyGems for the better, given your status as both a TruffleRuby and CRuby committer.

That said, I believe #419 which has already been merged and slated for 1.0.0 sufficiently solves the issue of sharing GEM_HOME between rubies.

For the record, it seems I'm not the only one to have such issues with invalid gem sharing in chruby: https://twitter.com/schneems/status/1401248352562515968 (I moved the comment, it was wrongly on the PR #431).
The same issue applies when reinstalling ruby-head/ruby-dev too BTW.

For the record, it seems I'm not the only one to have such issues with invalid gem sharing in chruby: https://twitter.com/schneems/status/1401248352562515968 (I moved the comment, it was wrongly on the PR #431).
The same issue applies when reinstalling ruby-head/ruby-dev too BTW.

Part of the miscommunication appears to be that you are using the experiences of users with chruby 0.3.9, but not #419 or other 1.0.0 changes which have not been released yet, to justify further changes. This ignores the fact that #419 does in fact provide isolation of user GEM_HOME directories between chruby rubies and would address user's issues with gem sharing between pre and rc releases under chruby 0.3.9.