fxn/zeitwerk

Zeitwerk

adfoster-r7 opened this issue ยท 6 comments

Hi there; Thanks for the project ๐Ÿ‘

We spotted this stacktrace on our CI setup that we've never seen before:

wrong number of arguments (given 0, expected 2)

Call stack:
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/loader/helpers.rb:118:in `cpath'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/loader/callbacks.rb:12:in `on_file_autoloaded'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/kernel.rb:31:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/modules.rb:769:in `cmd_use'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:581:in `run_command'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:530:in `block in run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `each'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/shell.rb:168:in `run'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/console.rb:48:in `start'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/base.rb:82:in `start'
msfconsole:23:in `<main>'

Unfortunately I don't have specific replication steps, it's the first time I've seen the issue in thousands of CI runs. I'm guessing the cref returned nil here:

def on_file_autoloaded(file)
cref = autoloads.delete(file)
cpath = cpath(*cref)

I noticed the other on_dir_autoloaded code has a surrounding mutex, so maybe something similar is missing here - not sure. Potentially adding a guard clause/error for this specific scenario might be useful to help with future debugging

fxn commented

The difference is that on_file_autoloaded is only invoked if a require on a managed file returned true (here), and Ruby guarantees that if multiple threads are loading the same file, only one of them will succeed.

Do you have a way to reproduce or any other insights? Is this a multi-threaded application?

I'm afraid as it's only happened once I don't have any additional datapoints; here's all the information I have:

    operating-system: macOS-11.7.7
    ruby-version: 3.0.2
    bundler: 2.2.33
    zeitwerk: 2.6.8

The application does make use of threads, and user interaction via a readline prompt i.e. reacting to user input, which then attempted to load additional ruby files.

From the CI output I can see that the series of events were:

  1. Application boots up and creates separate threads which might load ruby files
  2. User input is handled (potentially overlapping with the above threads?), which attempts to load Msf::Ui::Console::CommandDispatcher::Payload which zeitwerk crashed on
  3. Based on the later logs - the constant loading seems to be 'poisoned', as Msf::Ui::Console::CommandDispatcher::Payload doesn't load again - or at least it seems like zeitwerk doesn't try again, based on the same crash not happening

I based the third point on seeing future calls to the same failure line in the CI output giving a normal uninitialized constant exception when attempting to load the same constant

              when Msf::MODULE_PAYLOAD
                dispatcher = Msf::Ui::Console::CommandDispatcher::Payload

Source: https://github.com/adfoster-r7/metasploit-framework/blob/d40d90d15e0ff1a0962dbf4d4933478fc5ce0a03/lib/msf/ui/console/command_dispatcher/modules.rb#L769

Error:

[read] [-] Error while running command use: uninitialized constant Msf::Ui::Console::CommandDispatcher::Payload
Did you mean?  Msf::Payload

Call stack:
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/modules.rb:769:in `cmd_use'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:581:in `run_command'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:530:in `block in run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `each'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/shell.rb:168:in `run'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/console.rb:48:in `start'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/base.rb:82:in `start'
msfconsole:23:in `<main>'


I'm not sure if it makes sense to raise a better error to help with debugging this scenario in the future - feel free to close the issue if there's nothing immediately actionable from this issue. I'll keep an eye open to see if it happens again, just wanted to raise the issue for visibility - thanks! ๐Ÿ‘

Just another datapoint ๐Ÿ‘

Hit this again on another CI run, this time failing on a top level constant:

module Msf
module Ui
module Console

module ModuleCommandDispatcher

  include Msf::Ui::Console::CommandDispatcher
  include Msf::Ui::Console::ModuleArgumentParsing <---

https://github.com/rapid7/metasploit-framework/blob/36e0d8f9159748de4e7202939d7ea767c3f9e798/lib/msf/ui/console/module_command_dispatcher.rb#L15

Stack trace:

Error while running command use: wrong number of arguments (given 0, expected 2)

Call stack:
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/loader/helpers.rb:118:in `cpath'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/loader/callbacks.rb:12:in `on_file_autoloaded'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/kernel.rb:31:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/module_command_dispatcher.rb:15:in `<module:ModuleCommandDispatcher>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/module_command_dispatcher.rb:12:in `<module:Console>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/module_command_dispatcher.rb:5:in `<module:Ui>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/module_command_dispatcher.rb:4:in `<module:Msf>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/module_command_dispatcher.rb:3:in `<top (required)>'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/kernel.rb:30:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:11:in `<class:Payload>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:10:in `<module:CommandDispatcher>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:6:in `<module:Console>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:5:in `<module:Ui>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:4:in `<module:Msf>'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/payload.rb:3:in `<top (required)>'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/vendor/bundle/ruby/3.0.0/gems/zeitwerk-2.6.8/lib/zeitwerk/kernel.rb:30:in `require'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/msf/ui/console/command_dispatcher/modules.rb:769:in `cmd_use'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:581:in `run_command'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:530:in `block in run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `each'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/dispatcher_shell.rb:524:in `run_single'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/rex/ui/text/shell.rb:168:in `run'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/console.rb:48 in `start'
/Users/runner/work/metasploit-framework/metasploit-framework/lib/metasploit/framework/command/base.rb:82:in `start'
msfconsole:23:in `<main>'

Environment:

    Image: macos-11
    ruby-version: 3.0.2
    bundler: 2.2.33

If there's any extra logging I could add to help aid with debugging the issue I'd be happy to wire that up - unfortunately I don't know enough about zeitwerk's internals to jump to a solution just yet ๐Ÿ˜„

fxn commented

How strange!

I believe it is good that it happens in CI, because normally execution is more constrained in there. Sometimes, not even multi-threaded. Is your test suite in CI multi-threaded?

Something that could help is logging the activity of the loader by issuing loader.log! (docs about logging).

Is that the CI of a Rails application or of a library? Do you know where is the loader instantiated?

fxn commented

I dislike closing issues without understanding their root cause. The root cause here could be Zeitwerk, but could also be Ruby, or Metasploit. We just don't know.

On paper, that callback is protected by require. If Zeitwerk had a threading issue, it is unlikely that apps are not hitting it and is showing up in the issue tracker.

We don't have a way to reproduce or feedback that could help me guide debugging.

So, by now I'll close. If there's extra data some day, please do not hesitate to followup.

Agreed, thanks! Will keep an eye out for this in the future, and hopefully I'll catch it with the Zeitwerk debugger enabled one day to shed more light here ๐Ÿ‘