livebook-dev/livebook

Starting a standalone Elixir runtime fails on Windows 10

juhalehtonen opened this issue · 23 comments

As stated in the title, starting a standalone Elixir runtime / node seems to fail on Windows 10. When attempting to click on the "Connect" button in the Liveview UI, the following error comes up in the terminal:

iex(livebook_j35nq3mw@2080-OMG)1> [error] GenServer #PID<0.461.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}
    (livebook 0.1.0) lib/livebook_web/live/session_live/elixir_standalone_live.ex:42: LivebookWeb.SessionLive.ElixirStandaloneLive.handle_event/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:338: anonymous fn/3 in Phoenix.LiveView.Channel.view_handle_event/3
    (telemetry 0.4.2) c:/Users/juha/projects/elixir/livebook/deps/telemetry/src/telemetry.erl:262: :telemetry.span/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:203: Phoenix.LiveView.Channel.handle_info/2
    (stdlib 3.8) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.8) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.8) proc_lib.erl:259: :proc_lib.wake_up/3
Last message: %Phoenix.Socket.Message{event: "event", join_ref: "32", payload: %{"event" => "init", "type" => "click", "value" => %{"value" => ""}}, ref: "34", topic: "lv:elixir_standalone_runtime"}
State: %{components: {%{}, %{}, 1}, join_ref: "32", serializer: Phoenix.Socket.V2.JSONSerializer, socket: #Phoenix.LiveView.Socket<assigns: %{current_runtime: nil, flash: %{}, live_action: nil, output: nil, session_id: "iamnw6urrqdqeugvmnbby44m3rvl2rr6"}, changed: %{}, endpoint: LivebookWeb.Endpoint, id: "elixir_standalone_runtime", parent_pid: #PID<0.456.0>, root_pid: #PID<0.456.0>, router: LivebookWeb.Router, view: LivebookWeb.SessionLive.ElixirStandaloneLive, ...>, topic: "lv:elixir_standalone_runtime", transport_pid: #PID<0.449.0>, upload_names: %{}, upload_pids: %{}}

The same behaviour can be encountered when calling Livebook.Runtime.ElixirStandalone.init() directly via iex. The process managing the startup of the node seems to immediately return with :DOWN, so something seems to be going wrong in the startup there.

What does work for me on my Windows 10 system is manually starting an "attached node" via iex --sname test.

For the sake of further debugging, I've called IO.inspect([elixir_path: elixir_path, node_name: node_name, eval: eval]) within the Livebook.Runtime.ElixirStandalone.start_elixir_node function when clicking the connect button. The results, as seen in the Windows 10 command prompt, are as follows:

[
  elixir_path: "c:/Program Files (x86)/Elixir/bin/elixir.bat",
  node_name: :"livebook_runtime_3ouxkwak@2OMG",
  eval: "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
]

Perhaps the issue is caused by a Windows security policy that prevents certain things from being run here? I don't normally use Windows as my development environment, so I might be way off with my reasoning here.

@juhalehtonen thanks for reporting! What happens if you do this inside IEx on Windows?

iex> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])

What happens if you do this inside IEx on Windows?

iex> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])
C:\Users\juha>iex
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])
{":HELLO\n", 0}

Just for sake of sanity, I tested the same scenario under WSL2. Starting the standalone runtime worked perfectly there.

@juhalehtonen what about this?

iex> eval = "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
iex> System.cmd(System.find_executable("elixir"), ["-e", eval])

If that also works, can you please put an IO.inspect here? https://github.com/elixir-nx/livebook/blob/main/lib/livebook/runtime/elixir_standalone.ex#L56 - then copy the args and pass that to System.cmd?

@juhalehtonen what about this?

This seems to return just:

iex(livebook_zghjge7f@2080-OMG)3> eval = "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
"(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
iex(livebook_zghjge7f@2080-OMG)4> System.cmd(System.find_executable("elixir"), ["-e", eval])
{"", 0}

can you please put an IO.inspect here

The existing Port.open does seem to return a Port when piped through IO.inspect:

iex(livebook_zghjge7f@2080-OMG)10> #Port<0.24>
iex(livebook_zghjge7f@2080-OMG)10> [error] GenServer #PID<0.701.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}

then copy the args and pass that to System.cmd

What I did here was:

System.cmd(System.find_executable("elixir"), elixir_flags(node_name) ++ ["-e", eval]) |> IO.inspect()

Assuming I got the call right, the System.cmd variant seems to be returning the following:

iex(livebook_zghjge7f@2080-OMG)11> {"", 255}
iex(livebook_zghjge7f@2080-OMG)11> [error] GenServer #PID<0.761.0> terminating
** (ArgumentError) argument error
    :erlang.monitor(:port, {"", 255})
    (livebook 0.1.0) lib/livebook/runtime/standalone_init.ex:94: Livebook.Runtime.StandaloneInit.parent_init_sequence/3

Thanks a lot!

So if it is not the eval, one of the elixir_flags(node_name) is causing it to fail. Can you inspect the flags in particular? Maybe this:

System.cmd(System.find_executable("elixir"), IO.inspect(elixir_flags(node_name) ++ ["-e", eval]))

And then I would try calling the executable multiple times, removing the flags, until we find the problematic one. :)

Thanks José! I tried fiddling with the different flags for a while, but no change in the flags passed seemed to make a difference. I'll get back to this more properly once I find the time, as Windows isn't quite my go-to :)

I started to wonder if this has something to do with the Windows-related issues brought up in https://elixirforum.com/t/port-open-eacces-on-windows/5442 , although they don't quite seem to match either.

@juhalehtonen thank you for the help so far!

I tried fiddling with the different flags for a while, but no change in the flags passed seemed to make a difference

Do you mean they all succeed? FWIW, System.cmd simply calls spawn_executable too, so I don't think the issue is with ports... but maybe we are missing some flags.

@juhalehtonen can you try removing the nouse_stdio flag and see if you have better luck?

can you try removing the nouse_stdio flag and see if you have better luck?

I tried removing the nouse_stdio flag, but that didn't make a difference.

I also went through Livebook.Runtime.StandaloneInit.elixir_flags/1 and changed/removed flags one at a time while recompiling in between, but no luck - the original issue persists.

I also tried to use the cmd.exe /c path syntax as shown on a couple of Elixirforum threads (such as the one I linked), but didn't find a winning combination through that either.

Thanks for the report! This is crazy. :D Just to be clear, can you please try this:

    Port.open({:spawn_executable, elixir_path}, [
      :binary,
      :nouse_stdio,
      :hide,
      args: elixir_flags(node_name) ++ ["--eval", eval]
    ] |> IO.inspect(limit: :infinity))

That will inspect all options. Then go to iex and call the same as above, does it work?

And then if it works, please call Process.monitor(port) and then flush() on IEx. Thank you!

If you want to do a pairing session, let me know and I am definitely game. :)

I replaced Livebook.Runtime.ElixirStandalone.start_elixir_node/3 with yours:

  defp start_elixir_node(elixir_path, node_name, eval) do
    # Here we create a port to start the system process in a non-blocking way.
    Port.open({:spawn_executable, elixir_path}, [
      :binary,
      :nouse_stdio,
      :hide,
      args: elixir_flags(node_name) ++ ["--eval", eval]
    ] |> IO.inspect(limit: :infinity))
  end

and then tried starting the runtime again. Note that here I had reverted any previous changes, using the latest from master sans the function change above:

iex(livebook_lbxd4ah4@2080-OMG)5> Livebook.Runtime.ElixirStandalone.init()
[
  :binary,
  :nouse_stdio,
  :hide,
  {:args,
   ["--sname", "livebook_runtime_mz6gwvii@2080-OMG", "--erl",
    "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
    "--hidden", "--cookie", "OXMOCVLXFIYDLMVPVMYE", "--eval",
    "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_fruxc6ha, :\"livebook_lbxd4ah4@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"]}
]
{:error, "Elixir process terminated unexpectedly"}

Removing the :nouse_stdio flag, for example, results in the same outcome (this time called via the UI):

iex(livebook_lbxd4ah4@2080-OMG)4> [
  :binary,
  :hide,
  {:args,
   ["--sname", "livebook_runtime_zhbwy2t5@2080-OMG", "--erl",
    "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
    "--hidden", "--cookie", "OXMOCVLXFIYDLMVPVMYE", "--eval",
    "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_j2qly75k, :\"livebook_lbxd4ah4@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"]}
]
iex(livebook_lbxd4ah4@2080-OMG)4> 09:38:04.304 [error] GenServer #PID<0.518.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}

There has to be something very Windows-specific that's being missed here. Probably something very obvious-in-hindsight too :).

If you want to do a pairing session, let me know and I am definitely game. :)

I'm open to having a look at this together as well.

I am sending you an email. :D

Here is a minimal example that isolates that invocation and we can run on IEx without Livebook:

$ iex --sname foo

then

Process.register(self(), :hello)

port = Port.open(
  {:spawn_executable, System.find_executable("elixir")},
  [
    :binary,
    :nouse_stdio,
    :hide,
    {:args,
     ["--sname", "prefix-#{node()}", "--erl",
      "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
      "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n  init_ref = make_ref()\n  parent_process = {:hello, :\"#{node()}\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)"]}
  ]
)

Port.monitor(port)
flush()

@juhalehtonen, thank you so much for the pairing session! ❤️ 🙌 I have sent #201 that should address the issue, can you please try it out?

We found out the issue was due to the differences on how Unix and Windows handle escaped strings. The solution I found was to pass the injected parameters as argv instead of "interpolating" them.

Thanks José! Unfortunately the underlying issue still seems to persist:

# For sake of sanity, pointing out correct branch

C:\Users\juha\projects\elixir\livebook (jv-windows -> origin)
λ iex -S mix phx.server
Compiling 4 files (.ex)
[Livebook] Application running at http://localhost:8080/?token=ipfloeqraju4wapyji2m6rzz7bszp3o6
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)

# Triggered via UI

iex(livebook_kalmhskk@2080-OMG)1> 11:11:36.895 [error] GenServer #PID<0.521.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}
    (livebook 0.1.0) lib/livebook_web/live/session_live/elixir_standalone_live.ex:42: LivebookWeb.SessionLive.ElixirStandaloneLive.handle_event/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:338: anonymous fn/3 in Phoenix.LiveView.Channel.view_handle_event/3
    (telemetry 0.4.2) c:/Users/juha/projects/elixir/livebook/deps/telemetry/src/telemetry.erl:262: :telemetry.span/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:203: Phoenix.LiveView.Channel.handle_info/2
    (stdlib 3.8) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.8) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.8) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: %Phoenix.Socket.Message{event: "event", join_ref: "18", payload: %{"event" => "init", "type" => "click", "value" => %{"value" => ""}}, ref: "19", topic: "lv:elixir_standalone_runtime"}

# Triggered via iex

iex(livebook_kalmhskk@2080-OMG)1> Livebook.Runtime.ElixirStandalone.init()
{:error, "Elixir process terminated unexpectedly"}

Oh noes, can you do another pairing session later on?

In any case, can you try inspecting System.argv() here?

https://github.com/elixir-nx/livebook/pull/201/files#diff-17dab55a55952f1aff5b2e07aee22b5da046af9c365ecaaf88cbbdf7be943f31R128

Then call Livebook.Runtime.ElixirStandalone.init() and flush(). :)

And here is an updated version that isolates it:

Process.register(self(), :"hello-#{node()}")

port = Port.open(
  {:spawn_executable, System.find_executable("elixir")},
  [
    :binary,
    :hide,
    {:args,
     ["--sname", "hello-#{node()}", "--erl",
      "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
      "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n  [parent_node] = System.argv()\n  init_ref = make_ref()\n  parent_process = {node(), String.to_atom(parent_node)}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)",
      "--",
      Atom.to_string(node())]}
  ]
)

Port.monitor(port)
flush()

And here is an updated version that isolates it:

Running the isolated code gives us :DOWN with a state of :noproc.

C:\Users\juha
λ iex --sname foobar
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(foobar@2080-OMG)1> Process.register(self(), :"hello-#{node()}")
true
iex(foobar@2080-OMG)2> port = Port.open(
  {:spawn_exec...(foobar@2080-OMG)2> utable, System.find_executable("elixir")},
  [
...(foobar@2080-OMG)2>  ...(foobar@2080-OMG)2>    :binary,
    ...(foobar@2080-OMG)2> :hide,
    {:arg...(foobar@2080-OMG)2> s,
     ["--snam...(foobar@2080-OMG)2> e", "hello-#{node()}", "--erl",
...(foobar@2080-OMG)2>       "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
     ...(foobar@2080-OMG)2>  "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n...(foobar@2080-OMG)2>   [parent_node] = System.argv()\n  init_ref = make_ref()\n  parent_process = {node(), String.to_atom(parent_node)}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n
  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)",
...(foobar@2080-OMG)2>       "--",
...(foobar@2080-OMG)2>       Atom.to_string(node())]}
  ]
...(foobar@2080-OMG)2> )...(foobar@2080-OMG)2>
#Port<0.7>
iex(foobar@2080-OMG)3> Port.monitor(port)
#Reference<0.3842356013.1485570050.111182>
iex(foobar@2080-OMG)4> flush()
{:DOWN, #Reference<0.3842356013.1485570050.111182>, :port, #Port<0.7>, :noproc}
:ok
iex(foobar@2080-OMG)5> flush()
:ok

can you do another pairing session later on?

I have some free time during the next two hours or so, if that works for you :)

@juhalehtonen sounds good, i will be waiting on the same place!

The second issue was around newlines but it is addressed now. Thank you @juhalehtonen <3 and Windows should be good to go @samaaron!

This issue seems back again:
F:\livebook>elixir --cookie "COOKIEFORTESTS" -S mix test
.................................................

  1. test Runtime.disconnect/1 makes the node terminate (Livebook.Runtime.ElixirStandaloneTest)
    test/livebook/runtime/elixir_standalone_test.exs:32
    match (=) failed
    code: assert {:ok, %{node: node} = runtime} = Runtime.ElixirStandalone.init()
    left: {:ok, %{node: node} = runtime}
    right: {:error, "Elixir process terminated unexpectedly"}
    stacktrace:
    test/livebook/runtime/elixir_standalone_test.exs:33: (test)