Ctrl-c is the wrong key to trigger cancellation
Pressing Ctrl-C while Pegasus is running makes the terminal send SIGINT to the entire foreground process group, so the `ssh` processes receive it as well.
Instead of waiting for Ctrl-C with tokio's Ctrl-C catcher, find another way to capture the user's intent to cancel.
Possible solutions:
- Listen on `stdin` for some arbitrary input, like `q<Enter>`.
- Create a `pegasus stop` command. It should identify the specific `pegasus` process running in the current working directory, which will probably require a PID file.
  - It's okay to assume that there is only one instance of `pegasus` per directory (if not, more than one `pegasus` process would be manipulating `queue.yaml` and `consumed.yaml`).
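A minimal sketch of the `stdin` option, using only the Rust standard library: a background thread reads lines and sets a shared flag once the user types `q` and presses Enter. `wait_for_quit` and `spawn_quit_listener` are hypothetical names, not Pegasus code; in a tokio program, `tokio::io::stdin` would be the async counterpart.

```rust
use std::io::BufRead;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Reads lines until one consisting of just `q` arrives, then returns `true`.
/// Returns `false` if the input ends (EOF) or fails before that happens.
pub fn wait_for_quit<R: BufRead>(reader: R) -> bool {
    for line in reader.lines() {
        match line {
            Ok(l) if l.trim() == "q" => return true,
            Ok(_) => continue,      // ignore anything else the user types
            Err(_) => return false, // treat read errors like EOF
        }
    }
    false
}

/// Spawns a thread that watches stdin and returns a flag the main loop can
/// poll to decide whether to start cancelling.
pub fn spawn_quit_listener() -> Arc<AtomicBool> {
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    thread::spawn(move || {
        let stdin = std::io::stdin();
        if wait_for_quit(stdin.lock()) {
            flag.store(true, Ordering::SeqCst);
        }
    });
    cancel
}
```

The main loop would check `cancel.load(Ordering::SeqCst)` between steps. This only adds a second way to cancel: Ctrl-C still reaches the whole foreground process group, including `ssh`.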
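The PID file for `pegasus stop` could look roughly like this. Opening it with `create_new(true)` fails with `AlreadyExists` when the file is already present, which doubles as the one-instance-per-directory check. The file name `.pegasus.pid` and both functions are assumptions for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical PID file name.
const PID_FILE: &str = ".pegasus.pid";

/// Records the current process ID in `dir`. `create_new` makes this fail with
/// `AlreadyExists` if another instance already owns the directory.
pub fn write_pid_file(dir: &Path) -> io::Result<PathBuf> {
    let path = dir.join(PID_FILE);
    let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
    writeln!(file, "{}", std::process::id())?;
    Ok(path)
}

/// Reads the PID back, as `pegasus stop` would before signalling the process.
pub fn read_pid_file(dir: &Path) -> io::Result<u32> {
    let text = fs::read_to_string(dir.join(PID_FILE))?;
    text.trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

A real implementation would also delete the file on clean shutdown and decide how to treat a stale file left behind by a crash.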
I think the best way to fix this is to double-fork and create a new session, or to run pegasus as a daemon, since many users are in the habit of pressing Ctrl+C.
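The double fork can be sketched with raw POSIX calls declared directly against the C library that Rust's standard library already links on Unix, so no `nix` dependency is needed. This is the textbook sequence, not Pegasus code: `daemonize` is a hypothetical name, and a full daemon would usually also redirect stdin/stdout/stderr and reset the umask. It deliberately skips `chdir("/")`, since Pegasus relies on its working directory for `queue.yaml` and `consumed.yaml`.

```rust
use std::io;

// Raw POSIX declarations (pid_t is i32 on Linux and macOS).
unsafe extern "C" {
    fn fork() -> i32;
    fn setsid() -> i32;
    fn _exit(status: i32) -> !;
}

/// Detaches the current process from its terminal using the classic double
/// fork. Returns only in the final (grandchild) daemon process.
pub fn daemonize() -> io::Result<()> {
    // First fork: the original process exits, so the child is not a process
    // group leader and setsid() is allowed to succeed.
    match unsafe { fork() } {
        -1 => return Err(io::Error::last_os_error()),
        0 => {}
        _ => unsafe { _exit(0) },
    }
    // New session with no controlling terminal: Ctrl-C in the old terminal
    // no longer reaches this process or its future children.
    if unsafe { setsid() } == -1 {
        return Err(io::Error::last_os_error());
    }
    // Second fork: the grandchild is not a session leader, so opening a
    // terminal device later cannot make it the controlling terminal.
    match unsafe { fork() } {
        -1 => Err(io::Error::last_os_error()),
        0 => Ok(()),
        _ => unsafe { _exit(0) },
    }
}
```

This would have to run at the very top of `main`, before the tokio runtime starts any threads, because only the calling thread survives `fork`.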
@NobodyXu Thanks for the advice! I gave it some more thought. One possible solution that popped up is for the `ssh` command builder to include this:
`unsafe { cmd.pre_exec(|| nix::unistd::setsid().map(drop).map_err(Into::into)) };`
Single-forking would still be okay if `ssh` does not try to `open("/dev/console")` somewhere.
I think this will make the `ssh` processes Ctrl-C-proof while still allowing Pegasus to be ergonomic (it still accepts Ctrl-C). But this probably cannot be done without introducing changes to the `openssh` crate. WDYT?
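For reference, the `pre_exec` idea can be written against the standard library's `std::os::unix::process::CommandExt::pre_exec`, whose closure must return `io::Result<()>` and runs in the child between `fork` and `exec`. This sketch declares `setsid` directly instead of going through `nix`, and `spawn_in_new_session` is a hypothetical helper; the `openssh` crate builds its own `ssh` commands, so wiring this in would still need changes there.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// POSIX setsid(2) from the C library that std already links on Unix.
unsafe extern "C" {
    fn setsid() -> i32;
}

/// Runs in the forked child just before exec: start a new session so the
/// child leaves the terminal's foreground process group and no longer
/// receives the SIGINT generated by Ctrl-C.
fn new_session() -> io::Result<()> {
    if unsafe { setsid() } == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Spawns `program` with `args` in its own session.
pub fn spawn_in_new_session(program: &str, args: &[&str]) -> io::Result<Child> {
    let mut cmd = Command::new(program);
    cmd.args(args);
    // SAFETY: setsid is async-signal-safe, which matters because this hook
    // runs between fork and exec.
    unsafe {
        cmd.pre_exec(new_session);
    }
    cmd.spawn()
}
```

The trade-off is the one raised in the reply: a child in its own session is no longer interrupted together with Pegasus, so something else has to make sure it eventually exits.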
Yes, that requires modifications to `openssh`; however, I don't think this is a great solution.
That would mean the ssh multiplex master won't exit and release its resources (including the TCP connection).
The next invocation of pegasus would create another ssh multiplex master and TCP connection.
Since pegasus is designed to run tasks on clusters, I think it is more reasonable to run it as a daemon that keeps running tasks on the remote until you explicitly terminate it.
Ah, I get your point. It's not easy to guarantee that all the ssh masters terminate when pegasus dies/terminates/is killed/panics.
Making Pegasus a daemon seems good. 👍 I think I'll implement a stop command for termination.
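One possible shape for that stop command, assuming the daemon writes its PID to a hypothetical `.pegasus.pid` in its working directory and shuts down its ssh masters when it receives SIGTERM (neither detail is settled in this thread). `kill(2)` is declared directly from the C library, and SIGTERM's value 15 is hard-coded, which holds on Linux and macOS.

```rust
use std::fs;
use std::io;
use std::path::Path;

unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

/// SIGTERM's number on Linux and macOS (hard-coded to avoid a libc dependency).
const SIGTERM: i32 = 15;
/// Hypothetical PID file name written by the running daemon.
const PID_FILE: &str = ".pegasus.pid";

/// Sketch of `pegasus stop`: read the PID of the instance that owns `dir`
/// and ask it to shut down gracefully with SIGTERM.
pub fn stop(dir: &Path) -> io::Result<()> {
    let text = fs::read_to_string(dir.join(PID_FILE))?;
    let pid: i32 = text
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    // kill() treats 0 and negative PIDs as process groups, so refuse them.
    if pid <= 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid PID in PID file"));
    }
    if unsafe { kill(pid, SIGTERM) } == -1 {
        // ESRCH here usually means a stale PID file: the process already exited.
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

Using SIGTERM rather than SIGKILL leaves the daemon a chance to close its ssh multiplex masters and remove the PID file before exiting.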