supervisor running|monitoring all these commands
zap-demo.mp4
Runs and watches the commands specified in `zap.py`, for bringing up flocks of programs, plus some management.
- can fixup common errors with the commands
- curses interface which becomes `less -R` (supports ANSI colours) when looking into job output (not a proper terminal); Ctrl+C twice to return from it
- press R to restart a job
- Jobs (of commands) are grouped into systems, so an individual `./zap.py $system` can be run, or most will be
- all jobs run at once, each job's commands in sequence (joined by `&&`)
Instead of a village of terminals to constantly rebuild.
Not sure. I have only heard of "the supervisor process" being a project-bound minimalist perl script or so.
Keen on the idea of healing a command with another command automatically. Computers should have developed a receptor for gathering this knowledge by now, so here it is.
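A minimal sketch of that idea, assuming fixups can be expressed as pairs of an error pattern and a healing command; the names and example patterns here are made up, not zap.py's actual fixup table:

```python
import re
import subprocess

# Made-up examples: a fixup maps a recognisable error in a command's
# output to another command that heals it, after which we retry.
FIXUPS = [
    (re.compile(r'address already in use'), 'fuser -k 8080/tcp'),
    (re.compile(r'container name .* is already in use'), 'podman rm -f cos1'),
]

def run_with_fixups(cmd, retries=1):
    """Run cmd; if its output matches a known error, run the healing
    command and try the original command again."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    if proc.returncode == 0 or retries == 0:
        return proc
    for pattern, heal in FIXUPS:
        if pattern.search(output):
            subprocess.run(heal, shell=True)          # the healing command
            return run_with_fixups(cmd, retries - 1)  # retry once
    return proc
```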
systemd ~~ zap for Linux. System clutter: switches services on, might monitor, not sure.
tmux ~~ a bunch of terminal sessions to show you, predefined and instantiated.
kubernetes ~~ zap orchestrating vms with ports and mounts is like infrastructure as code.
- copy somewhere
- adjust the commands specified in `zap.py` for yourself
- various podman-build steps are implied
A terminal program to automate running and monitoring all these commands, with fixups for common error-response side-tracking.
It presents an interface (curses) listing things running normally or catching fire.
Run a bunch of commands at once, tailing the outputs.
Sysadmin glue code for clusters of vms with various sshfs etc commands involved in their running.
In `zap.py` several sequences of commands are configured; the first is usually an ssh session the rest happen in, and we wait observing the command forever, unless it says %restart.
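As a rough illustration only (the names and shape here are hypothetical, not the actual cmd_source format in zap.py): a system names a group of jobs, each job a sequence of commands, with the first typically opening the ssh session the rest run inside.

```python
# Hypothetical shape, for illustration only; zap.py's real cmd_source differs.
SYSTEMS = {
    'dev': [
        # one job: the first command opens an ssh session, the rest run
        # inside it in sequence; the job is then watched forever unless
        # a fixup says %restart
        ["ssh -t n", "cd /srv/app", "./serve.pl"],
        # another job, run locally
        ["lsyncd -nodaemon lsyncd.conf"],
    ],
}
```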
Please look to the wiki to pick up or drop off more examples. Think of it as social media for software.
When the UI shells out to less and we need to use Ctrl+C to exit it, this SIGINT affects lsyncd and serve.pl also (they auto-restart), despite being inside zap_run.pl which supposedly handles that signal.
We shell out to less for lack of a way to render ANSI colour codes in curses.
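The usual pattern for handing the terminal to a pager from a curses program looks roughly like this (a sketch of the general technique, not zap's exact code):

```python
import curses
import subprocess

def view_job_output(stdscr, path):
    """Suspend curses, let `less -R` render the ANSI-coloured job output,
    then restore the curses screen once less exits."""
    curses.endwin()                        # hand the terminal back
    subprocess.call(['less', '-R', path])  # -R passes colour codes through
    stdscr.refresh()                       # re-enter curses mode and redraw
```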
The podman-run|ipfs commands seem to exit(125|0) while they're still outputting stuff. who knows why? This means they can't %restart, because they always seem to need restarting.
These commands are left running when zap exits, reparented to `/lib/systemd/systemd --user`. Via fixups they won't get in the way of a new zap instance.
Needs knowledge. This maybe? Ideally we handle Ctrl+C for zap exit and end jobs smoothly. It now takes multiple Ctrl+C to kill zap et al. and your terminal may need a reset.
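One possible approach, purely an assumption about how this could be tackled rather than what zap does today: start each job in its own process group and have zap itself catch SIGINT, signalling the groups so nothing is left reparented to systemd.

```python
import os
import signal
import subprocess

jobs = []  # Popen handles, one per running job

def start_job(cmd):
    # start_new_session=True gives the job its own session/process group,
    # so the terminal's Ctrl+C no longer reaches it directly.
    proc = subprocess.Popen(cmd, shell=True, start_new_session=True)
    jobs.append(proc)
    return proc

def shutdown(signum, frame):
    # Terminate each job's whole process group, then let zap exit.
    for proc in jobs:
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    raise SystemExit(0)

signal.signal(signal.SIGINT, shutdown)
```

Keeping jobs out of the terminal's foreground process group would also stop the Ctrl+C used to leave less from reaching lsyncd and serve.pl.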
Even knowing the process tree to track cpu|mem would be nice.
Periodically truncates job.output to 6000 lines.
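A minimal sketch of that truncation, assuming job.output is a plain text file:

```python
from collections import deque

def truncate_output(path, keep=6000):
    """Rewrite the file so only its last `keep` lines remain."""
    with open(path, errors='replace') as f:
        tail = deque(f, maxlen=keep)   # memory stays bounded to `keep` lines
    with open(path, 'w') as f:
        f.writelines(tail)
```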
See TODO / clear exit code on restart: the UI may have stale exit codes if looking at job output; in the job list they do say eg exit(-2).
For more caveats, see `zap.py`.
lots of nice Console-UI things are over there.
The fixups can't be gauged for failure, eg `podman rm -f cos1` often errs "container has already been removed", meaning good.
Python errors are obscured by curses, but apparently this will get them and avail them as a job.
restart job 2x daily: memleak mode
Find out how to efficiently port forward (without SSH -L).
Could need A finished before B starts. perhaps the cmd_source sets or gets markers. do %early before others?
Achieve job.ready=True on exit(0), or eg lsyncd could have a fixup that notices when it says Ready, since it runs forever (sketched below).
Workaround: `sleep 1` before the %later jobs. Potentially dicey elsewhere, eg running the dev server while still copying its files into place for the first time.
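A sketch of the readiness idea, with the caveat that `job.ready`, `job.name` and the marker table here are illustrative guesses rather than zap.py's actual attributes:

```python
import re

READY_MARKERS = {
    # illustrative: lsyncd runs forever but prints a line containing
    # "Ready" once its initial work is done
    'lsyncd': re.compile(r'Ready'),
}

def update_ready(job, line=None, exit_code=None):
    """Flip job.ready on exit(0), or when the job's output shows its marker."""
    if exit_code == 0:
        job.ready = True
    elif line is not None:
        marker = READY_MARKERS.get(job.name)
        if marker and marker.search(line):
            job.ready = True
```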
make the system/job/cmd hierarchy clearer, toggle them on|off...
more indicators, eg that unseen output|errors exist per job
while in the less of `less_job()`, indication of action in other jobs should be visible. lose less.
getting the pid of the remote shell, then with `ps faux`, figure which cmd in the job we are doing, take cpu|mem stats, etc.
we currently don't know which cmd of a job failed, eg `ssh n 'any && of && these && coulda'`.
jobs could `echo PID:$$` first so we can find sshd/*:cmds in `ps faux`, for all jobs on n: etc, and then `echo some-marker` to put pagination markers in the output. out%%ch=cmd,s:cmd
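A rough sketch of the PID part of that idea; the wrapper, the parsing, and the use of `ps -eo pid,ppid,args` in place of `ps faux` (so the parent links are explicit) are all assumptions for illustration:

```python
import re
import subprocess

def wrap_remote(cmds):
    """Prepend the PID echo so the remote shell announces itself before
    running the job's commands in sequence as before."""
    return "echo PID:$$ && " + " && ".join(cmds)

def announced_pid(output):
    """Pick the announced shell pid out of the job's captured output."""
    m = re.search(r'^PID:(\d+)$', output, re.MULTILINE)
    return int(m.group(1)) if m else None

def descendants(pid, host='n'):
    """Walk ppid links on the remote host to find the shell's subtree,
    i.e. which cmds of the job are currently running."""
    ps = subprocess.run(['ssh', host, 'ps', '-eo', 'pid,ppid,args'],
                        capture_output=True, text=True).stdout
    rows = [line.split(None, 2) for line in ps.splitlines()[1:] if line.strip()]
    wanted, grew = {pid}, True
    while grew:   # keep absorbing children until the set stops growing
        grew = False
        for p, pp, args in rows:
            if int(pp) in wanted and int(p) not in wanted:
                wanted.add(int(p))
                grew = True
    return [args for p, pp, args in rows if int(p) in wanted and int(p) != pid]
```

From that subtree we could read which cmd is currently running, or sum its cpu|mem.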
learn what normal output looks like for each command.
if anything makes unusual noise it should bring that information into view, ie showing the user novel events.
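As a toy sketch of what "learning normal output" could mean (entirely an assumed approach): remember a normalised fingerprint of every line a command has printed before, and surface only lines whose fingerprint is new.

```python
import re

class NoveltyFilter:
    """Remembers normalised lines a command has produced before and
    reports only the ones it has never seen, i.e. novel events."""
    def __init__(self):
        self.seen = set()

    @staticmethod
    def normalise(line):
        # collapse the parts that always differ: hex ids, then any numbers
        line = re.sub(r'\b0x[0-9a-fA-F]+\b', 'HEX', line)
        line = re.sub(r'\d+', 'N', line)
        return line.strip()

    def novel(self, line):
        key = self.normalise(line)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Feeding each job's output lines through its own NoveltyFilter would then give the "novel events" view.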
how do everyone's technical struggles fruit.
Please look to the wiki to pick up or drop off more examples. Think of it as social media for software.
the general-knowledge version of the above.
sequences of commands that diagnose & rectify situations.
should distribute only chunks of cmd_source, which would have a general-knowledge section besides the systems definition.
The example dev environment is for letz. S Group is here to help! Let me know.