Libensemble/libensemble

New module for reverse-ssh connect-to-scheduler


Workers can easily be launched on a remote system and communicate back to the manager via reverse-ssh. However, there is currently no easy way to place those workers on the relevant compute nodes.

What we'd prefer:

  • On some head/generator/AI node, specify a remote connection to the cluster, along with nworkers
  • We build a command that configures reverse-ssh and launches a scheduler module remotely
  • Launch this command. On the remote side, the module creates a scheduler job via PSI/J, Parsl, Balsam, etc.
  • Submit the job, which launches all workers with the same reverse-ssh parameters
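
As a rough illustration of the command-building step, here is a minimal sketch using only the Python standard library. It is not existing libEnsemble code: the function name `build_remote_launch_command`, the remote module name `remote_scheduler_launcher`, its `--manager-port` flag, and the default port are all hypothetical placeholders. The only real tool assumed is OpenSSH's `ssh -R port:host:hostport` reverse forwarding.

```python
import shlex


def build_remote_launch_command(host, queue, nworkers, manager_port=9999,
                                remote_module="remote_scheduler_launcher"):
    """Return an argv list that opens a reverse tunnel and starts a remote module.

    remote_module and its flags are hypothetical; the proposed module would
    create and submit the scheduler job (e.g. via PSI/J, Parsl, or Balsam).
    """
    # Command to run on the remote login node (hypothetical module and flags).
    remote_cmd = [
        "python", "-m", remote_module,
        "--queue", queue,
        "--nworkers", str(nworkers),
        "--manager-port", str(manager_port),
    ]
    return [
        "ssh",
        # Reverse-forward: connections to manager_port on the remote side
        # are tunnelled back to the same port on the local manager host.
        "-R", f"{manager_port}:localhost:{manager_port}",
        host,
        # ssh passes the remote command as one string, so quote it safely.
        shlex.join(remote_cmd),
    ]


cmd = build_remote_launch_command("polaris", "prod", 32)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)`. How workers inside the scheduler job reach the tunnel endpoint on the login node is left open here, since that depends on the cluster's network setup.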

This may resemble:

python my_script.py --ssh polaris --queue prod --nworkers 32
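
A user script could accept those flags with something like the sketch below. This is only an assumption about how the options might be parsed with `argparse`; the flag names are taken from the example command above, while `parse_remote_args` and the help strings are hypothetical and do not reflect libEnsemble's existing argument parsing.

```python
import argparse


def parse_remote_args(argv=None):
    """Parse the hypothetical remote-connect options from the example command."""
    parser = argparse.ArgumentParser(description="Remote scheduler launch (sketch)")
    parser.add_argument("--ssh", required=True,
                        help="remote host or ssh config alias, e.g. polaris")
    parser.add_argument("--queue", required=True,
                        help="scheduler queue to submit the worker job to")
    parser.add_argument("--nworkers", type=int, required=True,
                        help="number of workers to launch in the job")
    return parser.parse_args(argv)


# Equivalent to: python my_script.py --ssh polaris --queue prod --nworkers 32
args = parse_remote_args(["--ssh", "polaris", "--queue", "prod", "--nworkers", "32"])
print(args.ssh, args.queue, args.nworkers)
```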