feat(distributed): Support distributed training debug
Opened this issue · 1 comments
gaocegege commented
Description
The CUJ looks like:
envd run --image xx --replicas 20
Then there will be one interactive shell, and users can type a command, which will run in all replicas.
Then the STDOUT will be aggregated in the terminal.
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.