Some more improvements
vfdev-5 opened this issue · 3 comments
App explanation
Let's either create a tutorial guide showing how to use the app, or simply a button with a message explaining how to use the app, where to start, etc.
Distributed:
- Done
Let's simplify this code if no distributed option is selected:
with idist.Parallel(
backend=config.backend,
nproc_per_node=config.nproc_per_node,
nnodes=config.nnodes,
node_rank=config.node_rank,
master_addr=config.master_addr,
master_port=config.master_port,
) as parallel:
parallel.run(run, config=config)
to
# (no dist)
with idist.Parallel(
backend=config.backend,
) as parallel:
parallel.run(run, config=config)
and
# single node
with idist.Parallel(
backend=config.backend,
nproc_per_node=config.nproc_per_node,
) as parallel:
parallel.run(run, config=config)
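One way to get there (just a sketch with a hypothetical `build_parallel_kwargs` helper, not actual app code; config fields as in the snippets above) is to collect only the distributed options that are actually set, so the generated call stays minimal:

```python
from types import SimpleNamespace

def build_parallel_kwargs(config):
    """Collect only the distributed options that are actually set,
    so the generated idist.Parallel(...) call stays minimal."""
    keys = ("nproc_per_node", "nnodes", "node_rank", "master_addr", "master_port")
    kwargs = {"backend": config.backend}
    for key in keys:
        value = getattr(config, key, None)
        if value is not None:
            kwargs[key] = value
    return kwargs

# No distributed option selected -> only the backend is passed:
config = SimpleNamespace(backend=None)
print(build_parallel_kwargs(config))  # {'backend': None}

# Single node, 2 processes:
config = SimpleNamespace(backend="nccl", nproc_per_node=2)
print(build_parallel_kwargs(config))  # {'backend': 'nccl', 'nproc_per_node': 2}
```

The generated main.py could then simply do `with idist.Parallel(**build_parallel_kwargs(config)) as parallel:` for every variant.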
Readme
- Done
We should be very careful with the distributed button and this suggestion:
python -m torch.distributed.launch \
--nproc_per_node=2 \
--use_env main.py \
--backend="nccl"
as the dist button will add code to spawn processes inside the main process, while dist launch will itself spawn more processes.
Let's do the following:
- add another checkbox with the option: use dist launch or spawn processes
- if user picks "dist launch" -> README.md says to use:
python -m torch.distributed.launch --nproc_per_node=2 ...
and in the code we define config.nproc_per_node=None. Same for multi-node: config.master_addr=None etc., and
python -m torch.distributed.launch --nproc_per_node=2 --master_addr=master --master_port=1234 --nnodes=2 --node_rank=0 ...
- if user picks "spawn" -> README.md says :
python main.py ...
and in the code we define config.nproc_per_node=2.
We can also imagine folks doing other things like here: https://github.com/sdesrozis/why-ignite
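To make the launch-vs-spawn split concrete, here is a rough sketch (hypothetical `launcher_setup` helper, not actual app code) of how the checkbox choice could map to the README command and the config overrides, following the rules above:

```python
def launcher_setup(mode, nproc_per_node=2):
    """Return (readme_command, config_overrides) for the chosen launcher.

    With "launch", torch.distributed.launch spawns the workers itself, so the
    generated code must not spawn again (nproc_per_node=None in config).
    With "spawn", the generated code spawns the workers via idist.Parallel.
    """
    if mode == "launch":
        cmd = (
            f"python -m torch.distributed.launch --nproc_per_node={nproc_per_node} "
            "--use_env main.py"
        )
        overrides = {"nproc_per_node": None}
    elif mode == "spawn":
        cmd = "python main.py"
        overrides = {"nproc_per_node": nproc_per_node}
    else:
        raise ValueError(f"unknown launcher mode: {mode!r}")
    return cmd, overrides
```

The multi-node case would extend the same idea: for "launch", also set config.master_addr=None, config.master_port=None, etc., and put those flags on the command line instead.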
DataLoader
- Done
If the user picks the "spawn" option, we have to update the code like this:
train_dataloader = idist.auto_dataloader(
train_dataset,
batch_size=config.train_batch_size,
num_workers=config.num_workers,
shuffle=True,
persistent_workers=True
)
eval_dataloader = idist.auto_dataloader(
eval_dataset,
batch_size=config.eval_batch_size,
num_workers=config.num_workers,
shuffle=False,
persistent_workers=True
)
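A small sketch (hypothetical `dataloader_kwargs` helper; the assumption here is that persistent_workers=True is only added for the spawn path, as suggested above) of how the template could build the auto_dataloader arguments conditionally:

```python
def dataloader_kwargs(batch_size, num_workers, shuffle, spawn=False):
    """Build keyword arguments for idist.auto_dataloader.

    With the "spawn" option, keep workers alive across epochs
    (persistent_workers=True) so they are not re-created each epoch."""
    kwargs = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "shuffle": shuffle,
    }
    if spawn:
        kwargs["persistent_workers"] = True
    return kwargs

# Usage in the generated code would look like:
#   train_dataloader = idist.auto_dataloader(
#       train_dataset, **dataloader_kwargs(config.train_batch_size,
#                                          config.num_workers, True, spawn=True))
```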
"Save the best model by eval score" and "Early stop ..."
- Done
It would be better to avoid such messages:
Please make sure to pass argument to metric_name parameter of get_handlers in main.py. Otherwise it can result KeyError.
Let's control what we are doing and configure everything such that we do not need to warn the user like that.
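One way to avoid the warning entirely (sketch only: `resolve_metric_name` is a hypothetical helper, and the fallback-to-first-metric behavior is an assumption) is to validate metric_name when the config is built, so a bad value fails loudly at configuration time instead of raising a KeyError deep inside get_handlers:

```python
def resolve_metric_name(config, available_metrics):
    """Validate config's metric_name up front instead of warning the user.

    Raises a clear error at configuration time rather than letting
    get_handlers hit a KeyError at run time."""
    name = config.get("metric_name")
    if name is None:
        # Assumption: fall back to the first available metric.
        return next(iter(available_metrics))
    if name not in available_metrics:
        raise ValueError(
            f"metric_name={name!r} is not computed; "
            f"available: {sorted(available_metrics)}"
        )
    return name
```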
(Later) AMP mode as option ?
- Done
It would be nice to add an AMP option, for image classification at least.
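Since the app generates code, the AMP option could be a template switch. A rough sketch (hypothetical `render_train_step` helper; the emitted snippet assumes the standard torch.cuda.amp autocast + GradScaler pattern):

```python
def render_train_step(use_amp: bool) -> str:
    """Render the forward/backward part of the generated training step,
    wrapping it in torch.cuda.amp.autocast when AMP is enabled."""
    if use_amp:
        return (
            "with torch.cuda.amp.autocast():\n"
            "    loss = loss_fn(model(x), y)\n"
            "scaler.scale(loss).backward()\n"
            "scaler.step(optimizer)\n"
            "scaler.update()\n"
        )
    return (
        "loss = loss_fn(model(x), y)\n"
        "loss.backward()\n"
        "optimizer.step()\n"
    )
```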
(Later) Optimizer type
- Done
Users would like to choose the optimizer type: Adam, RMSprop, etc.
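A sketch of the selection (hypothetical mapping; the optimizers listed beyond Adam/RMSprop are assumptions): the app could keep a whitelist of torch.optim class paths and reject anything else with a clear error:

```python
# Assumed whitelist of supported torch.optim classes (dotted paths).
OPTIMIZERS = {
    "Adam": "torch.optim.Adam",
    "AdamW": "torch.optim.AdamW",
    "RMSprop": "torch.optim.RMSprop",
    "SGD": "torch.optim.SGD",
}

def optimizer_class_path(name):
    """Map the UI choice to a torch.optim class path, failing loudly
    on anything outside the whitelist."""
    try:
        return OPTIMIZERS[name]
    except KeyError:
        raise ValueError(
            f"unsupported optimizer {name!r}; choose from {sorted(OPTIMIZERS)}"
        ) from None
```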
AMP mode as option ?
AMP is already there. It can be enabled via config.use_amp