NVIDIA/NeMo

NeMo 2.0 nemorun llm export ValueError: PyTorch DDP is not enabled for mcore optimizer


Describe the bug

Exporting a previously imported Llama 3 8B checkpoint to Hugging Face format with nemorun llm export fails: MegatronStrategy.connect raises ValueError: PyTorch DDP is not enabled for mcore optimizer.

Steps/Code to reproduce bug

nemorun llm import llama3_8b hf://meta-llama/Meta-Llama-3-8B -y
nemorun llm export ~/.cache/nemo/models/meta-llama/Meta-Llama-3-8B/context hf exp/PreTrain/export_llama3_8b -y
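
For reference, the same export can be driven from Python. The sketch below is a minimal programmatic equivalent, assuming export_ckpt is re-exported from nemo.collections.llm (the dry run resolves the task to nemo.collections.llm.api:export_ckpt), with the paths mirroring the CLI call above:

from pathlib import Path

from nemo.collections import llm

# Programmatic equivalent of the failing CLI call; it reaches the same
# nemo.collections.llm.api:export_ckpt entry point shown in the dry run
# and the traceback below.
llm.export_ckpt(
    path=Path.home() / ".cache/nemo/models/meta-llama/Meta-Llama-3-8B/context",
    target="hf",
    output_path=Path("exp/PreTrain/export_llama3_8b"),
    overwrite=False,
)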

Error output

Dry run for task nemo.collections.llm.api:export_ckpt
Resolved Arguments
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Argument Name        ┃ Resolved Value                                               ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ load_connector       │ <function load_connector_from_trainer_ckpt at                │
│                      │ 0x7feaf6adfbe0>                                              │
│ output_path          │ PosixPath('exp/PreTrain/export_llama3_8b')                   │
│ overwrite            │ False                                                        │
│ path                 │ PosixPath('/root/.cache/nemo/models/meta-llama/Meta-Llama-3… │
│ target               │ 'hf'                                                         │
└──────────────────────┴──────────────────────────────────────────────────────────────┘
Launching None...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[NeMo W 2024-10-18 09:16:11 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
    
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[ERROR    | root               ]: An error occurred: PyTorch DDP is not enabled for mcore optimizer
Traceback (most recent call last):
  File "/usr/local/bin/nemorun", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 326, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 723, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 193, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 692, in wrapper
    return callback(**use_params)
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/cli/api.py", line 793, in command
    self.cli_execute(fn, ctx.args, type)
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/cli/api.py", line 845, in cli_execute
    self._execute_task(fn, filtered_args)
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/cli/api.py", line 895, in _execute_task
    run_task()
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/cli/api.py", line 874, in run_task
    run.run(
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/run/api.py", line 65, in run
    direct_run_fn(fn_or_script, dryrun=dryrun)
  File "/home/lifeiteng/code/NeMo-Run/src/nemo_run/run/task.py", line 77, in direct_run_fn
    built_fn()
  File "/home/lifeiteng/code/NeMo/nemo/collections/llm/api.py", line 432, in export_ckpt
    return io.export_ckpt(path, target, output_path, overwrite, load_connector)
  File "/home/lifeiteng/code/NeMo/nemo/lightning/io/api.py", line 197, in export_ckpt
    return exporter(overwrite=overwrite, output_path=_output_path)
  File "/home/lifeiteng/code/NeMo/nemo/lightning/io/connector.py", line 85, in __call__
    to_return = self.apply(_output_path)
  File "/home/lifeiteng/code/NeMo/nemo/collections/llm/gpt/model/llama.py", line 301, in apply
    source, _ = self.nemo_load(str(self))
  File "/home/lifeiteng/code/NeMo/nemo/lightning/io/connector.py", line 216, in nemo_load
    _trainer.strategy.connect(model)
  File "/home/lifeiteng/code/NeMo/nemo/lightning/pytorch/strategies/megatron_strategy.py", line 286, in connect
    raise ValueError("PyTorch DDP is not enabled for mcore optimizer")
ValueError: PyTorch DDP is not enabled for mcore optimizer

Expected behavior

The export of the Meta-Llama-3-8B model should complete without errors, producing a Hugging Face-format checkpoint at the specified output path (exp/PreTrain/export_llama3_8b).
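
For context on where this fails: per the traceback, Connector.nemo_load builds a helper Trainer to restore the source model, and MegatronStrategy.connect then rejects it. Below is an illustrative sketch of that guard; this is not NeMo's actual code, and the attribute names are assumptions inferred from the traceback:

# Illustrative sketch of the guard that raises during export. The real
# check lives in nemo/lightning/pytorch/strategies/megatron_strategy.py
# (MegatronStrategy.connect); names here are assumptions.
class StrategySketch:
    def __init__(self, ddp_config=None):
        # The export path appears to build its helper strategy without a
        # DDP config, so ddp_config is still None when connect() runs.
        self.ddp_config = ddp_config

    def connect(self, model):
        # A model restored with a Megatron-core optimizer attached trips
        # this check when no PyTorch DDP config is present.
        opt_config = getattr(getattr(model, "optim", None), "config", None)
        if opt_config is not None and self.ddp_config is None:
            raise ValueError("PyTorch DDP is not enabled for mcore optimizer")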

Thanks for reporting this bug; we'll look into it ASAP and push a fix.

Hi, this has been fixed in #11081; please give it a try. It will be merged soon. Thanks.
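
If you want to test the fix before it merges, one option (assuming a standard pip/git setup) is to install NeMo from the PR's head ref:

pip install "git+https://github.com/NVIDIA/NeMo.git@refs/pull/11081/head"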