NVIDIA/NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
Python, Apache-2.0
Issues
- 0
slurm Multi-machine and multi-GPU training
#416 opened by yangzhipeng1108 - 0
Adding support for Ray as a launcher backend
#387 opened by Irvingwangjr - 0
'General Configuration' Page Not Found
#373 opened by Pty72 - 1
The hydra-core and omegaconf file encountered type error: unexpected keyword version_base
#338 opened by imn00b - 1
[Request][Nemo-Curator] missing add_id and download_and_extract features on Launcher
#339 opened by leejinho610 - 0
NeMo FW Launcher doesn't have `add_id` and `download_and_extract` logic unlike NeMo Curator repo
#334 opened by eagle705 - 0
Data Preparation Failed in Kubernetes
#314 opened by Syulin7 - 0
fsdp not supported for Llama model
#276 opened by jeffnvidia - 0
llama2 7b SFT met error
#271 opened by inspurasc - 1
k8s llama2 Data_preparation error inquiry
#236 opened by donggyulee1 - 0
k8s llama2 Data preparation error inquiry
#235 opened by donggyulee1 - 1
OSError: sbatch: error: Batch job submission failed: Invalid generic resource (gres) specification
#72 opened by starlitsky2010 - 0
GPT3 126m divergence
#207 opened by ethanhe42 - 7
Docker Build Fails
#184 opened by TaekyungHeo - 0
Documentation Code Block Display Issue
#178 opened by KTH1234 - 1
Is nemo-rlhf code available?
#53 opened by maxhgerlach - 0
Is a CSP (Azure, AWS, etc.) required to use the NeMo framework for generative AI work? Can't we scale it on an on-premises database with 8 GPUs?
#112 opened by Shantan243 - 2
Which versions of Pyxis, Slurm and enroot for running NeMo-Megatron-Launcher on one Node with 8 * A100?
#73 opened by starlitsky2010 - 2
Where to execute python main.py when using Slurm on only one GPU node?
#69 opened by starlitsky2010 - 1
pyxis: failed to import docker image
#28 opened by szhengac - 3
Access to the script for evaluating the inference performance (average latency vs. model size)
#22 opened by songkq - 2
Convert nemo-megatron-mt5-3B to a binary file of fastertransformer successfully, but tritonserver fails with undesired shape when loading models.
#21 opened by songkq - 3
Is there a contribution guide for this repo?
#7 opened by kaiyux - 4
Device Support
#6 opened by MikeDean2367