EleutherAI/gpt-neox
An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
Python · Apache-2.0
Issues
- 'intermediate_size' not set in tools/ckpts/convert_neox_to_hf.py for the NeoX model architecture (#1208 opened by jvendrow, 1 comment)
- My servers used for multi-node training do not have SSH. How can I launch multi-node training using the torchrun command? (#1203 opened by dingning97, 1 comment)
- LoRA support (#1204 opened by Quentin-Anthony, 5 comments)
- FileNotFoundError thrown when training (#1127 opened by obicons, 4 comments)
- NCCL error in ProcessGroupNCCL.cpp:1269, internal error, NCCL version 2.14.3 (#1147 opened by mackmake, 1 comment)
- How to convert gpt-neox to the llama architecture? (#1151 opened by yuri-son, 2 comments)
- Is there an ignore_index option in the loss calculation? (#1193 opened by exnx, 1 comment)
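Issue #1193 asks about masking certain labels out of the loss. For reference, PyTorch's built-in cross-entropy already supports this via `ignore_index`; a minimal sketch (the tensor values below are made up for illustration, not code from this repo):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical logits for 4 token positions over a 5-token vocabulary.
logits = torch.randn(4, 5)
# -100 is the conventional "ignore" label: that position contributes
# nothing to the loss or to the gradients.
labels = torch.tensor([1, 2, -100, 4])

loss = F.cross_entropy(logits, labels, ignore_index=-100)

# Equivalent to averaging the per-token loss over only the non-ignored positions.
manual = F.cross_entropy(logits[[0, 1, 3]], labels[[0, 1, 3]])
assert torch.isclose(loss, manual)
```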
- Best mobile repair shop in holy Mashhad (#1173 opened by rezaarefi, 1 comment)
- Is there a way to train on the entire dataset for N epochs without specifying train-iters? (#1164 opened by javirandor, 1 comment)
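Issue #1164 asks how to train for a fixed number of epochs when the trainer takes an iteration count. One workaround is to derive the iteration count from the dataset size by hand; a rough sketch with hypothetical numbers (substitute your own; this is not code from the repo):

```python
import math

# Hypothetical run parameters.
total_tokens = 300_000_000   # tokens in the tokenized training set
epochs = 3                   # desired passes over the data
seq_length = 2048            # tokens per sequence
global_batch_size = 32       # sequences per optimizer step, across all ranks

# Tokens consumed per training iteration, then iterations for N epochs.
tokens_per_iter = seq_length * global_batch_size   # 65,536
train_iters = math.ceil(total_tokens * epochs / tokens_per_iter)
print(train_iters)  # 13733
```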
- Integrate TransformerEngine (#1098 opened by Quentin-Anthony, 0 comments)
- Add basic Mamba block (#1148 opened by Quentin-Anthony, 0 comments)
- MoE loss variable not defined in the GPT-J residual code path (#1174 opened by tf-nv, 1 comment)
- Converting Pythia checkpoint from HF to NeoX fails (#1161 opened by malteos, 2 comments)
- Dockerfile installation fails to run Pythia 14M (#1165 opened by tf-nv, 0 comments)
- Add basic RWKV block to GPT-NeoX (#1167 opened by Quentin-Anthony, 5 comments)
- ImportError: /media/h/nvme/gpt-neox/.venv/lib/python3.8/site-packages/flash_attn_2_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: (#1079 opened by Drzhivago264, 0 comments)
- PyTorch Lightning fused optimizer step (#1160 opened by jahatef, 2 comments)
- Files in multi-node training (#1146 opened by mackmake, 8 comments)
- Support Mistral models (#1050 opened by Quentin-Anthony, 2 comments)
- "Argument list too long" error (#1076 opened by kavlekar101, 0 comments)
- Update to current versions of Python and PyTorch (#1143 opened by segyges, 0 comments)
- Port NVIDIA Nsight profiling to gpt-neox (#1134 opened by Quentin-Anthony, 0 comments)
- Add PyTorch memory profiler (#1152 opened by Quentin-Anthony, 1 comment)
- Tests fail when run with pytest --forked (#1132 opened by segyges, 9 comments)
- Support for custom model architectures (#1117 opened by itsnamgyu, 0 comments)
- Convert HF format or raw weights of Llama2 to NeoX format (#1112 opened by fmh1art, 5 comments)
- Add instructions for loading Llama2 models (#1051 opened by Quentin-Anthony, 2 comments)
- Some datasets are not available (#1071 opened by vangogh0318, 2 comments)
- Error on Hugging Face inference (#1142 opened by mackmake, 4 comments)
- Calculate epoch (#1140 opened by mackmake, 3 comments)
- Add a Contributor Guide (#1110 opened by Quentin-Anthony, 1 comment)
- Create Singularity container (#1119 opened by Quentin-Anthony, 2 comments)
- convert_hf_to_module(pipeline_parallel>1) (#1092 opened by liuxinxin123, 4 comments)
- Support for lm_eval 0.4.0 (#1114 opened by ZhiYuanZeng, 10 comments)
- Apply new fused rotary embedding (#1077 opened by Quentin-Anthony, 5 comments)
- Help with: No such file or directory: '/fsx/hailey/math-lm/gpt-neox/megatron/fused_kernels' (#1083 opened by andrewarrow, 0 comments)
- Error in FLOPS calculation (#1093 opened by passaglia, 1 comment)
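Issue #1093 concerns the FLOPS estimate. As context, a common rule of thumb for dense transformer training cost is roughly 6 FLOPs per parameter per token (forward plus backward); this is a back-of-the-envelope approximation, not the exact formula gpt-neox implements, and the numbers below are hypothetical:

```python
# Rough training FLOPs estimate: ~6 * parameters * tokens.
params = 2.8e9    # model parameters (illustrative, e.g. a ~2.8B model)
tokens = 300e9    # training tokens (illustrative)
total_flops = 6 * params * tokens
print(f"{total_flops:.2e}")  # 5.04e+21
```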
- Instruction finetune (#1091 opened by liuxinxin123, 1 comment)
- Finetune (#1088 opened by liuxinxin123, 1 comment)
- Port DeepSpeed Ulysses (#1078 opened by Quentin-Anthony, 2 comments)
- Interoperability and GPT-NeoX (#1058 opened by StellaAthena, 0 comments)
- Support for Mosaic models (#1057 opened by rajveer43, 0 comments)
- AssertionError: Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed.initialize() function call (#1043 opened by shaunstoltz)