EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Python · Apache-2.0
Issues
How are multiple datasets loaded?
#1330 opened by fxnie - 1
Insufficient support for tokenizer_type
#1328 opened by fxnie - 1
Error when converting sequential model to HF
#1323 opened by SilverSulfide - 0
[Questions] Context length extension.
#1327 opened by iPRET - 0
AssertionError: Index file doesn't match expected format. Make sure that --dataset-impl is configured properly.
#1329 opened by fxnie - 0
Unreachable code and bug in if-else clause
#1324 opened by tomsbergmanis - 1
Can `preprocess_data.py` support Huggingface Dataset?
#1321 opened by cafeii - 1
Error with rotary embeddings and BFloat16
#1305 opened by jahatef - 1
_forward_step_fn does not always return two values so eval.py breaks if is_pipe_parallel is false
#1320 opened by markNZed - 1
Allow training without knowing num_iters
#1268 opened by StellaAthena - 0
KeyError when converting DPO weights from GPTNeoX format to HuggingFace Llama in post-training documentations
#1317 opened by jacobthebanana - 2
Training crashes when "(hidden_size * num_kv_heads) / (num_attention_heads * num_attention_heads)" is not an integer.
#1314 opened by tiandeyu-cs - 2
[Question] Running gpt-neox on AMD-based LUMI HPC centre.
#1310 opened by iPRET - 0
Latest DeepSpeed not supported
#1306 opened by jahatef - 4
batch_input and elapsed time per iteration suddenly slow down during model training
#1248 opened by Yuhanleeee - 2
For nucleus sampling, top-p sampling appears to happen on the softmax-normalized top-k logits
#1250 opened by j-frei - 2
Cannot convert neox model to HF
#1231 opened by srivassid - 3
Assertion Error when Setting pipe_parallel_size or model_parallel_size in GPT-NeoX
#1251 opened by lieh1203 - 0
How to Load Model from pytorch_model.bin into Trained Model for Text Generation?
#1254 opened by lieh1203 - 0
what's the biggest dataset you've tried?
#1253 opened by exnx - 0
too many .bin files for dataloader, crashed
#1252 opened by exnx - 2
The results of running eval show only 1 digit after decimal point for acc on all tested tasks
#1227 opened by lernerjenny - 2
My servers used for multi-node training do not have ssh. How can I launch multi-node training using the torchrun command?
#1203 opened by dingning97 - 1
Add Basic RWKV Block to GPT-NeoX
#1167 opened by Quentin-Anthony - 3
'intermediate_size' not set in tools/ckpts/convert_neox_to_hf.py for neox model architecture
#1208 opened by jvendrow - 1
LoRA Support
#1204 opened by Quentin-Anthony - 4
NCCL error in: ProcessGroupNCCL.cpp:1269, internal error, NCCL version 2.14.3
#1147 opened by mackmake - 1
How to convert gpt-neox to llama architecture..?
#1151 opened by yuri-son - 2
is there any ignore_index ability in the loss calculation?
#1193 opened by exnx - 1
The best mobile phone repair shop in holy Mashhad
#1173 opened by rezaarefi - 1
Is there a way to train on the entire dataset for N epochs without specifying train-iters?
#1164 opened by javirandor - 1
Add basic Mamba block
#1148 opened by Quentin-Anthony - 0
MoE loss variable not defined in gpt j residual code path
#1174 opened by tf-nv - 1
Converting Pythia checkpoint from HF to NeoX fails
#1161 opened by malteos - 2
Dockerfile installation fails to run pythia 14M
#1165 opened by tf-nv - 0
PyTorch Lightning Fused optimizer step
#1160 opened by jahatef - 2
files in multi-node training
#1146 opened by mackmake - 0
Update to current versions of python and pytorch
#1143 opened by segyges - 0
Add PyTorch Memory Profiler
#1152 opened by Quentin-Anthony - 2
Error on inference of huggingface
#1142 opened by mackmake - 4
calculate epoch
#1140 opened by mackmake