kingoflolz/mesh-transformer-jax

Model parallel transformers in JAX and Haiku

Python · Apache-2.0

Issues

Web demo is not launching any results. Might be disconnected from the model.
#241 opened 2 years ago by Gertie01
43
lm_eval missing
#264 opened 3 months ago by falv706
0
GPT-J used in "Domain-Specific Text Generation for Machine Translation"
#250 opened 8 months ago by ymoslem
0
AttributeError: module 'jax.random' has no attribute 'KeyArray' while fine tuning.
#221 opened 3 years ago by samyakai
15
6b.eleuther.ai mystic model is down for GPT-J-6B.
#263 opened a year ago by Gertie01
0
Web demo must be fixed.
#261 opened a year ago by Gertie01
0
TPU-V4
#255 opened 2 years ago by wimjan123
11
6b.eleuther.ai mystic model is down for GPT-J-6B
#226 opened 3 years ago by orionnelson
4
What about a Hugging Face Spaces demo so we can test this?
#262 opened a year ago by Gertie01
0
About rope embedding
#260 opened a year ago by eyuansu62
0
Framework
#259 opened a year ago by T3fo0ls7766
0
Finetuning Hardware Recommendations
#258 opened a year ago by greyweb
0
How to stop model generating
#228 opened 3 years ago by jingrongchen
1
Download Link for Model Weights in howto_finetune.md is broken
#251 opened 2 years ago by torakoneko
2
The PILE dataset is full of racist content and thus GPT-J produces racist thinking.
#240 opened 2 years ago by azeemh
2
How to infer with GPT-J on TPU_driver0.2 or nightly?
#256 opened 2 years ago by mosmos6
1
tpu_driver0.1 is not initialized on colab (cannot infer with GPT-J on Colab) [Again]
#252 opened 2 years ago by mosmos6
7
Discrepancy between results reported in this repo and in the NeoX paper
#257 opened 2 years ago by william-cerebras
2
Resolving dependency issues
#246 opened 2 years ago by rinapch
6
Can we please get a quickstart guide?
#243 opened 2 years ago by tswallen
2
Which version of Python does this work with?
#253 opened 2 years ago by chrisbward
2
Quantization for training / finetuning
#254 opened 2 years ago by torphix
0
TypeError: __init__() takes 2 positional arguments but 4 were given
#225 opened 3 years ago
1
Could not find a version that satisfies the requirement ray[default]==1.4.1
#245 opened 2 years ago by Maxim-Mazurok
5
Fine-tuning on conversations (format of conversations)
#248 opened 2 years ago by Eichhof
2
Do you have any plans to create an open-source version of ChatGPT?
#244 opened 2 years ago by stc2001
2
training stuck at validation step 1
#218 opened 3 years ago by Selimonder
4
TPU not found on VM (jax version 0.2.16)
#242 opened 2 years ago by Eichhof
0
Project dependencies may have API risk issues
#239 opened 2 years ago by PyDeps
0
AttributeError: module 'jaxlib.pocketfft' has no attribute 'pocketfft'
#233 opened 2 years ago by umm-maybe
4
Dead link to weights?
#238 opened 2 years ago by samacqua
1
TPU Instance Creation
#237 opened 2 years ago by zzj0402
2
Update the readme with required and recommended hardware list
#236 opened 2 years ago by sxiii
3
on number of training tokens of gpt-j-6b and gpt-neox-20b
#235 opened 2 years ago by xiaoda99
0
Is the treatment of embedding bias in to_hf_weights.py correct?
#234 opened 2 years ago by xiaoda99
2
TypeError: Cannot subclass <class 'typing._SpecialForm'> while fine tuning
#222 opened 3 years ago by samyakai
9
GPT-J-6B Inference Demo notebook giving errors when cores_per_replica=1
#232 opened 2 years ago by batrasakshi
1
Google Colab Error: optax is throwing an attribute error.
#230 opened 2 years ago by prajjwalgeek
2
Typo in 'to_hf_weights.py'
#231 opened 2 years ago by AmoArt
1
[Feature Request] Multilingual assistance.
#229 opened 3 years ago by phly95
0
Training data format for generating Scenario based MCQ's
#224 opened 3 years ago by shrey10926
2
Finetuning GPT Neo 20B Using TPU V3-8s
#227 opened 3 years ago by nikhilanayak
0
`TypeError: Cannot subclass <class 'typing._SpecialForm'>` in `slim_model.py`
#212 opened 3 years ago by danyaljj
3
GPT-J inference on TPU
#219 opened 3 years ago by airesearch38
3
CausalTransformerV2 or CausalTransformer?
#220 opened 3 years ago by leejason
0
Can "slim_model.py" work with "d_model" as 768?
#217 opened 3 years ago by leejason
0
Running on Colab TPU only gives random words and nonsensical outputs
#216 opened 3 years ago by JohnnyRacer
2
Error while `to_hf_weights.py`: `ValueError: cannot reshape array of size 25804800 into shape (1,4096,50400)`
#214 opened 3 years ago by danyaljj
1
`OSError: libmkl_intel_lp64.so.1: cannot open shared object file` when using `to_hf_weights.py`
#215 opened 3 years ago by danyaljj
1
`Incompatible checkpoints` error when running `slim_model.py`
#213 opened 3 years ago by danyaljj
1