Issues
Wrong definition of Query, Key, Value matrices? They shouldn't have bias=True
#93 opened by LeoPerelli - 1
What is the purpose of `c_proj` here?
#135 opened by brynhayder - 1
Output of CausalSelfAttention
#118 opened by whchan05 - 1
We collected more than 60 open-source software packages for CAX and want to train minGPT to be the AI assistant.
#136 opened by fengsim - 0
GPT-2 implementation problem
#134 opened by sanhai77 - 0
Concatenate two BPE tokenizers
#129 opened by mackmake - 0
Which PyTorch version should be used on Windows for CPU-only inference?
#127 opened by Sandy4321 - 4
What is the minimum hardware requirement to train?
#126 opened by jorjiang - 1
What's the max output tokens this model supports?
#125 opened by aletote - 1
Question: does it support other UTF-8 natural languages?
#115 opened by yingshaoxo - 3
Simplifying weight decay checking doesn't work
#112 opened by rabinadk1 - 1
How can I run a trained model? I can't run test_huggingface_import.py
#119 opened by linlong1314 - 1
About the layer norm dimension parameter
#113 opened by vcvycy - 0
Crashed Encoder possible data corruption
#111 opened by DayneSorvisto - 0
Information leak in training procedure?
#107 opened by ljch2018 - 2
How does this compare to aitextgen?
#105 opened by breadbrowser - 0
Stop words?
#104 opened by BoyuanJackChen - 1
tests do not run in project as built
#99 opened by ben-schulz - 0
Facilitating setup with popular tools
#98 opened by Utopiah - 1
Caching for generation
#95 opened by murbard - 0
Renaming transformer.h into transformer.l
#94 opened by marxav - 2
Is it more reasonable to only use causal attention in the first block of GPT?
#79 opened by charlesxu90 - 0
Use of amp.autocast does not improve performance
#51 opened by aurotripathy - 1
Model self-attention hardcoded to 4 heads
#71 opened by SpeedCoder5 - 0
Integration with HuggingFace
#68 opened by marxav - 0
Sharing Pretrained Checkpoints
#42 opened by barisbatuhan - 0
How do I see test loss?
#56 opened by aletote - 0
TPU/GPU training: KeyError 'pos_emb'
#57 opened by tech509201941 - 2
Will this repo add the Reformer or teach people how to implement the Reformer in PyTorch?
#47 opened by JonathanSum - 0
Question about memory usage for play_math
#54 opened by pablogranolabar - 2
Add a "How to cite this work" section in README.md
#49 opened by marxav - 1
How to apply to time series?
#48 opened by thinkingparticle - 1
GPT vs BERT: under the same computation and data resources, which one is better for downstream tasks like GLUE?
#44 opened by guotong1988 - 3