karpathy/minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python · MIT

Issues

Wrong definition of Query, Key, Value matrices? They shouldn't have bias=True
#93 opened 2 years ago by LeoPerelli
3
What is the purpose of `c_proj` here?
#135 opened 2 months ago by brynhayder
1
Output of CausalSelfAttention
#118 opened a year ago by whchan05
1
Where is self.bias defined in the causal attention class?
#133 opened 3 months ago by nebyu08
1
Error at line 200 in from_pretrained: assert len(keys) == len(sd)
#128 opened 5 months ago by Sandy4321
7
We collected more than 60 open-source software packages for CAX and want to train minGPT to be the AI assistant.
#136 opened 2 months ago by fengsim
0
GPT-2 implementation problem
#134 opened 2 months ago by sanhai77
0
Strange model behavior when taking the softmax in the wrong dimension
#132 opened 4 months ago by Cloud299
0
How to build a model and interact with it like ChatGPT?
#131 opened 4 months ago by IamExperimenting
0
Support for Multi-GPU Parallel Training in chargpt.py
#130 opened 4 months ago by JinXiaofeng1234
0
Concatenate two BPE tokenizers
#129 opened 4 months ago by mackmake
0
Which PyTorch version should be used on Windows for CPU-only inference?
#127 opened 5 months ago by Sandy4321
0
AssertionError when running generate.ipynb with default parameters
#120 opened 10 months ago by jacquesqiao
4
What is the minimum hardware requirement to train?
#126 opened 6 months ago by jorjiang
0
What's the max output tokens this model supports?
#125 opened 7 months ago by aletote
1
Should -1 marker (as special token) be counted in vocab_size?
#123 opened 9 months ago by mw66
1
Question: does it support other UTF-8 natural languages?
#115 opened a year ago by yingshaoxo
1
Simplifying weight decay checking doesn't work
#112 opened a year ago by rabinadk1
3
How can I run a trained model? Can't run test_huggingface_import.py
#119 opened a year ago by linlong1314
1
About the layer norm dimension parameter
#113 opened a year ago by vcvycy
1
Generating images
#114 opened a year ago by rubucat
0
cannot import name 'sample' from 'mingpt.utils'
#90 opened 2 years ago by chris-the-wiz
1
Crashed Encoder possible data corruption
#111 opened a year ago by DayneSorvisto
0
Information leak in training procedure?
#107 opened a year ago by ljch2018
0
Is there a TensorFlow version of minGPT? Or will someone implement it?
#45 opened 4 years ago by guotong1988
2
how does this compare to aitextgen?
#105 opened a year ago by breadbrowser
0
Stop words?
#104 opened a year ago by BoyuanJackChen
0
tests do not run in project as built
#99 opened a year ago by ben-schulz
1
Facilitating setup with popular tools
#98 opened a year ago by Utopiah
0
Caching for generation
#95 opened a year ago by murbard
1
Renaming transformer.h into transformer.l
#94 opened a year ago by marxav
0
How to handle unequal sequence length in a batch
#70 opened 2 years ago by luxuantao
2
Is it more reasonable to only use causal attention in the first block of GPT
#79 opened 2 years ago by charlesxu90
0
Meaning of "-1 because very last digit doesn't plug back"
#77 opened 2 years ago by vwxyzjn
0
Use of amp.autocast does not improve performance
#51 opened 4 years ago by aurotripathy
1
Perfect training and evaluation loss, but terrible test-time performance
#75 opened 2 years ago by micahcarroll
1
model self-attention hardcoded to 4 heads
#71 opened 2 years ago by SpeedCoder5
1
Integration with HuggingFace
#68 opened 2 years ago by marxav
0
play_math AdditionDataset.__get_item__ return value?
#67 opened 2 years ago by SpeedCoder5
0
Sharing Pretrained Checkpoints
#42 opened 4 years ago by barisbatuhan
2
How to determine `warmup_tokens` and `final_tokens`?
#65 opened 2 years ago by fgolemo
0
Error when I provide test dataset (custom minGPT)
#59 opened 3 years ago by asigalov61
0
How do I see test loss?
#56 opened 3 years ago by aletote
1
TPU/GPU training: KeyError 'pos_emb'
#57 opened 3 years ago by tech509201941
0
Will this repo add the Reformer or teach people how to implement the Reformer in PyTorch?
#47 opened 3 years ago by JonathanSum
2
Question about memory usage for play_math
#54 opened 3 years ago by pablogranolabar
0
Add a "How to cite this work" section in README.md
#49 opened 4 years ago by marxav
2
How to apply to time series?
#48 opened 4 years ago by thinkingparticle
1
GPT vs BERT: under the same computation and data resources, which one is better for downstream tasks like GLUE?
#44 opened 4 years ago by guotong1988
1
Potential encoding issue in addition problem in play_math notebook?
#40 opened 4 years ago by ravi-annaswamy
3