EleutherAI/pythia
The hub for EleutherAI's work on interpretability and learning dynamics
Jupyter Notebook, Apache-2.0
Issues
how to use Pythia
#160 opened by gaohang - 1
Batch Viewer : Why Sequence Length 2049?
#123 opened by prakharg24 - 1
Pythia 160M is giving unreasonable logit values
#177 opened by danielmisrael - 3
Inconsistent init methods of pythia-6.9b model
#135 opened by mqyqlx - 1
Can't find the index file
#180 opened by jaydeepborkar - 0
The possibility of modifying the checkpoint and reloading the model parameter
#173 opened by peteryang1031 - 0
Questions regarding the WSC evaluation results
#172 opened by mutiann - 1
Inquiry about Re-uploading Additional Pythia-410M Model Variants (i.e., seed1-9)
#169 opened by liudan193 - 4
Issue while showering NLO events with NLO
#165 opened by rash-eng - 0
How to use the Huggingface dataset /EleutherAI/pythia-memorized-evals in predictable-memorization?
#164 opened by Happy2Git - 0
cache_dir cannot be the same as model name
#163 opened by arunasank - 0
Pythia 12b flash config
#162 opened by jvendrow - 1
Reshape error in batch viewer
#158 opened by activatedgeek - 0
Convert to GGUF
#159 opened by yanxon - 1
tokenizer.pad_token
#156 opened by vincent317 - 0
instruct-tuned pythia
#155 opened by WilliamsToTo - 0
Provide the shuffled index_mapping npy files for ease of reproducing training data
#153 opened by ziqi-zhang - 1
Optimizer states in HF format
#152 opened by seyuboglu - 1
Weird inconsistency in Tokenizer vocabulary
#151 opened by javirandor - 1
"gas" configuration doesn't do anything
#149 opened by segyges - 3
Missing / undownloadable checkpoints on huggingface
#141 opened by mirandrom - 1
Reading data is slow!
#126 opened by Lisennlp - 4
Would it be possible to share training loss curves on the original Pythia models?
#145 opened by itsnamgyu - 11
Mismatch about the evaluation results
#118 opened by yuzc19 - 3
[Pythia on Pile-Dedup] Training for ~1.5 epochs: how to identify the repeated sequences (i.e., the additional .5 epoch)?
#144 opened by pietrolesci - 2
Has the data been shuffled?
#127 opened by Lisennlp - 3
Pythia or GPT-NeoX?
#138 opened by borgr - 2
.
#140 opened by ParthaKrPaul - 0
Wrong files in eval?
#139 opened by borgr - 1
Replicating the Training Data Order
#136 opened by prakharg24 - 1
The value of weight decay
#132 opened by yehuitang - 2
Error when running unshard_memmap.py
#114 opened by ShaneeyS - 1
Model Initialization Question
#129 opened by yanlai00 - 1
Any results on the validation set?
#121 opened by chujiezheng - 1
Weights tying
#117 opened by link-er - 2
Clarification of Pythia tokenizer(s) at different sizes, steps and data preprocessing?
#115 opened by RylanSchaeffer - 1
Can I provide custom data and continue training Pythia on this new data?
#113 opened by GeorgiAngelov - 1
Multiple training runs of same model with different random seed for weight initialisation
#110 opened by KarolisRam - 2
Possible error in Pythia-12B-deduped step 32000
#108 opened by smahdavi4