EleutherAI/pythia

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook
Apache-2.0

Issues

Shard hashes for `EleutherAI/pile-deduped-pythia-preshuffled`
#183 opened 24 days ago by pietrolesci
0
how to use Pythia
#160 opened a year ago by gaohang
1
Deduplicated Pile dataset with Domain Attribution
#137 opened a year ago by michaelduan8
1
Batch Viewer : Why Sequence Length 2049?
#123 opened a year ago by prakharg24
15
Pythia 160M is giving unreasonable logit values
#177 opened 6 months ago by danielmisrael
1
Inconsistent init methods of pythia-6.9b model
#135 opened a year ago by mqyqlx
3
Can't find the index file
#180 opened 5 months ago by jaydeepborkar
1
`torch.concat` is supported when reproducing results with docker
#178 opened 5 months ago by pingzhili
0
No EOD Tokens in EleutherAI/pile-deduped-pythia-preshuffled
#175 opened 6 months ago by markschoene
1
The possibility of modifying the checkpoint and reloading the model parameter
#173 opened 6 months ago by peteryang1031
0
Questions regarding the WSC evaluation results
#172 opened 7 months ago by mutiann
0
Clarification of Pythia Deduped Precision - bf16 or fp16?
#171 opened 7 months ago by RylanSchaeffer
1
Inquiry about Re-uploading Additional Pythia-410M Model Variants (i.e., seed1-9)
#169 opened 8 months ago by liudan193
0
Issue while showering NLO events with NLO
#165 opened 10 months ago by rash-eng
4
open-source the training data used between two adjacent checkpoints
#167 opened 9 months ago by txy77
0
How to use the Huggingface dataset /EleutherAI/pythia-memorized-evals in predictable-memorization?
#164 opened 10 months ago by Happy2Git
0
cache_dir cannot be the same as model name
#163 opened 10 months ago by arunasank
0
Pythia 12b flash config
#162 opened 10 months ago by jvendrow
0
Reshape error in batch viewer
#158 opened a year ago by activatedgeek
1
Convert to GGUF
#159 opened a year ago by yanxon
0
tokenizer.pad_token
#156 opened a year ago by vincent317
1
instruct-tuned pythia
#155 opened a year ago by WilliamsToTo
0
Provide the shuffled index_mapping npy files for ease of reproducing training data
#153 opened a year ago by ziqi-zhang
0
Optimizer states in HF format
#152 opened a year ago by seyuboglu
1
Weird inconsistency in Tokenizer vocabulary
#151 opened a year ago by javirandor
1
Is there existing code to resume training from specific checkpoint?
#150 opened a year ago by javirandor
1
"gas" configuration doesn't do anything
#149 opened a year ago by segyges
0
Missing / undownloadable checkpoints on huggingface
#141 opened a year ago by mirandrom
3
Reading data is slow!
#126 opened a year ago by Lisennlp
1
Would it be possible to share training loss curves on the original Pythia models?
#145 opened a year ago by itsnamgyu
4
Mismatch about the evaluation results
#118 opened a year ago by yuzc19
11
[Pythia on Pile-Dedup] Training for ~1.5 epochs: how to identify the repeated sequences (i.e., the additional .5 epoch)?
#144 opened a year ago by pietrolesci
3
Has the data been shuffled?
#127 opened a year ago by Lisennlp
2
Details about "EleutherAI/pythia-160m-seed*" models
#142 opened a year ago by IanMagnusson
3
Pythia or GPT-NeoX?
#138 opened a year ago by borgr
1
.
#140 opened a year ago by ParthaKrPaul
2
Wrong files in eval?
#139 opened a year ago by borgr
0
Replicating the Training Data Order
#136 opened a year ago by prakharg24
1
The value of weight decay
#132 opened a year ago by yehuitang
1
Error when running unshard_memmap.py
#114 opened a year ago by ShaneeyS
2
Model Initialization Question
#129 opened a year ago by yanlai00
1
The performance about pythia and LLaMA model architecture
#122 opened a year ago by peiyingxin
1
Any results on the validation set?
#121 opened a year ago by chujiezheng
1
Weights tying
#117 opened a year ago by link-er
1
Convert the huggingface checkpoint to GPT-Neox checkpoint
#116 opened 2 years ago by ZhiYuanZeng
2
Clarification of Pythia tokenizer(s) at different sizes, steps and data preprocessing?
#115 opened 2 years ago by RylanSchaeffer
1
Difference between LFS and HuggingFace datasets?
#112 opened 2 years ago by eric-mitchell
1
Can I provide custom data and continue training Pythia on this new data?
#113 opened 2 years ago by GeorgiAngelov
1
Multiple training runs of same model with different random seed for weight initialisation
#110 opened 2 years ago by KarolisRam
1
Possible error in Pythia-12B-deduped step 32000
#108 opened 2 years ago by smahdavi4
2