princeton-nlp/DinkyTrain

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃

Python · MIT

Issues

Perplexity numbers for masking rates reported in Table 3.
#10 opened 7 months ago by raotnameh · 1 comment
While installing dependencies: ERROR: Could not find a version that satisfies the requirement hydra-core<1.1,>=1.0.7 (from fairseq) (from versions: none) ERROR: No matching distribution found for hydra-core<1.1,>=1.0.7
#9 opened 2 years ago by henrywang0314 · 1 comment
Could you please tell me how to set the hyperparameters for GLUE?
#8 opened 2 years ago by leoozy · 2 comments
Roberta recipe in your paper is different from the original recipe
#4 opened 2 years ago by BaohaoLiao · 3 comments
fairseq-train: error: argument --arch/-a: invalid choice: 'deepspeed_roberta_large'
#7 opened 2 years ago by leoozy · 2 comments
Converting fairseq models to huggingface fails for models trained using DeepSpeed
#5 opened 3 years ago by carlosejimenez · 0 comments
Why do you use last checkpoint for validation rather than best checkpoint?
#3 opened 3 years ago by BaohaoLiao · 4 comments
Prediction values of the STS-B test set are not in 0~5
#2 opened 3 years ago by BaohaoLiao · 4 comments
Why do you use both layer_norm for embedding and pre-norm at the same time?
#1 opened 3 years ago by BaohaoLiao · 2 comments