Fork of https://openreview.net/forum?id=mNtmhaDkAr, extended to study inductive bias in RL
Plot TRL runs
#26 opened by EdoardoPona - 2
Rewards without confidence
#22 opened by EdoardoPona - 1
Sentiment task: finding best hyperparameters.
#17 opened by diogo-cruz - 1
Implementing and running different LLM tasks
#19 opened by diogo-cruz - 4
Warm up GPT2 on review data
#21 opened by diogo-cruz - 1
Understanding MDL calculations
#16 opened by diogo-cruz - 0
Improve plots
#20 opened by diogo-cruz - 4
Implement sentiment reward
#12 opened by diogo-cruz - 0
Batched rewards
#18 opened by EdoardoPona - 5
Implement sentiment dataset
#11 opened by diogo-cruz - 1
Fix issue with generating multi-token
#9 opened by diogo-cruz - 0
Clean up toy+RL setup
#14 opened by diogo-cruz - 3
Results collection for RL models
#6 opened by EdoardoPona - 1
Implement Alex's reward and test it
#8 opened by diogo-cruz - 2
Load custom transformer in RL4LM
#4 opened by EdoardoPona - 1