/pretraining-data-packing

[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training

Primary language: Python
License: MIT
