An open collection of methodologies to help with successful training of large language models and multi-modal models.
This is technical material suitable for LLM/VLM training engineers and operators. That is, the content here contains lots of scripts and copy-n-paste commands to enable you to quickly address your needs.
This repo is an ongoing brain dump of my experiences training Large Language Models (LLMs), including a lot of the know-how I acquired while training the open-source BLOOM-176B model in 2022 and the IDEFICS-80B model in 2023. Currently, I'm working on developing and training open-source retrieval models at Contextual.AI.
I've been compiling this information mostly for myself, so that I can quickly find solutions I have already researched and that have worked, but as usual I'm happy to share them with the wider ML community.
If you find a bug or typo, or would like to propose an improvement, please don't hesitate to open an Issue or contribute a PR.
The content of this site is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license.