- An experimental repository for training and running inference with LLMs under low GPU memory usage.
- Flash Attention 2 must be installed for low-memory training, whether or not Unsloth is used. Flash Attention 2 requires the CUDA Toolkit to be installed globally, but cuDNN does not need to be installed globally. See the sketch below for a quick check that the setup works.
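
A minimal sketch (not part of this repository's code) of verifying that Flash Attention 2 is usable and loading a model with it through Hugging Face Transformers. The model name is only a placeholder; substitute whichever model you are training or serving.

```python
import torch
import flash_attn  # building/importing this requires the CUDA Toolkit installed globally
from transformers import AutoModelForCausalLM

# Quick sanity checks before training/inference.
print("flash-attn version:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())

# Load the model in bf16 with Flash Attention 2 to reduce GPU memory usage.
# "meta-llama/Llama-2-7b-hf" is just an example checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```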