maxbbraun/llama4micro

Try larger models 💪


The current implementation works with the 15M-parameter version of tinyllamas. Simply dropping in the next larger one (42M parameters) flashes fine, but it freezes at runtime.

I'd need to look into what's happening here. It could be that the model weights plus the run state exceed the available RAM (63.5 MB), or I might have overlooked something about the memory layout. If it's the former, there may be a way to optimize memory usage to fit everything.
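
To make the RAM question concrete, here's a back-of-envelope sketch. The config values (dim=512, hidden_dim=1376, n_layers=8, n_heads=8, seq_len=1024, vocab_size=32000) are what the llama2.c model zoo lists for the 42M checkpoint and should be double-checked against the actual checkpoint header; whether the weights sit in RAM as fp32 or int8 is also an assumption, not something I've verified here.

```c
// Back-of-envelope RAM estimate for a llama2.c-style checkpoint.
// All config values below are assumptions taken from the llama2.c
// model zoo entry for the 42M model; verify against the real header.
#include <stdio.h>

int main(void) {
    const long dim = 512, hidden_dim = 1376, n_layers = 8;
    const long n_heads = 8, seq_len = 1024, vocab_size = 32000;
    const long n_params = 42L * 1000 * 1000;  // ~42M parameters

    // Weights: 4 bytes each at fp32, 1 byte each if int8-quantized.
    long weights_fp32 = n_params * 4;
    long weights_int8 = n_params * 1;

    // Run state is dominated by the fp32 KV cache
    // (2 caches * n_layers * seq_len * dim floats).
    long kv_cache = 2 * n_layers * seq_len * dim * 4;
    // Activation scratch buffers are comparatively small.
    long scratch = (4 * dim + 2 * hidden_dim + n_heads * seq_len
                    + vocab_size) * 4L;
    long run_state = kv_cache + scratch;

    printf("weights fp32: %ld MB\n", weights_fp32 >> 20);
    printf("weights int8: %ld MB\n", weights_int8 >> 20);
    printf("run state:    %ld MB\n", run_state >> 20);
    printf("int8 total:   %ld MB (budget: 63.5 MB)\n",
           (weights_int8 + run_state) >> 20);
    return 0;
}
```

Under these assumptions, even int8 weights (~40 MB) plus the fp32 KV cache (~32 MB) land above the 63.5 MB budget, which would be consistent with the freeze.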

Another option would be to train a model between 15M and 42M parameters that just barely fits without any further optimizations.
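
For the training route, a rough way to size the weight budget, under the same assumptions as above (int8 weights, fp32 KV cache); the dim/n_layers/seq_len values here are hypothetical placeholders for an intermediate model, not a recommendation:

```c
// Solve for the largest int8 weight budget under 63.5 MB of RAM,
// assuming the KV cache shrinks along with a smaller dim.
// Hypothetical intermediate config: dim=384, n_layers=8, seq_len=1024.
#include <stdio.h>

int main(void) {
    const double budget = 63.5 * 1024 * 1024;
    const long dim = 384, n_layers = 8, seq_len = 1024;

    double kv_cache = 2.0 * n_layers * seq_len * dim * 4;  // fp32 KV cache
    double weight_budget = budget - kv_cache;              // int8: 1 B/param
    printf("max weight budget: ~%.0fM params\n", weight_budget / 1e6);
    return 0;
}
```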