microsoft/Samba

Models

fakerybakery opened this issue · 7 comments

Hi, thanks for releasing Samba! Are there any plans to release the pretrained models? Thanks!

Yep! It would be great to see the 3B and 7B or 8B models...

Releasing the 3.8B instruction-tuned model is in the plan! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲

Nice! Please release the base models and smaller models!

AshD commented


This architecture looks great.
How much GPU time would be required to train a 7B and 14B model? In your opinion, would it be able to beat a Llama 3 70B transformer model in the benchmarks?

Samba-421M

I've used Microsoft's DeBERTa-V3 models A LOT in different projects, because they are so small and I can run quick experiments with them at home. So I am really looking forward to a new small model :)

After the internal business review, we are sorry to say that we cannot release the Samba 421M and 1.3B models trained on SlimPajama. This is because the SlimPajama dataset contains the Books3 dataset, which has copyright issues. 🥲 We will continue to push for the release of the Samba 1.7B and 3.8B models trained on the Phi datasets.


This architecture looks great. How much GPU time would be required to train a 7B and 14B model? In your opinion, would it be able to beat a Llama 3 70B transformer model in the benchmarks?

It depends on how many tokens we want to train it on. The Samba 3.8B model takes around the same amount of GPU time as the Phi3 models. I personally think it is definitely possible to beat Llama3 70B on benchmarks with better data mixtures customized for a 14B model.
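For anyone trying to put rough numbers on that, a back-of-envelope estimate with the common ~6·N·D training-FLOPs rule of thumb is a reasonable starting point. The token count, per-GPU throughput, and utilization below are illustrative assumptions, not figures from the Samba team:

```python
# Back-of-envelope GPU-hours estimate using the common ~6*N*D training-FLOPs rule of thumb.
# The token count, per-GPU throughput, and utilization here are illustrative assumptions,
# not numbers from the Samba authors.

def training_gpu_hours(params, tokens, gpu_tflops=312, mfu=0.4):
    """Rough GPU-hours to train a dense model with `params` parameters on `tokens` tokens.

    gpu_tflops: peak BF16 throughput of one GPU in TFLOP/s (312 ~ A100).
    mfu: assumed model FLOPs utilization (fraction of peak actually achieved).
    """
    total_flops = 6 * params * tokens               # forward + backward pass estimate
    flops_per_gpu_second = gpu_tflops * 1e12 * mfu
    return total_flops / flops_per_gpu_second / 3600

# Example: a 7B-parameter model trained on an assumed 2T tokens
print(f"~{training_gpu_hours(7e9, 2e12):,.0f} GPU-hours")  # roughly 190,000 A100-hours
```

Under those assumptions a 7B model on 2T tokens lands on the order of ~200K A100-hours; the real cost depends heavily on the token budget and the utilization actually achieved.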