How do we get tokenizer_model
yangyyt opened this issue · 2 comments
yangyyt commented
How do we get this tokenizer_model, and how should we prepare the data?
Srijith-rkr commented
I uploaded the tokenizer_model here: https://huggingface.co/Srijith-rkr/Whispering-LLaMA/tree/main
I have also added the Alpaca model weights to the repo. Once you download them, you can merge them into a single checkpoint for the LLM weights.
Something like:
a = torch.load("alpaca_a.pth")
b = torch.load("alpaca_b.pth")
c = torch.load("alpaca_c.pth")
lit_llama = a | b | c  # merge the state dicts (dict union, Python 3.9+) into the final checkpoint
torch.save(lit_llama, "[Mention path to Dir]")
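A minimal runnable sketch of the merge step above, using toy tensors in place of the real Alpaca weight shards (the shard names and key names here are illustrative, not the actual checkpoint contents):

```python
import io
import torch

# Toy stand-ins for the three downloaded weight shards
a = {"layer1.weight": torch.zeros(2, 2)}
b = {"layer2.weight": torch.ones(2, 2)}
c = {"layer3.bias": torch.zeros(2)}

# Dict union (Python 3.9+); on duplicate keys, later shards win
merged = a | b | c

# In-memory stand-in for the on-disk checkpoint file
buf = io.BytesIO()
torch.save(merged, buf)
buf.seek(0)
restored = torch.load(buf)
print(sorted(restored))
```

Saving to a `BytesIO` buffer just keeps the sketch self-contained; in practice you would pass a path on disk to `torch.save`, as in the snippet above.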
You can also check out the notebooks at https://github.com/Srijith-rkr/Whispering-LLaMA/tree/main/data_preparation to figure out how to prepare your custom dataset.