mlfoundations/dclm

Is the model architecture of DCLM different from LLaMA?


Hello, thank you for your excellent work.

I would like to ask whether the model architecture of DCLM differs from that of LLaMA. I hope to load the weights with the LLaMA architecture (or any other LLM architecture officially supported in transformers), so that I can conveniently use the existing ecosystem.

Mivg commented

Hi @czczup
Thank you for the compliments and the interest in our work.
The architecture is very similar to that of LLaMA but is not identical. You can find the details in Appendix F of the paper.
However, our model was released in a transformers-supported format here, so you can use it easily within your existing ecosystem.
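
For reference, loading it should look roughly like the snippet below (the repository id is a placeholder, and the actual checkpoint may list extra dependencies on its model card):

```python
# Minimal loading sketch; the repository id below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORG/dclm-model"  # placeholder repository id, not the real checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("DCLM is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```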

Thanks for your quick response. I have read part of the HF-format code, and it works.

I would like to suggest refactoring the model code in the transformers style. Providing a separate modeling file in the Hugging Face repository that can be loaded via trust_remote_code would greatly improve usability.
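
As a rough sketch of what such a remote-code setup might look like (the class names and model_type below are hypothetical placeholders, not the actual DCLM implementation):

```python
# Hypothetical sketch of a modeling file shipped inside the Hub repository;
# class names and model_type are placeholders, not the real DCLM code.
from transformers import PretrainedConfig, PreTrainedModel

class DCLMConfig(PretrainedConfig):
    model_type = "dclm"  # placeholder model_type tag

class DCLMForCausalLM(PreTrainedModel):
    config_class = DCLMConfig
    # ... the forward pass implementing the DCLM architecture would go here ...

# Registering the classes lets AutoConfig / AutoModelForCausalLM pick them up
# when a user passes trust_remote_code=True:
DCLMConfig.register_for_auto_class()
DCLMForCausalLM.register_for_auto_class("AutoModelForCausalLM")

# Users could then load the model without installing anything beyond transformers:
# model = AutoModelForCausalLM.from_pretrained("ORG/dclm-model", trust_remote_code=True)
```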