dvmazur/mixtral-offloading

How to split the model parameter safetensors file into multiple small files

YLSnowy opened this issue · 5 comments

The original Mixtral checkpoint consists of 19 safetensors files, while the checkpoint you provide has 257. How did you split the model? Could you share the code for this step?

As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, we keep every expert in a separate file. This should lead to us having more files than the original checkpoint.
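
One way a per-expert layout like this could be produced is sketched below. This is not the repository's actual conversion script. It assumes the Hugging Face Mixtral parameter naming (`model.layers.{L}.block_sparse_moe.experts.{E}.…`), and `group_by_expert`, `save_groups`, and the output file names are hypothetical helpers made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumption: keys follow the Hugging Face Mixtral naming, e.g.
# "model.layers.0.block_sparse_moe.experts.3.w1.weight".
EXPERT_KEY = re.compile(r"model\.layers\.(\d+)\.block_sparse_moe\.experts\.(\d+)\.")


def group_by_expert(keys):
    """Map a (hypothetical) output file stem to the parameter names it holds."""
    groups = defaultdict(list)
    for key in keys:
        match = EXPERT_KEY.match(key)
        if match:
            stem = f"layer{match.group(1)}_expert{match.group(2)}"
        else:
            # Embeddings, attention, norms, router gates, lm_head, ...
            stem = "non_expert"
        groups[stem].append(key)
    return dict(groups)


def save_groups(state_dict, out_dir):
    """Write one .safetensors file per group (needs torch and safetensors)."""
    from safetensors.torch import save_file  # imported lazily on purpose

    os.makedirs(out_dir, exist_ok=True)
    for stem, keys in group_by_expert(state_dict).items():
        # save_file expects contiguous tensors that do not share memory.
        tensors = {k: state_dict[k].contiguous() for k in keys}
        save_file(tensors, os.path.join(out_dir, f"{stem}.safetensors"))
```

With this grouping, a model with 32 decoder layers and 8 experts per layer would yield 256 expert files plus one file for everything else, which happens to match the 257 files mentioned in the question. The real checkpoint may still be organized differently.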

Got it, thank you!

I saved the parameters of a single expert as a .safetensors file, but the file type is reported as a lif file rather than data. Could you share how you split the checkpoint into multiple safetensors files, rather than the code that loads the parameter files?

What's a lif file?

It is a file type that Linux cannot recognize correctly. I think something is wrong with my splitting process. The problem occurs only when I split the expert parameters, not the other parameters. I am reading the Mixtral source code and found that the decoder layer is listed in _no_split_modules, so I am even more curious about how you split the checkpoint into multiple files.
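
A type guess based on a file's leading bytes may not say much about a safetensors file, because the format starts with an 8-byte little-endian header length followed by a JSON header rather than a fixed signature. A minimal sketch for checking whether a split file is well-formed is to parse that header directly. `read_safetensors_header` is a hypothetical helper name, and the sketch only inspects the header, not the tensor data.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} from a file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Optional free-form metadata entry; not a tensor.
    header.pop("__metadata__", None)
    return header
```

If the header parses and lists the expected expert tensors with plausible shapes, the file is probably fine regardless of what type the operating system reports for it.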