
finetune.py segmentation fault

Closed this issue · 6 comments

I am trying to run finetune.py and am getting a segmentation fault. Can anyone help? I am on an Apple M2 Mac mini with 24 GB of memory.

% python finetune.py 
loc("mps_transpose"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":206:0)): error: 'anec.transpose' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
loc("mps_matmul"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":39:0)): error: 'anec.matmul' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
zsh: segmentation fault  python finetune.py
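
Reading the two error lines above, the only dimension outside the supported range appears to be W: 32000 against a limit of 16384. As a minimal sketch for checking this kind of message, the snippet below compares the reported dimension string with the range string taken from the log. The function name `out_of_range_dims` is hypothetical and is not part of PyTorch or slowllama.

```python
import re

def out_of_range_dims(dims: str, limits: str) -> dict:
    """Return {axis: (size, (low, high))} for every axis outside its range."""
    # "N1D1C4096H1W32000" -> {"N": 1, "D": 1, "C": 4096, "H": 1, "W": 32000}
    sizes = {axis: int(size) for axis, size in re.findall(r"([NDCHW])(\d+)", dims)}
    # "N[1-65536]D[1-16384]..." -> {"N": (1, 65536), "D": (1, 16384), ...}
    bounds = {
        axis: (int(low), int(high))
        for axis, low, high in re.findall(r"([NDCHW])\[(\d+)-(\d+)\]", limits)
    }
    return {
        axis: (size, bounds[axis])
        for axis, size in sizes.items()
        if not bounds[axis][0] <= size <= bounds[axis][1]
    }

# Both strings are copied from the error message in the log.
bad = out_of_range_dims(
    "N1D1C4096H1W32000",
    "N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384]",
)
print(bad)  # {'W': (32000, (1, 16384))}
```
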
  1. Which version of PyTorch are you using? I saw some issues with PyTorch 2.0.1 (pytorch/pytorch#110975).
  2. Did you update the slowllama repo recently?

Thank you!

Thanks. In that case, should I upgrade?

  1. Yes.
% pip freeze | egrep -i 'torch|numpy|sentence|fewlines'
  2. Just a couple of hours ago.

Yes, try 2.1.0 please. Also, please make sure you ran prepare_model and finetune with the same version of slowllama. The project is still early and experimental, so there is no backwards compatibility.
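
A quick way to confirm which PyTorch version is installed before rerunning is sketched below. It uses only the standard library, so it runs even if torch is missing. The helper names `version_tuple` and `needs_upgrade` and the `MIN_TORCH` constant are hypothetical; the 2.1.0 threshold is simply the version suggested in this thread.

```python
import re
from importlib import metadata

MIN_TORCH = (2, 1, 0)  # version suggested in this thread, not an official requirement

def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cpu' or '2.1.0rc1' into a comparable tuple like (2, 1, 0)."""
    fields = version.split("+")[0].split(".")[:3]
    numbers = []
    for field in fields:
        match = re.match(r"\d+", field)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)

def needs_upgrade(installed: str, minimum: tuple = MIN_TORCH) -> bool:
    """True if the installed version is older than the minimum."""
    return version_tuple(installed) < minimum

try:
    installed = metadata.version("torch")
    status = "upgrade recommended" if needs_upgrade(installed) else "ok"
    print(f"torch {installed}: {status}")
except metadata.PackageNotFoundError:
    print("torch is not installed in this environment")
```
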

Thanks, I did and it works now! This is super. Great work! Some points I noted:

  1. Not all snapshots were written to disk. Some were skipped. I will rerun and check.
  2. GPU usage during finetuning was optimal. During inference, however, it is fairly low, <10%. Is this expected?

Will try more things in the coming days. Thanks again.
  1. The save logic only writes a snapshot if the loss is lower. We can change that (https://github.com/okuvshynov/slowllama/blob/main/finetune.py#L53-L58).
  2. Yes, I put close to no effort into inference optimization. I think there are other libraries focusing on that specifically (e.g. llama.cpp).
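
The "save only if the loss is lower" rule in point 1 would explain the skipped snapshots. A minimal sketch of that kind of policy follows. It is not the code from finetune.py: the class `SnapshotPolicy`, its `always_save` switch, and the loss values are all hypothetical and only illustrate the idea.

```python
class SnapshotPolicy:
    """Decide whether to write a snapshot at a given evaluation step."""

    def __init__(self, always_save: bool = False):
        self.best_loss = float("inf")
        self.always_save = always_save  # hypothetical switch to save every time

    def should_save(self, loss: float) -> bool:
        improved = loss < self.best_loss
        if improved:
            self.best_loss = loss
        return improved or self.always_save

# Made-up loss values, one per evaluation step.
losses = [2.5, 2.7, 2.1, 2.1, 1.9]

policy = SnapshotPolicy()
saved_steps = [step for step, loss in enumerate(losses) if policy.should_save(loss)]
print(saved_steps)  # [0, 2, 4]: steps 1 and 3 are skipped, the loss did not improve
```

With `always_save=True` the same loop would write a snapshot at every step, which is one way the behaviour could be changed.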

Thanks for the clarifications, I will dive into the code too. I am not an expert at this, but I will give it my best shot. llama.cpp works great for me; however, I am unable to get finetuning to work with it on my Mac.