InvalidCxxCompiler exception at 33% of training
I pulled the fine-tuning notebook into a PyCharm project, created a new venv based on Python 3.10, installed gliner 0.2.7, and then ran `pip install accelerate -U` (which installed 0.32.1), as the notebook says. I am still using the small model.
At the 33% mark, training crashes with the following exception:
```
File "C:\Users\XXXXXX\venv_2024\GlinerFineTuning\lib\site-packages\torch\_inductor\codecache.py", line 971, in cpp_compiler_search
    raise exc.InvalidCxxCompiler()
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')
```
Is an external compiler required for training? If so, which one?
The crash is preceded by many torch-related warnings like this one:
```
Skipping iteration due to error: backend='inductor' raised:
AssertionError: ((393216,), (512, 768))

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
I have really enjoyed GLiNER out of the box, but I need to do some fine-tuning. Any thoughts?
Can you make sure that you have g++ installed? Also, did you run the cell that compiles the model? I ask because I found this related issue: pytorch/pytorch#92745. Also, how large is your dataset?
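Since the error names `g++` explicitly, one quick way to check the environment before training is to look for that executable on `PATH`. This is a minimal stdlib-only sketch; `find_cxx_compiler` is a hypothetical helper, and it only approximates PyTorch's own compiler search, which may look in other places as well.

```python
import shutil
import sys


def find_cxx_compiler(candidates=("g++",)):
    """Return the full path of the first candidate found on PATH, or None.

    The default candidate is "g++" because that is the name in the
    InvalidCxxCompiler message; other names are an assumption.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    print(f"platform: {sys.platform}")
    if compiler is None:
        print("No g++ found on PATH; torch.compile with the inductor backend will likely fail.")
    else:
        print(f"Found C++ compiler: {compiler}")
```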
@Ingvarstep Thank you for replying.
I do not have g++ installed. Is g++ required for fine-tuning?
Walking through the `cpp_compiler_search()` method that raises the exception, I see a check for the platform being "linux". Does this mean I need to be on Linux to fine-tune?
Yes, I did run the model compilation cell.
My stack trace is very similar to the one in the PyTorch issue.
The training data is the `sample_data.json` file from the examples directory, so 20.6 KB.
Thank you again.
@davidress-ILW, I am not super familiar with how torch.compile works, but it looks like g++ does need to be installed. The simplest thing I can recommend is to skip compiling the model. Let me know if the issue persists.
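Instead of commenting the compile call out, it can be made conditional. The sketch below is an illustration under assumptions, not part of the GLiNER or PyTorch API: `maybe_compile` is a hypothetical helper that calls `torch.compile` only when `g++` (the compiler named in the error) is on `PATH`, and otherwise returns the model unchanged so training runs in eager mode. Having a compiler on `PATH` may still not be enough on every platform (note the Linux-specific check mentioned earlier in the thread), so treat this as a convenience guard rather than a guarantee.

```python
import shutil
import sys


def maybe_compile(model, compile_fn=None, compiler="g++"):
    """Return a compiled model if `compiler` is on PATH, else the model unchanged.

    `compile_fn` defaults to torch.compile, imported lazily so this helper
    can be imported and exercised without PyTorch installed.
    """
    if shutil.which(compiler) is None:
        # No compiler found: skip compilation and let training run in eager mode.
        print(f"Skipping torch.compile: '{compiler}' not found on PATH", file=sys.stderr)
        return model
    if compile_fn is None:
        import torch  # deferred import; only needed when we actually compile

        compile_fn = torch.compile
    return compile_fn(model)


# Hypothetical usage, in place of the notebook's compile call:
# model = maybe_compile(model)
```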
@Ingvarstep Thank you for the reply. With the compile call commented out, the model completes training, so we are all set on this end.
I really appreciate you and the others at GLiNER for the help and the software. GLiNER is just too good to be true sometimes. I recently did a blind check on one entity by processing six different websites: GLiNER medium_v2.1 found 27 of 28 entities, which is 96.4% accuracy. Why in the world did I ever think I needed to fine-tune GLiNER for this entity?