InvalidCxxCompiler exception at 33% of training

Question

Closed this issue 3 months ago · 4 comments

I pulled the finetune notebook into a PyCharm project, created a new venv based on Python 3.10, installed gliner 0.2.7, and then ran `pip install accelerate -U` (which installed 0.32.1) as the notebook said. The model is still the small one.

At the 33% training mark, the training crashes with the following exception:

` File "C:\Users\XXXXXX\venv_2024\GlinerFineTuning\lib\site-packages\torch\_inductor\codecache.py", line 971, in cpp_compiler_search
raise exc.InvalidCxxCompiler()

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')`

Is an external compiler required for training? If so, which one?

Preceding the crash, there are many torch-related warnings like:

`Skipping iteration due to error: backend='inductor' raised:
AssertionError: ((393216,), (512, 768))

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True`

I really have enjoyed GLiNER as it ships, but I need some fine-tuning. Any thoughts?

Answer 1 · 2024-07-10T18:11:31.000Z

Can you make sure that you have g++ installed? Also, did you run a cell that compiles the model? I ask because I found a related issue, pytorch/pytorch#92745. Can you also clarify how large your dataset is?
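For anyone checking the same thing, here is a minimal sketch for testing whether a C++ compiler is visible from the Python environment that runs training. It only searches `PATH` with the standard library. The helper name `find_cxx_compiler` and the candidate list are illustrative assumptions, not part of GLiNER or PyTorch, and finding an executable does not guarantee that the inductor backend can actually use it.

```python
import shutil


def find_cxx_compiler(candidates=("g++", "clang++", "cl")):
    """Return the path of the first candidate found on PATH, or None.

    The candidate names are assumptions for illustration only.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler is None:
        print("No C++ compiler found on PATH")
    else:
        print(f"Found C++ compiler: {compiler}")
```

Run it with the same interpreter as the training venv, since PyCharm may use a different `PATH` than your terminal.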

Answer 2 · 2024-07-10T18:25:22.000Z

@Ingvarstep Thank you for replying.

I do not have g++ installed. Is g++ required for fine-tuning?

Walking through the cpp_compiler_search() function that raises the exception, I see a check for the platform being "linux". Does this mean I need to be on a Linux platform for fine-tuning?

I did execute the model compilation.

My stack trace is very similar to the one shown in the PyTorch issue.

The training data file is the sample_data.json file from the examples directory, which is 20.6 KB.

Thank you again.

Answer 3 · 2024-07-10T18:36:50.000Z

@davidress-ILW, I am not super familiar with how torch.compile works, but it looks like g++ does need to be installed for compilation. The simplest thing I can recommend is to skip compiling the model. Let me know if the issue still persists.
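A minimal sketch of what skipping compilation could look like: compile only when a compiler is available, otherwise leave the model in eager mode. `maybe_compile` is a hypothetical helper, and `compile_fn` stands for whatever compile call your notebook makes (for example `torch.compile`); neither name comes from the GLiNER code.

```python
import shutil


def maybe_compile(model, compile_fn, cxx="g++"):
    """Apply compile_fn to model only if the compiler `cxx` is on PATH.

    Otherwise return the model unchanged so training runs in eager mode.
    The default "g++" mirrors the name in the error message above.
    """
    if shutil.which(cxx) is None:
        return model
    return compile_fn(model)
```

Usage would be something like `model = maybe_compile(model, torch.compile)`. Simply commenting out the compile call has the same effect when no compiler is installed.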

Answer 4 · 2024-07-21T23:44:58.000Z

@Ingvarstep Thank you for the reply. With the compiler call commented out, the model completes training, so we are all set at this end.

I really appreciate you and the others at GLiNER for the help and the software. GLiNER is just too good to be true sometimes. I recently did a blind check on an entity by processing six different websites: GLiNER medium_v2.1 found 27 of 28 entities, which is 96.4% accuracy. Why in the world did I ever think I needed to fine-tune GLiNER for this entity?