huggingface/peft

CUDA kernels from PEFT v0.11.0 break C++ compilation

BenjaminBossan opened this issue · 4 comments

System Info

Who can help?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

As reported to us by @danielhanchen

the new PEFT 0.11.0 release is breaking llama.cpp / C++ compilation. Simply importing PEFT breaks C++ compilation; presumably it's related to some scripting.
Repro: PEFT 0.10.0 works: https://colab.research.google.com/drive/1vQ4_wUazxvf39wEeN6fxP58xHVaT3Mj8?usp=sharing
PEFT 0.11.0 fails, causing gcc to break after importing peft: https://colab.research.google.com/drive/1-NHOoRLISEyisuQqFgUR5L714Fe9sLij?usp=sharing

Ping @yfeng95 @Zeju1997 @YuliangXiu

Expected behavior

We may have to remove the kernels in a patch release if there is no quick solution.

I made a fork that comments out BOFT for now: https://github.com/danielhanchen/peft

And a repro which worked after commenting it out: https://colab.research.google.com/drive/1Y_MdJnS73hIlR_t2DXgXCgqKVwXHPE82?usp=sharing

I manually added the helper below and tried isolating the problem:

def install_llama_cpp_blocking(use_cuda=True):
    import subprocess
    import os
    import psutil
    # https://github.com/ggerganov/llama.cpp/issues/7062
    # Weirdly GPU conversion for GGUF breaks??
    # use_cuda = "LLAMA_CUDA=1" if use_cuda else ""

    commands = [
        "git clone --recursive https://github.com/ggerganov/llama.cpp",
        "make clean -C llama.cpp",
        # https://github.com/ggerganov/llama.cpp/issues/7062
        # Weirdly GPU conversion for GGUF breaks??
        # f"{use_cuda} make all -j{psutil.cpu_count()*2} -C llama.cpp",
        f"make all -j{psutil.cpu_count()*2} -C llama.cpp",
        "pip install gguf protobuf",
    ]
    # if os.path.exists("llama.cpp"): return

    for command in commands:
        with subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1) as sp:
            for line in sp.stdout:
                line = line.decode("utf-8", errors="replace")
                if "undefined reference" in line:
                    raise RuntimeError("Failed compiling llama.cpp")
                # print(line, flush=True, end="")
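As a side note, scanning stdout for "undefined reference" misses failures that print other messages. A sketch of a stricter variant (my own helper, not part of the repro) that also checks the process exit code:

```python
import subprocess
import sys


def run_checked(command: str) -> str:
    """Run a shell command; raise if it exits non-zero or prints a linker error."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,  # decode output as text instead of bytes
    )
    if result.returncode != 0 or "undefined reference" in result.stdout:
        raise RuntimeError(f"Command failed (exit {result.returncode}): {command}")
    return result.stdout


# Usage (hypothetical): run_checked("make all -C llama.cpp")
```

This catches compile errors that abort `make` without ever reaching the link step, which a pure stdout scan would silently pass over.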

Running this Python script reproduces the error on my machine:

import os
import subprocess
from peft import PeftModelForCausalLM

os.chdir("/tmp/")

commands = [
    "git clone --recursive https://github.com/ggerganov/llama.cpp",
    "make clean -C llama.cpp",
    "make all -j4 -C llama.cpp",
]

for command in commands:
    with subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1) as sp:
        for line in sp.stdout:
            line = line.decode("utf-8", errors="replace")
            print(line, end="")
            if "undefined reference" in line:
                raise RuntimeError("Failed compiling llama.cpp")
    # Popen's context manager waits for the process, so returncode is set here.
    # (A separate "echo $?" command would run in a fresh shell and always print 0.)
    print(f"-------------- finished: {command} (exit code {sp.returncode}) --------------")
print("done")

Commenting out these lines seems to fix it for me:

os.environ["CC"] = "gcc"
os.environ["CXX"] = "gcc"