koboldcpp having an issue with the -lcuda path
hello there, here is the output when installing:
/nix/store/j2y057vz3i19yh4zjsan1s3q256q15rd-binutils-2.41/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [Makefile:575: koboldcpp_cublas] Error 1
error: builder for '/nix/store/ilylx30p0i9yc7q52pd3yicikzbn3m21-koboldcpp-libs-1.61.2.drv' failed with exit code 2
error: 1 dependencies of derivation '/nix/store/il77gqs8iqf6rv61bcakazr1y03qjmc6-koboldcpp-1.61.2.drv' failed to build
error: 1 dependencies of derivation '/nix/store/f3794zyy605g7vwrf2fyqpfp7ynx566n-koboldcpp-1.61.2_fish-completions.drv' failed to build
error: 1 dependencies of derivation '/nix/store/4p5nyifj4r124d1lc6z5z0wn5aj33zij-man-paths.drv' failed to build
error: 1 dependencies of derivation '/nix/store/sbhjqs88si7csnpdckrj7karsnz0mqyi-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/wav4p1brs2wr47270j0drrl6ay4wik2r-nixos-system-nixos-24.05pre604424.d8fe5e6c92d0.drv' failed to build
checked:
https://nixos.wiki/wiki/CUDA
i had to use the cuda-fhs.nix approach to get the binary working.
if you need help from the upstream maintainers, they offer their help here:
https://discord.gg/kYSbJAhsgF
https://discord.com/channels/849937185893384223/849937402050379787/1223647072280907828
(all assuming it's not an issue on my end)
note:
i already have
# cuda support
nixpkgs.config.cudaSupport = true;
enabled
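for reference, the cuda-fhs.nix i used is basically the FHS-environment pattern from that wiki page; roughly something like this (a sketch from memory, not my exact file, and the package list is just what i needed):

# rough sketch of cuda-fhs.nix, following the FHS env pattern from https://nixos.wiki/wiki/CUDA
# run with: nix-shell cuda-fhs.nix
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
(pkgs.buildFHSUserEnv {
  name = "cuda-env";
  targetPkgs = pkgs: with pkgs; [
    cudatoolkit
    linuxPackages.nvidia_x11
    libGLU libGL
    zlib ncurses5 stdenv.cc binutils
  ];
  profile = ''
    export CUDA_PATH=${pkgs.cudatoolkit}
    export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
    export EXTRA_CCFLAGS="-I/usr/include"
  '';
  runScript = "bash";
}).env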
Hello! Thank you for letting me know.
I don't always test all the apps in this repository, and given that they are updated automatically, sometimes they break. Ideally I need to figure out how to build packages with Hydra and test them automatically.
In the meantime I will try to look into this issue, but I don't have a machine that supports cuda.
hello again, tysm, that would be great :3
so, in this readme:
https://github.com/LostRuins/koboldcpp?tab=readme-ov-file#considerations
there is a note:
Since v1.55, lcuda paths on Linux are hardcoded and may require manual changes to the makefile if you do not use koboldcpp.sh for the compilation.
maybe that is the reason it broke in the first place, but since i am a newbie to nix, i hadn't tried it before on nix, i just installed the nur
@AtaraxiaSjel KoboldAI dev here (although not the main Koboldcpp dev). The issue we are having is that the -lcuda location needs to be hardcoded, because that's the only thing the makefile supports.
It expects the lib in one of these locations:
- /usr/local/cuda/lib64
- /opt/cuda/lib64
- $(CUDA_PATH)/targets/x86_64-linux/lib
- /usr/local/cuda/targets/aarch64-linux/lib
- /usr/local/cuda/targets/sbsa-linux/lib
- /usr/lib/wsl/lib
For our own build script, since it uses relative paths, we also have LLAMA_ADD_CONDA_PATHS=1, which then adds the following two relative paths:
- conda/envs/linux/lib
- conda/envs/linux/lib/stubs
So ./koboldcpp.sh rebuild
will be able to build correctly, and that's the method we support best, but it does pull in an entire micromamba environment. For your nix package, mounting or copying the cuda library to one of the expected locations is probably going to work best.
First off, thanks for showing me what the issue is! I don't have much experience with cuda on NixOS or in general...
I have added the path to the cuda stubs in the makefile and applied the addOpenGLRunpath hook to the libraries being built. Now the compilation completes successfully, but I don't have a machine to test with cuda.
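For reference, the shape of the change is roughly this (only a sketch of the idea, not the actual derivation from the repo; the repo patches the Makefile itself, and the NIX_LDFLAGS line below is just an equivalent way to point -lcuda at the stub):

# sketch only: resolve -lcuda against the cudart stub at build time, and make the
# built libraries also search /run/opengl-driver/lib for the real driver at run time
{ lib, stdenv, cudaPackages, addOpenGLRunpath }:
stdenv.mkDerivation {
  pname = "koboldcpp-libs";
  # ... version, src and the make invocation are elided ...

  nativeBuildInputs = [ addOpenGLRunpath ];
  buildInputs = [ cudaPackages.cuda_cudart cudaPackages.libcublas ];

  # let the linker find the libcuda stub shipped with cuda_cudart, so -lcuda resolves
  NIX_LDFLAGS = "-L${lib.getLib cudaPackages.cuda_cudart}/lib/stubs";

  # add /run/opengl-driver/lib to the RPATH of every built shared object
  postFixup = ''
    for so in $out/lib/*.so; do
      addOpenGLRunpath "$so"
    done
  '';
}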
If you can, please check if it works now. I have published the fix to the koboldcpp branch. We can continue the discussion here or in this PR: #13.
You can check it with something like this:
nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --help
and you do indeed need this to be enabled:
nixpkgs.config.cudaSupport = true;
hello again, just woke up, apologies for the late response. i updated my system to make sure and ran it.
results:
yep, it works, it would be great if it were merged into the main repo.
nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas --model kukulemon-7B-Q4_K_M-imat.gguf --gpulayers 33 (base)
do you want to allow configuration setting 'extra-substituters' to be set to 'https://ataraxiadev-foss.cachix.org' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-substituters'.
Pass '--accept-flake-config' to trust it
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'ataraxiadev-foss.cachix.org-1:ws/jmPRUF5R8TkirnV1b525lP9F/uTBsz2KraV61058=' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
***
Welcome to KoboldCpp - Version 1.61.2
Warning: CuBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: /nix/store/yf09xvniwggyr80nd2ipvjd50f0llzc2-koboldcpp-1.61.2/lib/koboldcpp_default.so
==========
Namespace(model='kukulemon-7B-Q4_K_M-imat.gguf', model_param='kukulemon-7B-Q4_K_M-imat.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=5, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=5, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False)
==========
Loading model: /home/nako/Desktop/models/kukulemon-7B-Q4_K_M-imat.gguf
[Threads: 5, BlasThreads: 5, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: llama
---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/nako/Desktop/models/kukulemon-7B-Q4_K_M-imat.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attm = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = D:\Ferramentas\gguf-quantizations\models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.13 MiB
llm_load_tensors: CPU buffer size = 4165.37 MiB
................................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 266.00 MiB
llama_new_context_with_model: KV self size = 266.00 MiB, K (f16): 133.00 MiB, V (f16): 133.00 MiB
llama_new_context_with_model: CPU output buffer size = 62.50 MiB
llama_new_context_with_model: CPU compute buffer size = 169.16 MiB
llama_new_context_with_model: graph splits: 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
^C⏎
off-topic:
just to make sure, i don't know whether me having these will affect the result or not
programs.nix-ld.libraries = with pkgs; [
# Add any missing dynamic libraries for unpackaged programs
# here, NOT in environment.systemPackages
libz
fuse
icu
procps
util-linux
libepoxy.dev
cudatoolkit linuxPackages.nvidia_x11
xorg.libXdmcp xorg.libXtst xorg.libXi xorg.libXmu xorg.libXv xorg.libXrandr
xorg.libX11 xorg.libxcb
zlib
ncurses5
];
### add to shell to make cuda work for binaries
environment.variables = {
CUDA_PATH = "${pkgs.cudatoolkit}";
EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
EXTRA_CCFLAGS = "-I/usr/include";
};
i wanted to make cuda global and not limited to a single shell
https://nixos.wiki/wiki/CUDA
@DarkReaperBoy
Excellent! The fix is merged into the master branch.
off-topic:
just to make sure, i don't know whether me having these will affect the result or not
Honestly, I'm not sure either. But the addOpenGLRunpath hook should work without the settings you provided, only with
nixpkgs.config.cudaSupport = true;
so, all good i think.
hai, installed in configuration.nix, works now, ty again 🙏
WARNING: failed to allocate 266.00 MB of pinned memory: CUDA driver is a stub library
llama_kv_cache_init: CPU KV buffer size = 266.00 MiB
llama_new_context_with_model: KV self size = 266.00 MiB, K (f16): 133.00 MiB, V (f16): 133.00 MiB
WARNING: failed to allocate 62.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model: CPU output buffer size = 62.50 MiB
WARNING: failed to allocate 169.16 MB of pinned memory: CUDA driver is a stub library
i wonder if this is concerning but everything works fine now
Hm, maybe it doesn't work after all. Can you check if the model is offloading to the gpu?
@AtaraxiaSjel hai again, yes, i tried it and saw it's not offloading at all (inside nvtop) and it's slow. the binary version + cuda-fhs.nix from
https://nixos.wiki/wiki/CUDA
works butter smooth. anyway, i was learning home-manager stuff since i just installed nix, so apologies for not properly testing it. ;-;
@DarkReaperBoy No worries! I'd check it out myself, but I only have a machine with an amd gpu (rocm). I will try to fix the issue this week nonetheless.
i'll keep following my notifications, take your time!
hmm, i would just like to make kobo noice
324148919_ef8dd0f5_b634_4bed_8142_7f7da87338e0.mp4
Does that mean it's working now?
Does that mean it's working now?
nope. sucks to be an nvidia user ig. binaries +
https://nixos.wiki/wiki/CUDA
still work, so i am not left out.
Hi! Sorry for the delayed response!
For the purposes mentioned here (NixOS/nixpkgs#217780), we need to apply the autoAddDriverRunpath hook (formerly autoAddOpenGLRunpathHook) so that CUDA libraries can be loaded at runtime from the /run/opengl-driver/lib directory.
In the previous version of the derivation, I had already applied this hook. However, it did not solve the issue: koboldcpp was still unable to load these libraries, resulting in the error "CUDA driver is a stub library."
I investigated the problem further and noticed that in the koboldcpp-libs derivation, koboldcpp_cublas.so contained /run/opengl-driver/lib in its RPATH. However, after copying this library to the koboldcpp derivation, /run/opengl-driver/lib disappeared from the RPATH.
Upon closer inspection, I found that stdenv.mkDerivation uses the patchelf hook in the fixupPhase of the koboldcpp derivation to shrink the RPATHs of ELF executables. To prevent this from happening, I added dontPatchELF = true; to the koboldcpp derivation. Now, koboldcpp_cublas.so retains /run/opengl-driver/lib in its RPATH.
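In derivation terms, the two pieces now look roughly like this (an illustrative sketch; the actual expressions live in the repository):

# sketch of where the two fixes sit; everything else is elided
{ stdenv, autoAddDriverRunpath, ... }:
{
  koboldcpp-libs = stdenv.mkDerivation {
    pname = "koboldcpp-libs";
    # ... version, src and the cuda build are elided ...
    # hook that writes /run/opengl-driver/lib into the RPATH of the built libraries
    nativeBuildInputs = [ autoAddDriverRunpath ];
  };

  koboldcpp = stdenv.mkDerivation {
    pname = "koboldcpp";
    # ... copies koboldcpp_cublas.so from koboldcpp-libs into $out/lib ...
    # fixupPhase would otherwise shrink the RPATH with patchelf and drop
    # /run/opengl-driver/lib, since nothing in the Nix store satisfies that entry
    dontPatchELF = true;
  };
}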
Before the change:
> patchelf --print-rpath result/lib/koboldcpp_cublas.so
/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib:/nix/store/lig5zg0ls4a64f2364cfdfwp3k19nhqy-libcublas-12.2.5.6-lib/lib:/nix/store/35pq4hr29c3sl79lgfwgsvd9nwzyp4am-glibc-2.39-5/lib:/nix/store/f1ii69v7p27z1r5zybmlbld3bdzm6a5f-gcc-13.2.0-lib/lib
After the change:
> patchelf --print-rpath result/lib/koboldcpp_cublas.so
/run/opengl-driver/lib:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib:/nix/store/lig5zg0ls4a64f2364cfdfwp3k19nhqy-libcublas-12.2.5.6-lib/lib:/nix/store/35pq4hr29c3sl79lgfwgsvd9nwzyp4am-glibc-2.39-5/lib:/nix/store/f1ii69v7p27z1r5zybmlbld3bdzm6a5f-gcc-13.2.0-lib/lib
I hope this fixes the issue. The koboldcpp branch has been updated, and I will merge it into main if you can confirm the fix.
works flawlessly in a quick test, it's fine to merge into the main branch, ty. 🙏
@DarkReaperBoy done! Thanks for testing 👍🏻
@DarkReaperBoy done! Thanks for testing 👍🏻
hai, i am really sowy, tonight when i was using it, i saw that it is slow, then realized:
cuda isn't even detected. i really apologize for not properly testing again. i think the last time i tested it did have cuda, which is weird...
ik it's my fault but ty anyways.
@DarkReaperBoy No worries!
Anyway, if you can help, I want to solve this issue.
Can you provide the command you use to load your model? And show the lib directory from the derivation? The path is on the third line of the screenshot.
well, this is how it runs:
Welcome to KoboldCpp - Version 1.61.2
Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
==========
Namespace(model='llama-3-lewdplay-8b-evo.Q4_K_M.gguf', model_param='llama-3-lewdplay-8b-evo.Q4_K_M.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=12, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=12, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False)
==========
Loading model: /home/nako/Desktop/models/llama-3-lewdplay-8b-evo.Q4_K_M.gguf
[Threads: 12, BlasThreads: 12, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: llama
---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
ggml_init_cublas: no CUDA devices found, CUDA will be disabled
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /home/nako/Desktop/models/llama-3-lewdplay-8b-evo.Q4_K_M.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attm = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name = Llama-3-LewdPlay-8B-evo
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.13 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 4685.30 MiB
........................................................................................
Automatic RoPE Scaling: Using model internal value.
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
WARNING: failed to allocate 266.00 MB of pinned memory: CUDA driver is a stub library
llama_kv_cache_init: CPU KV buffer size = 266.00 MiB
llama_new_context_with_model: KV self size = 266.00 MiB, K (f16): 133.00 MiB, V (f16): 133.00 MiB
WARNING: failed to allocate 250.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model: CPU output buffer size = 250.50 MiB
WARNING: failed to allocate 258.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model: CUDA_Host compute buffer size = 258.50 MiB
llama_new_context_with_model: graph splits: 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
now that i think about it, i have a suspicion about:
### add to shell to make cuda work for binaries
environment.variables = {
CUDA_PATH = "${pkgs.cudatoolkit}";
EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
EXTRA_CCFLAGS = "-I/usr/include";
DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY= "false";
DOTNET_SYSTEM_GLOBALIZATION_INVARIANT = "1";
NIXPKGS_ALLOW_UNFREE = "1";
};
this part of my config; i'll comment it out to see what will happen
no, commenting out
CUDA_PATH = "${pkgs.cudatoolkit}";
EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
didn't fix the repo version and broke the binary version as well. also, i have no idea about:
And show the lib directory from the derivation? The path is on the third line of the screenshot.
@DarkReaperBoy Can you provide the command you use to start koboldcpp? Something like
koboldcpp --contextsize 8192 --usecublas normal mmq --gpulayers 99 --model models/dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf
What would these commands print for you?
ls -lah /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib
and
patchelf --print-rpath /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
@AtaraxiaSjel hello, i use
koboldcpp --usecublas --threads 12 --model llama-3-lewdplay-8b-evo.Q4_K_M.gguf --gpulayers 33
as for the second question, the output is:
nako@nixos ~/D/models> ls -lah /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib (base)
total 47M
dr-xr-xr-x 2 root root 4.0K Jan 1 1970 .
dr-xr-xr-x 5 root root 4.0K Jan 1 1970 ..
-r--r--r-- 1 root root 1.6M Jan 1 1970 kcpp_docs.embd
-r--r--r-- 1 root root 738K Jan 1 1970 klite.embd
-r-xr-xr-x 1 root root 29M Jan 1 1970 koboldcpp_cublas.so
-r-xr-xr-x 1 root root 5.0M Jan 1 1970 koboldcpp_default.so
-r-xr-xr-x 1 root root 5.2M Jan 1 1970 koboldcpp_failsafe.so
-r-xr-xr-x 1 root root 5.2M Jan 1 1970 koboldcpp_noavx2.so
-r--r--r-- 1 root root 398K Jan 1 1970 rwkv_vocab.embd
-r--r--r-- 1 root root 794K Jan 1 1970 rwkv_world_vocab.embd
patchelf wasn't installed, so i did:
nix-shell -p patchelf
and the output is:
nako@nixos ~/D/models> patchelf --print-rpath /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
/nix/store/ibsml62bca7zlx80cfwf4mjpqzgm14lc-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/ibsml62bca7zlx80cfwf4mjpqzgm14lc-cuda_cudart-12.2.140-lib/lib:/nix/store/8lc4iisqw0lajd8lbjwdbiywrlzkg8hb-libcublas-12.2.5.6-lib/lib:/nix/store/1rm6sr6ixxzipv5358x0cmaw8rs84g2j-glibc-2.38-44/lib:/nix/store/agp6lqznayysqvqkx4k1ggr8n1rsyi8c-gcc-13.2.0-lib/lib
@DarkReaperBoy
It seems that you may have an older version of my nur repo. You're using koboldcpp v1.61.2, but after I fixed the problem, the repo has been updated to v1.63, and now it's at v1.64.
As a result, koboldcpp_cublas.so doesn't have the correct path to the CUDA libraries in its RPATH.
How are you using this repo? Maybe you forgot to switch the branch back to main from koboldcpp? Try updating the repo, or run koboldcpp like this:
nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas <other flags>
If that does not help, try building koboldcpp with
nix build github:AtaraxiaSjel/nur/koboldcpp#koboldcpp
and print rpaths like this:
patchelf --print-rpath result/lib/koboldcpp_cublas.so
And sorry to bother you, I don't have an nvidia gpu, so I can't really test or debug this myself :)
hello again. it's me who didn't test properly and took your time, i sincerely apologize. i really am glad to help you. so... i am using nixos-unstable and really find it weird that
sudo nixos-rebuild switch --upgrade
doesn't update anything for a while. maybe it's the wrong command and i should run it along with
sudo nix-channel --update && sudo nix-collect-garbage -d
either way, i did both, will figure it out someday. now to the issue: the reason i brought this up is that maybe it's related? so, running the first command gives:
nako@nixos ~/D/models> nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas --threads 12 --model Meta-Llama-3-8B-Instruct.Q4_K_S.gguf --gpulayers 33
do you want to allow configuration setting 'extra-substituters' to be set to 'https://ataraxiadev-foss.cachix.org' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-substituters'.
Pass '--accept-flake-config' to trust it
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'ataraxiadev-foss.cachix.org-1:ws/jmPRUF5R8TkirnV1b525lP9F/uTBsz2KraV61058=' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
***
Welcome to KoboldCpp - Version 1.63
Warning: CuBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: /nix/store/g700v6k453k05abckmhy8lbzs0vj6bih-koboldcpp-1.63/lib/koboldcpp_default.so
==========
Namespace(model='Meta-Llama-3-8B-Instruct.Q4_K_S.gguf', model_param='Meta-Llama-3-8B-Instruct.Q4_K_S.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=12, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=12, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False, chatcompletionsadapter='')
==========
Loading model: /home/nako/Desktop/models/Meta-Llama-3-8B-Instruct.Q4_K_S.gguf
[Threads: 12, BlasThreads: 12, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: llama
---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/nako/Desktop/models/Meta-Llama-3-8B-Instruct.Q4_K_S.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.36 GiB (4.67 BPW)
llm_load_print_meta: general.name = .
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.17 MiB
llm_load_tensors: CPU buffer size = 4467.80 MiB
.......................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:500000.0).
llama_new_context_with_model: n_ctx = 2144
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 268.00 MiB
llama_new_context_with_model: KV self size = 268.00 MiB, K (f16): 134.00 MiB, V (f16): 134.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
and strangely it is not even version v1.64. as for:
How are you using this repo? Maybe you forgot to switch the branch back to main from koboldcpp?
so, i added the nur to my flake like this:
{
inputs = {
nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
home-manager.url = "github:nix-community/home-manager";
home-manager.inputs.nixpkgs.follows = "nixpkgs";
nix-flatpak.url = "github:gmodena/nix-flatpak";
nur.url = "github:nix-community/NUR";
};
outputs = { self, nixpkgs, nix-flatpak, home-manager, nur, ... }@inputs:
let
system = "x86_64-linux";
username = "nako";
pkgs = nixpkgs.legacyPackages.${system};
in {
nixosConfigurations.nixos = nixpkgs.lib.nixosSystem {
specialArgs = { inherit inputs; };
modules = [
nur.nixosModules.nur
home-manager.nixosModules.home-manager
nix-flatpak.nixosModules.nix-flatpak
./configuration.nix
];
};
};
}
then i added kobo to "environment.systemPackages" (the main place to install stuff as a system application) with "config.nur.repos.ataraxiasjel.koboldcpp" (i would gladly share my configuration.nix if needed). that is how i use it.
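concretely, the relevant bit of my configuration.nix looks roughly like this (paraphrased, not my exact file):

# rough excerpt; the nur module comes from nur.nixosModules.nur in the flake above
{ config, pkgs, ... }:
{
  # cuda support
  nixpkgs.config.cudaSupport = true;

  environment.systemPackages = [
    config.nur.repos.ataraxiasjel.koboldcpp
    # ... other packages ...
  ];
}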
nix build github:AtaraxiaSjel/nur/koboldcpp#koboldcpp
did nothing. and lastly:
nako@nixos ~/D/models> patchelf --print-rpath result/lib/koboldcpp_cublas.so (base)
patchelf: getting info about 'result/lib/koboldcpp_cublas.so': No such file or directory
let me share my config just in case (changed it to .txt because github doesn't allow the original extension).
Hello! Apologies for the delay - I completely forgot about this issue. However, koboldcpp has been available in nixpkgs for a long time. I’ve removed the koboldcpp package from this NUR repo but retained koboldcpp-rocm here. I’ll close this issue now, but thank you for your help and participation earlier!
❤