how to get rid of AVX? -DAVX2=OFF doesn't work
Closed this issue · 9 comments
user@gpt4all:~$ uname -a
Linux gpt4all 6.2.16-5-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z) x86_64 x86_64 x86_64 GNU/Linux
user@gpt4all:~$ git clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat
cd LlamaGPTJ-chat
Cloning into 'LlamaGPTJ-chat'...
remote: Enumerating objects: 1191, done.
remote: Counting objects: 100% (300/300), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 1191 (delta 258), reused 250 (delta 233), pack-reused 891
Receiving objects: 100% (1191/1191), 1.09 MiB | 9.24 MiB/s, done.
Resolving deltas: 100% (736/736), done.
Submodule 'llama.cpp' (https://github.com/manyoso/llama.cpp) registered for path 'gpt4all-backend/llama.cpp'
Cloning into '/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp'...
remote: Enumerating objects: 1977, done.
remote: Counting objects: 100% (542/542), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 1977 (delta 516), reused 511 (delta 511), pack-reused 1435
Receiving objects: 100% (1977/1977), 2.03 MiB | 6.20 MiB/s, done.
Resolving deltas: 100% (1277/1277), done.
Submodule path 'gpt4all-backend/llama.cpp': checked out '03ceb39c1e729bed4ad1dfa16638a72f1843bf0c'
user@gpt4all:~/LlamaGPTJ-chat$ mkdir buid
user@gpt4all:~/LlamaGPTJ-chat$ rmdir buid
user@gpt4all:~/LlamaGPTJ-chat$ mkdir build
user@gpt4all:~/LlamaGPTJ-chat$ cd build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake -DAVX2=OFF ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build -DAVX2=OFF . --parallel
Unknown argument .
Usage: cmake --build <dir>             [options] [-- [native-options]]
       cmake --build --preset <preset> [options] [-- [native-options]]
Options:
  <dir>          = Project binary directory to be built.
  --preset <preset>, --preset=<preset>
                 = Specify a build preset.
  --list-presets[=<type>]
                 = List available build presets.
  --parallel [<jobs>], -j [<jobs>]
                 = Build in parallel using the given number of jobs. If <jobs>
                   is omitted the native build tool's default number is used.
                   The CMAKE_BUILD_PARALLEL_LEVEL environment variable
                   specifies a default parallel level when this option is not
                   given.
  -t <tgt>..., --target <tgt>...
                 = Build <tgt> instead of default targets.
  --config <cfg> = For multi-configuration tools, choose <cfg>.
  --clean-first  = Build target 'clean' first, then build.
                   (To clean only, use --target 'clean'.)
  --resolve-package-references={on|only|off}
                 = Restore/resolve package references during build.
  -v, --verbose  = Enable verbose output - if supported - including
                   the build commands to be executed.
  --             = Pass remaining options to the native tool.
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
1129 | block_q4_1 * restrict y = vy;
| ^
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
1127 | const int nb = k / QK4_1;
| ^~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
9357 | const int ne2_ne3 = n/ne1; // ne2*ne3
| ^~~~~~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
9419 | const int ne2 = src0->ne[2]; // n_head -> this is k
| ^~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
9468 | switch (src0->type) {
| ^~~~~~
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.o
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.o
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.o
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.o
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp: In function 'void* llmodel_model_create(const char*)':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp:59:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
59 | fread(&magic, sizeof(magic), 1, f);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 83%] Linking CXX static library libllmodel.a
/usr/bin/ar qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.o CMakeFiles/llmodel.dir/llamamodel.cpp.o CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o CMakeFiles/llmodel.dir/llmodel_c.cpp.o CMakeFiles/llmodel.dir/mpt.cpp.o CMakeFiles/llmodel.dir/utils.cpp.o
/usr/bin/ranlib libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.o
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'llmodel_prompt_context load_ctx_from_binary(chatParams&, std::string&)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:206:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
206 | fread(&prompt_context, sizeof(prompt_context), 1, file);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'int main(int, char**)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:395:21: warning: ignoring return value of 'FILE* freopen(const char*, const char*, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
395 | std::freopen("/dev/null", "w", stderr);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
.Segmentation fault
user@gpt4all:~/LlamaGPTJ-chat/build$ cd ..
user@gpt4all:~/LlamaGPTJ-chat$ rm -fr build
user@gpt4all:~/LlamaGPTJ-chat$ mkdir build
user@gpt4all:~/LlamaGPTJ-chat$ cd build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake -D AVX2=OFF ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
1129 | block_q4_1 * restrict y = vy;
| ^
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
1127 | const int nb = k / QK4_1;
| ^~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
9357 | const int ne2_ne3 = n/ne1; // ne2*ne3
| ^~~~~~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
9419 | const int ne2 = src0->ne[2]; // n_head -> this is k
| ^~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
9468 | switch (src0->type) {
| ^~~~~~
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.o
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.o
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.o
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.o
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp: In function 'void* llmodel_model_create(const char*)':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp:59:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
59 | fread(&magic, sizeof(magic), 1, f);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 83%] Linking CXX static library libllmodel.a
/usr/bin/ar qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.o CMakeFiles/llmodel.dir/llamamodel.cpp.o CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o CMakeFiles/llmodel.dir/llmodel_c.cpp.o CMakeFiles/llmodel.dir/mpt.cpp.o CMakeFiles/llmodel.dir/utils.cpp.o
/usr/bin/ranlib libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.o
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'llmodel_prompt_context load_ctx_from_binary(chatParams&, std::string&)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:206:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
206 | fread(&prompt_context, sizeof(prompt_context), 1, file);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'int main(int, char**)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:395:21: warning: ignoring return value of 'FILE* freopen(const char*, const char*, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
395 | std::freopen("/dev/null", "w", stderr);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
.Segmentation fault
user@gpt4all:~/LlamaGPTJ-chat/build$
I tried -DAVX2=OFF the first time and -D AVX2=OFF the second time.
What am I doing wrong?
Hi,
You aren't doing anything wrong. :)
I hardcoded a requirement for at least AVX1 on non-ARM processors (ARM does not support AVX). I just pushed a noavx branch to remove that. Now -DAVX=OFF should remove both AVX1 and AVX2.
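Roughly, the idea (this is only a sketch of the approach, not the exact CMakeLists.txt in the branch) is that the top-level AVX/AVX2 options gate the instruction-set compiler flags, so turning AVX off keeps anything AVX-related out of the compile line:
# Sketch only: AVX/AVX2 are the project's own build options, and
# -mavx/-mavx2 are the GCC/Clang switches that would enable AVX code.
option(AVX  "enable AVX instructions"  ON)
option(AVX2 "enable AVX2 instructions" ON)
if (NOT MSVC)
    if (AVX AND AVX2)
        add_compile_options(-mavx -mavx2)
    elseif (AVX)
        add_compile_options(-mavx)
    endif()
    # With -DAVX=OFF neither flag is passed, regardless of the AVX2 setting.
endif()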
The following commands should now build:
git clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat
cd LlamaGPTJ-chat
git checkout noavx
mkdir build
cd build
cmake .. -DAVX=OFF
cmake --build . --parallel
(In the above, git checkout noavx switches to the noavx branch and -DAVX=OFF is the new option.)
I hope it now builds. :) I don't have a machine without AVX support to test this on, so let me know. If it works, I could put it into the main branch too.
By the way:
What kind of machine are you running this on? Is it a very old CPU? I'm just curious.
user@gpt4all:~/LlamaGPTJ-chat/build$ cd ..
user@gpt4all:~/LlamaGPTJ-chat$ cd ..
user@gpt4all:~$ rm -fr LlamaGPTJ-chat/
user@gpt4all:~$ git clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat
cd LlamaGPTJ-chat
git checkout noavx
mkdir build
cd build
cmake .. -DAVX=OFF
cmake --build . --parallel
Cloning into 'LlamaGPTJ-chat'...
remote: Enumerating objects: 1194, done.
remote: Counting objects: 100% (286/286), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 1194 (delta 243), reused 237 (delta 219), pack-reused 908
Receiving objects: 100% (1194/1194), 1.09 MiB | 8.04 MiB/s, done.
Resolving deltas: 100% (738/738), done.
Submodule 'llama.cpp' (https://github.com/manyoso/llama.cpp) registered for path 'gpt4all-backend/llama.cpp'
Cloning into '/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp'...
remote: Enumerating objects: 1977, done.
remote: Counting objects: 100% (519/519), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 1977 (delta 494), reused 489 (delta 489), pack-reused 1458
Receiving objects: 100% (1977/1977), 2.03 MiB | 6.07 MiB/s, done.
Resolving deltas: 100% (1277/1277), done.
Submodule path 'gpt4all-backend/llama.cpp': checked out '03ceb39c1e729bed4ad1dfa16638a72f1843bf0c'
branch 'noavx' set up to track 'origin/noavx'.
Switched to a new branch 'noavx'
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
1129 | block_q4_1 * restrict y = vy;
| ^
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
1127 | const int nb = k / QK4_1;
| ^~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
9357 | const int ne2_ne3 = n/ne1; // ne2*ne3
| ^~~~~~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
9419 | const int ne2 = src0->ne[2]; // n_head -> this is k
| ^~~
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
9468 | switch (src0->type) {
| ^~~~~~
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.o
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.o
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.o
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.o
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.o
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp: In function 'void* llmodel_model_create(const char*)':
/home/user/LlamaGPTJ-chat/gpt4all-backend/llmodel_c.cpp:59:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
59 | fread(&magic, sizeof(magic), 1, f);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 83%] Linking CXX static library libllmodel.a
/usr/bin/ar qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.o CMakeFiles/llmodel.dir/llamamodel.cpp.o CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.o CMakeFiles/llmodel.dir/llmodel_c.cpp.o CMakeFiles/llmodel.dir/mpt.cpp.o CMakeFiles/llmodel.dir/utils.cpp.o
/usr/bin/ranlib libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.o
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'llmodel_prompt_context load_ctx_from_binary(chatParams&, std::string&)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:206:10: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
206 | fread(&prompt_context, sizeof(prompt_context), 1, file);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/LlamaGPTJ-chat/src/chat.cpp: In function 'int main(int, char**)':
/home/user/LlamaGPTJ-chat/src/chat.cpp:395:21: warning: ignoring return value of 'FILE* freopen(const char*, const char*, FILE*)' declared with attribute 'warn_unused_result' [-Wunused-result]
395 | std::freopen("/dev/null", "w", stderr);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
.Segmentation fault
user@gpt4all:~/LlamaGPTJ-chat/build$
my Proxmox server
root@pve:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel
Model name: Intel(R) Xeon(R) CPU X5677 @ 3.47GHz
BIOS Model name: CPU @ 3.4GHz
BIOS CPU family: 179
CPU family: 6
Model: 44
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 2
BogoMIPS: 6915.23
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid dtherm ida arat
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 12 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: KVM: Mitigation: Split huge pages
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
root@pve:~#
best I have... :-(
Old Unix guy (started on V7 Unix, before System III) with old hardware
can't stop tinkering
Well at least it builds. :)
Okay, let's see.
1) Model
First I'm just checking that you have the model file downloaded and in place:
(Eliminating the chance that it crashes because it cannot find the model.)
By default it checks that the model is in ./models/ggml-vicuna-13b-1.1-q4_2.bin, but personally I like to use gpt4all groovy for testing because it's smaller and faster. To use groovy:
Get it from here 1.3-groovy: ggml-gpt4all-j-v1.3-groovy.bin
And then run chat with -m and the path to the model:
./chat -m "/path/to/modelfile/ggml-gpt4all-j-v1.3-groovy.bin"
2) Submodule
It probably still segfaults... In that case, there's another CMakeLists.txt file in the gpt4all-backend/llama.cpp folder. I thought the main CMakeLists.txt would override those settings, but just to make sure, you could try hardcoding them to OFF in this file:
LlamaGPTJ-chat/gpt4all-backend/llama.cpp/CMakeLists.txt
lines 54 to 64:
# instruction set specific
option(LLAMA_AVX "llama: enable AVX" OFF)
option(LLAMA_AVX2 "llama: enable AVX2" OFF)
option(LLAMA_AVX512 "llama: enable AVX512" OFF)
option(LLAMA_AVX512_VBMI "llama: enable AVX512-VBMI" OFF)
option(LLAMA_AVX512_VNNI "llama: enable AVX512-VNNI" OFF)
option(LLAMA_FMA "llama: enable FMA" OFF)
# in MSVC F16C is implied with AVX2/AVX512
if (NOT MSVC)
option(LLAMA_F16C "llama: enable F16C" OFF)
endif()
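Alternatively, instead of editing llama.cpp's file at all, the same thing should be doable from the parent CMakeLists.txt by forcing the cache entries before the submodule is added. A sketch only (option names taken from the snippet above):
# Force llama.cpp's instruction-set options off before add_subdirectory()
# configures the submodule; option() keeps an existing cache entry.
set(LLAMA_AVX    OFF CACHE BOOL "llama: enable AVX"    FORCE)
set(LLAMA_AVX2   OFF CACHE BOOL "llama: enable AVX2"   FORCE)
set(LLAMA_AVX512 OFF CACHE BOOL "llama: enable AVX512" FORCE)
set(LLAMA_FMA    OFF CACHE BOOL "llama: enable FMA"    FORCE)
set(LLAMA_F16C   OFF CACHE BOOL "llama: enable F16C"   FORCE)
add_subdirectory(llama.cpp)
Passing -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF on the cmake command line should have the same effect, for the same reason.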
Edit: I added that F16C option to the noavx branch's main CMakeLists.txt too, because it looks like F16C is a fairly new feature.
If neither of these helps, then I need to think about this a little more. I know someone got this running on aarch64, which does not support AVX, so in principle it should be possible to get this working on x86 without AVX too. I'm probably just messing something up in the config files...
Old Unix guy (started on V7 Unix, before System III) with old hardware can't stop tinkering
Oh wow! Nice! I'd love to get this working on old hardware too.
user@gpt4all:~/LlamaGPTJ-chat/build$ mkdir models
user@gpt4all:~/LlamaGPTJ-chat/build$ cd models
user@gpt4all:~/LlamaGPTJ-chat/build/models$ wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
--2023-08-02 19:50:09-- https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
Resolving gpt4all.io (gpt4all.io)... 172.67.71.169, 104.26.0.159, 104.26.1.159, ...
Connecting to gpt4all.io (gpt4all.io)|172.67.71.169|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3785248281 (3.5G)
Saving to: 'ggml-gpt4all-j-v1.3-groovy.bin'
ggml-gpt4all-j-v1.3-groo 100%[===============================>] 3.52G 11.2MB/s in 5m 32s
2023-08-02 19:55:46 (10.9 MB/s) - 'ggml-gpt4all-j-v1.3-groovy.bin' saved [3785248281/3785248281]
user@gpt4all:~/LlamaGPTJ-chat/build/models$ cd ..
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin"
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
.Illegal instruction
user@gpt4all:~/LlamaGPTJ-chat/build$
Thanks. This is great info!
Yeah, it looks like it really crashes because it still tries to execute some AVX-type code that the CPU does not support.
Now I need to think how to debug this...
I made the changes and ran the build.
Then I grepped to show where AVX and ON still appear in the code.
chat still died.
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build . --parallel
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
[ 8%] Built target ggml
[ 25%] Built target llama
[ 83%] Built target llmodel
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ grep -R AVX ..|grep ON|grep -v grep|cut -d: -f1|sort|uniq
grep: ../.git/objects/pack/pack-f689ae696470a057dc3c32a35f02651b627d7968.pack: binary file matches
grep: ../build/gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o: binary file matches
grep: ../build/gpt4all-backend/llama.cpp/libllama.a: binary file matches
grep: ../build/models/ggml-gpt4all-j-v1.3-groovy.bin: binary file matches
grep: ../build/bin/chat: binary file matches
grep: ../build/src/CMakeFiles/chat.dir/chat.cpp.o: binary file matches
../.github/workflows/cmake-release.yml
../.github/workflows/cmake.yml
../.github/workflows/cmake_branch.yml
../CHANGELOG.md
../README.md
../gpt4all-backend/CMakeLists.txt
../gpt4all-backend/llama.cpp/.github/ISSUE_TEMPLATE/custom.md
../gpt4all-backend/llama.cpp/.github/workflows/build.yml
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin"
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
.Illegal instruction
user@gpt4all:~/LlamaGPTJ-chat/build$
I did a little more aggressive editing: I changed every option to OFF, so instead of ON || OFF it is now OFF || OFF.
Not ideal for you, but it works for me...
and got this:
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin"
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
.gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
....... done .............
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!
hello
Hello! How can I assist you today?
took 2 minutes for the response, but at least it responded
Awesome! You got it working! :)
And yeah, it's probably going to be super slow without AVX even if you have a fast processor. I know the speedup from AVX1 to AVX2 is big too.
You could try to enable LLAMA_F16C and maybe LLAMA_FMA, but I'm not sure if the CPU supports those.
While debugging:
Things got trickier than I thought. The backend depends on llama.cpp, and that ultimately depends on ggml. I found this discussion on how to run ggml on machines without AVX support: ggml-org/ggml#25
So I just added -Ofast and -march=native to the CMakeLists.txt.
It looks like none of that was needed after all! But those options might still give a speedup, because the code now gets compiled for your processor's specific feature set.
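The change itself is basically just a couple of extra compile options; roughly like this (a sketch, the exact wording in the CMakeLists.txt may differ):
# -march=native makes GCC target whatever the build machine actually
# supports (SSE4.2 on your Xeon) instead of assuming AVX is available.
if (NOT MSVC)
    add_compile_options(-Ofast -march=native)
endif()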
I did a little more aggressive editing: I changed every option to OFF, so instead of ON || OFF it is now OFF || OFF.
Not ideal for you, but it works for me...
It works now. So that is great! :)
Edit: One more thing. There's a world of difference in speed between the model fitting in RAM and having to swap to disk. So on old machines it might be worth running sudo purge on the memory before starting chat if space is an issue.
Well, if you make another version and you want me to test it, just let me know.
Thanks again for making this tool so I can play around with some AI stuff.