dblalock/bolt

Old AVX test fails due to GPF/sigsegv

xloem opened this issue · 0 comments

xloem commented

Summary

The avx sgemm test fails at n=14 on my system, apparently due to a general protection fault reading unaligned memory.

Problem Information

_mm256_load_ps raises a general protection fault if an address is passed to it that is not aligned to 32 bytes.
Documentation: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_load_ps&ig_expand=4279

System information

$ cat .git/refs/heads/master
e7726a4c165cc45ac117e9eabd8761013a26640e
$ uname -a
Linux DESKTOP-E3P7TC0 3.10.0-1160.62.1.el7.x86_64 #1 SMP Wed Mar 23 09:04:02 UTC 2022 x86_64 GNU/Linux
$ cat /etc/system-release
Red Hat Enterprise Linux Workstation release 7.9 (Maipo)

Diagnostic Information

$ gdb cpp/bolt-build/bolt
(gdb) run
Starting program: /shared/src/bolt/cpp/bolt-build/bolt 
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...

Program received signal SIGSEGV, Segmentation fault.
0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
874       return *(__m256 *)__P;
(gdb) up
#1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
    out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
394                             sums[mm] = _mm256_load_ps(out_ptr);
(gdb) p out_ptr
$1 = (float *) 0x886cf8
(gdb) bt
#0  0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
#1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
    out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
#2  0x000000000059ff9e in sgemm_colmajor (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0) at /home/user/src/bolt/cpp/src/utils/avx_utils.cpp:18
#3  0x00000000005ee949 in _test_sgemm_colmajor<-1, -1> (N=14, D=1, M=2, simple_entries=false) at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:54
#4  0x00000000005ec3f5 in ____C_A_T_C_H____T_E_S_T____100 () at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:155
#5  0x00000000005bb56e in Catch::FreeFunctionTestCase::invoke (this=0x86ef90) at /home/user/src/bolt/cpp/test/external/catch.hpp:5507
#6  0x00000000005aa337 in Catch::TestCase::invoke (this=0x889280) at /home/user/src/bolt/cpp/test/external/catch.hpp:6389
#7  0x00000000005b972b in Catch::RunContext::runCurrentTest (this=0x7fffffffd560, redirectedCout="", redirectedCerr="") at /home/user/src/bolt/cpp/test/external/catch.hpp:5131
#8  0x00000000005b8737 in Catch::RunContext::runTest (this=0x7fffffffd560, testCase=...) at /home/user/src/bolt/cpp/test/external/catch.hpp:5001
#9  0x00000000005ba095 in Catch::Runner::runTests (this=0x7fffffffd810) at /home/user/src/bolt/cpp/test/external/catch.hpp:5275
#10 0x00000000005bae1b in Catch::Session::run (this=0x7fffffffdb10) at /home/user/src/bolt/cpp/test/external/catch.hpp:5395
#11 0x00000000005bace8 in Catch::Session::run (this=0x7fffffffdb10, argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/external/catch.hpp:5378
#12 0x00000000005ae8bb in main (argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/main.cpp:22
$ valgrind cpp/bolt-build/bolt
==5087== Memcheck, a memory error detector                                                     
==5087== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.                       
==5087== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info                
==5087== Command: cpp/bolt-build/bolt                                                          
==5087==                                                                                       
brute force sgemm test, n = 1...                                                               
brute force sgemm test, n = 2...                                                               
brute force sgemm test, n = 5...                                                                                                                                                              
brute force sgemm test, n = 14...                                                                                                                                                             
==5087==                                                                                                                                                                                      
==5087== Process terminating with default action of signal 11 (SIGSEGV)                                                                                                                       
==5087==  General Protection Fault                                                                                                                                                            
==5087==    at 0x5A1AF9: _mm256_load_ps (avxintrin.h:874)                                                                                                                                     
==5087==    by 0x5A1AF9: void (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2>(float const*, float const*, int, int, int, float*, bool, int, int, int, int) (avx_utils.hpp:394)      
==5087==    by 0x59FF9D: sgemm_colmajor(float const*, float const*, int, int, int, float*) (avx_utils.cpp:18)                                                                                 
==5087==    by 0x5EE948: void _test_sgemm_colmajor<-1, -1>(int, int, int, bool) (test_avx_utils.cpp:54)                                                                                       
==5087==    by 0x5EC3F4: ____C_A_T_C_H____T_E_S_T____100() (test_avx_utils.cpp:155)                                                                                                           
==5087==    by 0x5BB56D: Catch::FreeFunctionTestCase::invoke() const (catch.hpp:5507)                                                                                                         
==5087==    by 0x5AA336: Catch::TestCase::invoke() const (catch.hpp:6389)                                                                                                                     
==5087==    by 0x5B972A: Catch::RunContext::runCurrentTest(std::string&, std::string&) (catch.hpp:5131)
==5087==    by 0x5B8736: Catch::RunContext::runTest(Catch::TestCase const&) (catch.hpp:5001)
==5087==    by 0x5BA094: Catch::Runner::runTests() (catch.hpp:5275)
==5087==    by 0x5BAE1A: Catch::Session::run() (catch.hpp:5395)
==5087==    by 0x5BACE7: Catch::Session::run(int, char* const*) (catch.hpp:5378)
==5087==    by 0x5AE8BA: main (main.cpp:22)

Work to Resolve

I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to sgemm_colmajor. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.

diff --git a/cpp/test/test_avx_utils.cpp b/cpp/test/test_avx_utils.cpp
index 4ec4e00..34c318a 100644
--- a/cpp/test/test_avx_utils.cpp
+++ b/cpp/test/test_avx_utils.cpp
@@ -44,6 +44,8 @@ void _test_sgemm_colmajor(int N, int D, int M, bool simple_entries=false) {
         B.setRandom();
     }
     C = (C.array() + -999).matrix();  // value we won't accidentally get
+    int aligned_rows = C.rows() - (C.rows() % (-32 / int(sizeof(float))));
+    C.resize(aligned_rows, C.cols());