susilehtola/erkale

ctest fail with gcc 6.4.0 intel mkl 2018 Ubuntu 18.04

pistack opened this issue · 17 comments

When I compile erkale with gcc 6.4.0, Intel MKL, Armadillo (header-only), libint 1.1.6, GSL 2.5 and libxc 4.0.4, 100 tests fail with "Child aborted".

Here is the ctest log.

The following tests FAILED:
6 - H_dz (Child aborted)
8 - He_dz (Child aborted)
10 - H2_dz (Child aborted)
12 - H2_lda_dz (Child aborted)
14 - H2_pbe_dz (Child aborted)
18 - Be+_dz (Child aborted)
26 - Mg+_dz (Child aborted)
32 - Ne_tpss_tz (Child aborted)
38 - Mg+_lda_tz (Child aborted)
42 - Mg+_pbe_tz (Child aborted)
44 - Mg+_rohf_tz (Child aborted)
46 - Mg_tpss_tz (Child aborted)
48 - Mg+_tpss_tz (Child aborted)
52 - Mg+_tz (Child aborted)
60 - H2O_tz (Child aborted)
62 - H2O_tz_dir (Child aborted)
64 - H2O_tz_hf (Child aborted)
66 - H2O+_tz_hf (Child aborted)
68 - H2O_tz_pbe (Child aborted)
70 - H2O_tz_pbe_dir (Child aborted)
71 - H2O_tz_pbe_fit_chk (Child aborted)
72 - H2O_tz_pbe_fit (Child aborted)
73 - H2O_tz_pbe_fitdir_chk (Child aborted)
74 - H2O_tz_pbe_fitdir (Child aborted)
76 - H2O+_tz_rohf (Child aborted)
78 - H2O_tz_tpss (Child aborted)
82 - Mg+_qz (Child aborted)
84 - H2O_tz_pbe_cart (Child aborted)
88 - water_dimer_b3lyp (Child aborted)
90 - water_dimer_b97m_v (Child aborted)
92 - water_dimer_lda (Child aborted)
94 - water_dimer_pbe (Child aborted)
96 - water_dimer_pbe0 (Child aborted)
97 - water_dimer_rihf_chk (Child aborted)
98 - water_dimer_rihf (Child aborted)
100 - water_dimer_tpssh (Child aborted)
102 - water_dimer_wb97 (Child aborted)
104 - water_dimer_wb97x (Child aborted)
106 - water_dimer_wb97x_v (Child aborted)
107 - water_dimer_b3lyp_fch_chk (Child aborted)
108 - water_dimer_b3lyp_fch (Child aborted)
109 - water_dimer_b3lyp_xch_chk (Child aborted)
110 - water_dimer_b3lyp_xch (Child aborted)
111 - water_dimer_b3lyp_xrs_chk (Child aborted)
112 - water_dimer_b3lyp_xrs (Child aborted)
113 - water_dimer_lda_fch_chk (Child aborted)
114 - water_dimer_lda_fch (Child aborted)
115 - water_dimer_lda_xch_chk (Child aborted)
116 - water_dimer_lda_xch (Child aborted)
117 - water_dimer_lda_xrs_chk (Child aborted)
118 - water_dimer_lda_xrs (Child aborted)
119 - water_dimer_pbe0_fch_chk (Child aborted)
120 - water_dimer_pbe0_fch (Child aborted)
121 - water_dimer_pbe0_xch_chk (Child aborted)
122 - water_dimer_pbe0_xch (Child aborted)
123 - water_dimer_pbe0_xrs_chk (Child aborted)
124 - water_dimer_pbe0_xrs (Child aborted)
125 - water_dimer_pbe_fch_chk (Child aborted)
126 - water_dimer_pbe_fch (Child aborted)
127 - water_dimer_pbe_xch_chk (Child aborted)
128 - water_dimer_pbe_xch (Child aborted)
129 - water_dimer_pbe_xrs_chk (Child aborted)
130 - water_dimer_pbe_xrs (Child aborted)
131 - water_dimer_tpssh_fch_chk (Child aborted)
132 - water_dimer_tpssh_fch (Child aborted)
133 - water_dimer_tpssh_xch_chk (Child aborted)
134 - water_dimer_tpssh_xch (Child aborted)
135 - water_dimer_tpssh_xrs_chk (Child aborted)
136 - water_dimer_tpssh_xrs (Child aborted)
137 - water_dimer_wb97_fch_chk (Child aborted)
138 - water_dimer_wb97_fch (Child aborted)
139 - water_dimer_wb97_xch_chk (Child aborted)
140 - water_dimer_wb97_xch (Child aborted)
141 - water_dimer_wb97x_fch_chk (Child aborted)
142 - water_dimer_wb97x_fch (Child aborted)
143 - water_dimer_wb97_xrs_chk (Child aborted)
144 - water_dimer_wb97_xrs (Child aborted)
145 - water_dimer_wb97x_xch_chk (Child aborted)
146 - water_dimer_wb97x_xch (Child aborted)
147 - water_dimer_wb97x_xrs_chk (Child aborted)
148 - water_dimer_wb97x_xrs (Child aborted)
150 - Cdcplx_b3lyp (Child aborted)
152 - Cdcplx_hf (Child aborted)
154 - H2O_atz_pbe (Child aborted)
156 - H2O_qz (Child aborted)
158 - H2O_qz_dir (Child aborted)
159 - water_cluster_chk (Child aborted)
160 - water_cluster (Child aborted)
162 - water_cluster_cholesky (Child aborted)
163 - water_cluster_rihf_chk (Child aborted)
164 - water_cluster_rihf (Child aborted)
165 - water_cluster_fch_chk (Child aborted)
166 - water_cluster_fch (Child aborted)
167 - water_cluster_xch_chk (Child aborted)
168 - water_cluster_xch (Child aborted)
169 - water_cluster_xrs_chk (Child aborted)
170 - water_cluster_xrs (Child aborted)
172 - water_dimer_hf (Child aborted)
173 - decanol_chk (Child aborted)
174 - decanol (Child aborted)
Errors while running CTest
Makefile:94: recipe for target 'test' failed
make: *** [test] Error 8

You're not giving me much to work with. What does

$ ctest -V

say?
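With a failure summary this long, it can help to rerun individual tests verbosely rather than the whole suite. The following is a minimal Python sketch, not part of erkale: `parse_failed_tests` and `rerun_command` are hypothetical helper names, and it assumes the summary lines look like those in the log above. It relies only on ctest's `-R` (regular-expression filter) and `-V` (verbose) options.

```python
import re

# Matches summary lines such as "6 - H_dz (Child aborted)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\((.+)\)\s*$")


def parse_failed_tests(summary):
    """Return (number, name, reason) tuples from a ctest failure summary."""
    failed = []
    for line in summary.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            number, name, reason = match.groups()
            failed.append((int(number), name, reason))
    return failed


def rerun_command(name):
    """Build a ctest command line that reruns exactly one test verbosely."""
    # ctest -R takes a regular expression, so names like "Be+_dz" are escaped.
    return "ctest -V -R '^" + re.escape(name) + "$'"


summary = """The following tests FAILED:
6 - H_dz (Child aborted)
18 - Be+_dz (Child aborted)
"""

for number, name, reason in parse_failed_tests(summary):
    print(number, reason, "->", rerun_command(name))
```

Escaping matters here because several test names contain `+`, which would otherwise be read as a regex quantifier.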

Also, is this a clean compilation from scratch? If not, remove the build directories (serial/ and openmp/ if you're using my script), and start afresh.

If some of the object files have been compiled with an older version of HDF5, the HDF5 library will complain and crash the program.

I did notice that there were several things wrong in the tests, since the defaults have changed since the tests were generated. I also found some other minor issues preventing the tests from running as they should.

It seems like there are still some tests that fail even on my computer, but unfortunately I don't have time to look into it now.

Thank you for your kind reply. Now only 11 tests fail. Do these tests fail on your computer too?
Thank you.

8:He_dz
10:H2_dz
99:water_dimer_rihf_chk
100:water_dimer_rihf
134:water_dimer_tpssh_fch
136:water_dimer_tpssh_xch
138:water_dimer_tpssh_xrs
167:water_cluster_fch_chk
168:water_cluster_fch
175:decanol_chk
176:decanol
LastTest.log

Yes, looks about right.

All tests should work now; please confirm.

Thank you for your great work.
Now just two tests fail.
152 - Cdcplx_b3lyp (Child aborted)
154 - Cdcplx_hf (Child aborted)

Here are the test logs for the two failed tests.

152
152: Test command: /home/list1331/Downloads/erkale/src/test/chkcompare_omp "Cdcplx_b3lyp.chk"
152: Environment variables:
152: ERKALE_REFDIR=/home/list1331/Downloads/erkale/refdata
152: Test timeout computed to be: 9.99988e+06
152: Reference directory is set to "/home/list1331/Downloads/erkale/refdata".
152: terminate called after throwing an instance of 'std::runtime_error'
152: what(): Basis sets don't match!
152:
152/176 Test #152: Cdcplx_b3lyp .....................***Exception: Child aborted 0.37 sec

154
154: Test command: /home/list1331/Downloads/erkale/src/test/chkcompare_omp "Cdcplx_hf.chk"
154: Environment variables:
154: ERKALE_REFDIR=/home/list1331/Downloads/erkale/refdata
154: Test timeout computed to be: 9.99988e+06
154: Reference directory is set to "/home/list1331/Downloads/erkale/refdata".
154: terminate called after throwing an instance of 'std::runtime_error'
154: what(): Basis sets don't match!
154:
154/176 Test #154: Cdcplx_hf ........................***Exception: Child aborted 0.23 sec

Thank you.

Hmm, that's weird. What architecture is this?

I compiled erkale on Ubuntu 18.04 with an Intel Core i5-6200U CPU, using the Intel compiler 2019. GSL, libint and HDF5 1.10.4 were also compiled with the Intel compiler.

Not sure if it is the same issue, but I'm also getting test failures, although for a different configuration.

⮕ # gcc --version
gcc (Gentoo 8.2.0-r6 p1.7) 8.2.0
⮕ # cmake --version
cmake version 3.13.2
⮕ # eix -e libxc
[I] sci-libs/libxc
     Installed versions:  3.0.0(17:55:45 2018.12.17.)(fortran -static-libs -test)
⮕ # eix -e libint
[I] sci-libs/libint
     Installed versions:  1.1.6(1)(16:32:24 2019.01.02.)(-static-libs) 2.0.5(2)(16:36:41 2019.01.02.)(-doc -static-libs)
⮕ # eix -e hdf5
[I] sci-libs/hdf5
     Installed versions:  1.10.1(0/1.10.1)(18:00:37 2018.12.17.)(cxx fortran hl szip zlib -debug -examples -mpi -static-libs -threads)
⮕ # eix -e gsl
[I] sci-libs/gsl
     Installed versions:  2.5(0/23)[1](14:08:46 2018.10.28.)(-cblas-external -static-libs ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32")
⮕ # eix -e openblas
[I] sci-libs/openblas [1]
     Installed versions:  0.2.20(21:42:33 2019.01.02.)(-dynamic -int64 -openmp -static-libs -threads ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32")
⮕ # eix -e lapack-reference
[I] sci-libs/lapack-reference
     Installed versions:  3.8.0-r100(0/3.8.0)[1](19:36:58 2019.01.02.)(deprecated -int64 -static-libs -test -xblas ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32")
⮕ # eix -e armadillo
[I] sci-libs/armadillo
     Installed versions:  9.200.6(0/9)(12:40:20 2018.12.26.)(blas lapack -arpack -doc -examples -hdf5 -mkl -superlu -test)
The following tests FAILED:
         32 - Ne_tpss_tz (Child aborted)
         46 - Mg_tpss_tz (Child aborted)
         48 - Mg+_tpss_tz (Child aborted)
         78 - H2O_tz_tpss (Child aborted)
        102 - water_dimer_tpssh (Child aborted)
        134 - water_dimer_tpssh_fch (Child aborted)
        136 - water_dimer_tpssh_xch (Child aborted)
        138 - water_dimer_tpssh_xrs (Child aborted)
Errors while running CTest
32/176 Testing: Ne_tpss_tz
32/176 Test: Ne_tpss_tz
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "Ne_tpss_tz.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"Ne_tpss_tz" start time: Jan 03 01:00 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference 2.017742e-09
Electron count difference -5.329071e-15
terminate called after throwing an instance of 'std::runtime_error'
  what():  Density matrices differ!

<end of output>
Test time =   2.28 sec
----------------------------------------------------------
Test Failed.
"Ne_tpss_tz" end time: Jan 03 01:00 EET
"Ne_tpss_tz" time elapsed: 00:00:02
----------------------------------------------------------

46/176 Testing: Mg_tpss_tz
46/176 Test: Mg_tpss_tz
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "Mg_tpss_tz.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"Mg_tpss_tz" start time: Jan 03 01:00 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference 1.616201e-09
Electron count difference -5.329071e-15
terminate called after throwing an instance of 'std::runtime_error'
  what():  Density matrices differ!

<end of output>
Test time =   0.21 sec
----------------------------------------------------------
Test Failed.
"Mg_tpss_tz" end time: Jan 03 01:00 EET
"Mg_tpss_tz" time elapsed: 00:00:00
----------------------------------------------------------

48/176 Testing: Mg+_tpss_tz
48/176 Test: Mg+_tpss_tz
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "Mg+_tpss_tz.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"Mg+_tpss_tz" start time: Jan 03 01:00 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference -8.219446e-06
Alpha electron count difference -5.329071e-15
Beta  electron count difference -5.329071e-15
Total electron count difference -5.329071e-15
Alpha density matrix difference 1.649677e-05
terminate called after throwing an instance of 'std::runtime_error'
  what():  Alpha density matrices differ!

<end of output>
Test time =   0.23 sec
----------------------------------------------------------
Test Failed.
"Mg+_tpss_tz" end time: Jan 03 01:00 EET
"Mg+_tpss_tz" time elapsed: 00:00:00
----------------------------------------------------------

78/176 Testing: H2O_tz_tpss
78/176 Test: H2O_tz_tpss
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "H2O_tz_tpss.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"H2O_tz_tpss" start time: Jan 03 01:00 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference 1.836153e-08
Electron count difference 1.776357e-15
terminate called after throwing an instance of 'std::runtime_error'
  what():  Density matrices differ!

<end of output>
Test time =   0.21 sec
----------------------------------------------------------
Test Failed.
"H2O_tz_tpss" end time: Jan 03 01:00 EET
"H2O_tz_tpss" time elapsed: 00:00:00
----------------------------------------------------------

102/176 Testing: water_dimer_tpssh
102/176 Test: water_dimer_tpssh
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "water_dimer_tpssh.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"water_dimer_tpssh" start time: Jan 03 01:00 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference 2.543729e-08
Electron count difference 2.842171e-14
terminate called after throwing an instance of 'std::runtime_error'
  what():  Density matrices differ!

<end of output>
Test time =   2.12 sec
----------------------------------------------------------
Test Failed.
"water_dimer_tpssh" end time: Jan 03 01:00 EET
"water_dimer_tpssh" time elapsed: 00:00:02
----------------------------------------------------------

134/176 Testing: water_dimer_tpssh_fch
134/176 Test: water_dimer_tpssh_fch
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "water_dimer_tpssh_fch.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"water_dimer_tpssh_fch" start time: Jan 03 01:01 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference -3.497619e-04
terminate called after throwing an instance of 'std::runtime_error'
  what():  Total energies don't match!

<end of output>
Test time =   0.22 sec
----------------------------------------------------------
Test Failed.
"water_dimer_tpssh_fch" end time: Jan 03 01:01 EET
"water_dimer_tpssh_fch" time elapsed: 00:00:00
----------------------------------------------------------

136/176 Testing: water_dimer_tpssh_xch
136/176 Test: water_dimer_tpssh_xch
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "water_dimer_tpssh_xch.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"water_dimer_tpssh_xch" start time: Jan 03 01:01 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference -3.374064e-04
terminate called after throwing an instance of 'std::runtime_error'
  what():  Total energies don't match!

<end of output>
Test time =   0.23 sec
----------------------------------------------------------
Test Failed.
"water_dimer_tpssh_xch" end time: Jan 03 01:01 EET
"water_dimer_tpssh_xch" time elapsed: 00:00:00
----------------------------------------------------------

138/176 Testing: water_dimer_tpssh_xrs
138/176 Test: water_dimer_tpssh_xrs
Command: "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/src/test/chkcompare" "water_dimer_tpssh_xrs.chk"
Directory: /var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999-serial/tests
"water_dimer_tpssh_xrs" start time: Jan 03 01:01 EET
Output:
----------------------------------------------------------
Reference directory is set to "/var/tmp/portage/sci-chemistry/erkale-9999/work/erkale-9999/refdata".
Total energy difference 2.022452e-08
Alpha electron count difference 3.197442e-14
Beta  electron count difference 3.197442e-14
Total electron count difference 3.197442e-14
Alpha density matrix difference 5.430593e-06
terminate called after throwing an instance of 'std::runtime_error'
  what():  Alpha density matrices differ!

<end of output>
Test time =   0.24 sec
----------------------------------------------------------
Test Failed.
"water_dimer_tpssh_xrs" end time: Jan 03 01:01 EET
"water_dimer_tpssh_xrs" time elapsed: 00:00:00
----------------------------------------------------------
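
The logs above follow one pattern: the comparison prints the total energy difference, then the electron count difference, and aborts at the first quantity that falls outside its tolerance. A minimal Python sketch of such a tolerance-based comparison follows. It is not chkcompare's actual code: the function name, the dictionary layout, the max-element measure for the density matrix, the electron-count error message and all three tolerances are assumptions chosen only for illustration.

```python
def compare_checkpoints(ref, cur, e_tol=1e-5, n_tol=1e-6, p_tol=1e-6):
    """Compare two results; raise RuntimeError at the first mismatch.

    All tolerances are illustrative assumptions, not erkale's values.
    """
    d_e = cur["energy"] - ref["energy"]
    print("Total energy difference %e" % d_e)
    if abs(d_e) > e_tol:
        raise RuntimeError("Total energies don't match!")

    d_n = cur["nelectrons"] - ref["nelectrons"]
    print("Electron count difference %e" % d_n)
    if abs(d_n) > n_tol:
        # Message text is made up for this sketch.
        raise RuntimeError("Electron counts don't match!")

    # Assumed measure: largest absolute element-wise difference.
    d_p = max(
        abs(a - b)
        for row_ref, row_cur in zip(ref["density"], cur["density"])
        for a, b in zip(row_ref, row_cur)
    )
    if d_p > p_tol:
        raise RuntimeError("Density matrices differ!")


# Made-up reference and current data for the demonstration.
ref = {"energy": -1.0, "nelectrons": 2.0,
       "density": [[1.0, 0.0], [0.0, 1.0]]}
cur = {"energy": -1.0 + 2e-9, "nelectrons": 2.0,
       "density": [[1.0, 2e-5], [2e-5, 1.0]]}

try:
    compare_checkpoints(ref, cur)
except RuntimeError as err:
    print("what():", err)
```

With these made-up numbers the energies agree to about 2e-9, yet the density check still fails, which is the same pattern as in the Ne_tpss_tz log.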

@Reinis which erkale git snapshot is this, please?

Make sure you're running the current master: git pull, recompile, and reinstall.

Which snapshot

GIT update -->
   repository:               https://github.com/susilehtola/erkale.git
   at the commit:            ad1d532a0b2268f5fba98d8949cbd4b799ab7b0d

OK, so it really is the up-to-date version.

However, all of the errors you saw are TPSS related, and are likely caused by the almost three-year-old version of libxc you are using.
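
One quick way to spot this kind of pattern is to check which method keyword every failing test name shares. A minimal Python sketch, assuming the functional is encoded in the test name; the keyword list and the helper name `shared_keywords` are hypothetical:

```python
def shared_keywords(names, keywords):
    """Return the keywords that occur in every test name (case-insensitive)."""
    return [k for k in keywords if all(k in n.lower() for n in names)]


# Failing tests from the Gentoo report in this thread.
failed = [
    "Ne_tpss_tz",
    "Mg_tpss_tz",
    "Mg+_tpss_tz",
    "H2O_tz_tpss",
    "water_dimer_tpssh",
    "water_dimer_tpssh_fch",
    "water_dimer_tpssh_xch",
    "water_dimer_tpssh_xrs",
]

# Hypothetical list of method keywords to look for.
keywords = ["lda", "pbe", "tpss", "b3lyp", "hf"]

print(shared_keywords(failed, keywords))
```

A single shared keyword points at one component, here the library that evaluates the functional, rather than at the build as a whole.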

Thanks for the pointer! I'll look into it.

It passes all tests with libxc-4.2.3. Thanks!