SCR release testing for v3.1
@hariharan-devarajan has volunteered to do some testing on Corona.
He is also planning to create some documentation in: https://github.com/LLNL/scr/tree/develop/doc-dev/rst/developers
Here are my findings.
For the serial tests:
The following tests passed:
serial_test_api_restart
serial_test_api_shared_file_restart
serial_test_config_restart
serial_test_api_multiple_restart
serial_test_ckpt_restart
71% tests passed, 2 tests failed out of 7
Total Test time (real) = 51.81 sec
The following tests FAILED:
12 - serial_test_ckpt_F_restart (Failed)
14 - serial_test_ckpt_F90_restart (Failed)
Errors while running CTest
For the failed tests
test 12
Start 12: serial_test_ckpt_F_restart
12: Test command: /usr/workspace/haridev/scr/build/examples/run_test.sh "srun" "-t 5 -N 1 -n 1" "./test_ckpt_F" "restart"
12: Test timeout computed to be: 1500
12: Run: srun -t 5 -N 1 -n 1 ./test_ckpt_F
12: At line 54 of file /usr/workspace/haridev/scr/examples/test_ckpt.F
12: Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00000.ckpt': No such file or directory
12:
12: Error termination. Backtrace:
12: #0 0x15555183b171 in ???
12: #1 0x15555183bd19 in ???
12: #2 0x15555183c521 in ???
12: #3 0x155551a40288 in ???
12: #4 0x155551a4058c in ???
12: #5 0x4028f5 in test_ckpt_f
12: at /usr/workspace/haridev/scr/examples/test_ckpt.F:54
12: #6 0x40407d in main
12: at /usr/workspace/haridev/scr/examples/test_ckpt.F:162
12: flux-job: task(s) exited with exit code 2
12: mv: cannot stat '.scr': No such file or directory
6/7 Test #12: serial_test_ckpt_F_restart ............***Failed 5.16 sec
test 14
Start 14: serial_test_ckpt_F90_restart
14: Test command: /usr/workspace/haridev/scr/build/examples/run_test.sh "srun" "-t 5 -N 1 -n 1" "./test_ckpt_F90" "restart"
14: Test timeout computed to be: 1500
14: Run: srun -t 5 -N 1 -n 1 ./test_ckpt_F90
14: At line 55 of file /usr/workspace/haridev/scr/examples/test_ckpt.F90
14: Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00000.ckpt': No such file or directory
14:
14: Error termination. Backtrace:
14: #0 0x15555183b171 in ???
14: #1 0x15555183bd19 in ???
14: #2 0x15555183c521 in ???
14: #3 0x155551a40288 in ???
14: #4 0x155551a4058c in ???
14: #5 0x4028f5 in test_ckpt_f90
14: at /usr/workspace/haridev/scr/examples/test_ckpt.F90:55
14: #6 0x40407d in main
14: at /usr/workspace/haridev/scr/examples/test_ckpt.F90:158
14: flux-job: task(s) exited with exit code 2
14: mv: cannot stat '.scr': No such file or directory
7/7 Test #14: serial_test_ckpt_F90_restart ..........***Failed 5.10 sec
For the parallel tests:
The following tests passed:
parallel_test_api_restart
parallel_test_api_shared_file_restart
parallel_test_config_restart
parallel_test_api_multiple_restart
parallel_test_ckpt_restart
parallel_test_ckpt_F90_restart
86% tests passed, 1 tests failed out of 7
Total Test time (real) = 59.50 sec
The following tests FAILED:
13 - parallel_test_ckpt_F_restart (Failed)
Errors while running CTest
Output of the failed test
13/15 Testing: parallel_test_ckpt_F_restart
13/15 Test: parallel_test_ckpt_F_restart
Command: "/usr/workspace/haridev/scr/build/examples/run_test.sh" "srun" "-t 5 -N 4 -n 4" "./test_ckpt_F" "restart"
Directory: /usr/workspace/haridev/scr/build/examples
"parallel_test_ckpt_F_restart" start time: Jun 05 12:04 PDT
Output:
----------------------------------------------------------
Run: srun -t 5 -N 4 -n 4 ./test_ckpt_F
At line 54 of file /usr/workspace/haridev/scr/examples/test_ckpt.F
Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00000.ckpt': No such file or directory
Error termination. Backtrace:
At line 54 of file /usr/workspace/haridev/scr/examples/test_ckpt.F (unit = 1)
Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00001.ckpt': No such file or directory
At line 54 of file /usr/workspace/haridev/scr/examples/test_ckpt.F (unit = 2)
At line 54 of file /usr/workspace/haridev/scr/examples/test_ckpt.F (unit = 3)
Error termination. Backtrace:
Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00002.ckpt': No such file or directory
Fortran runtime error: Cannot open file '/usr/WS2/haridev/scr/build/examples/timestep.8/rank_00003.ckpt': No such file or directory
Error termination. Backtrace:
Error termination. Backtrace:
#0 0x15555183b171 in ???
#1 0x15555183bd19 in ???
#0 0x15555183b171 in ???
#1 0x15555183bd19 in ???
#2 0x15555183c521 in ???
#2 0x15555183c521 in ???
#3 0x155551a40288 in ???
#4 0x155551a4058c in ???
#3 0x155551a40288 in ???
#5 0x4028f5 in test_ckpt_f
#0 0x15555183b171 in ???
at /usr/workspace/haridev/scr/examples/test_ckpt.F:54
#4 0x155551a4058c in ???
#6 0x40407d in main
#1 0x15555183bd19 in ???
at /usr/workspace/haridev/scr/examples/test_ckpt.F:162
#5 0x4028f5 in test_ckpt_f
#2 0x15555183c521 in ???
#3 0x155551a40288 in ???
#4 0x155551a4058c in ???
#0 0x15555183b171 in ???
#1 0x15555183bd19 in ???
#2 0x15555183c521 in ???
at /usr/workspace/haridev/scr/examples/test_ckpt.F:54
#3 0x155551a40288 in ???
#6 0x40407d in main
#5 0x4028f5 in test_ckpt_f
#4 0x155551a4058c in ???
at /usr/workspace/haridev/scr/examples/test_ckpt.F:162
at /usr/workspace/haridev/scr/examples/test_ckpt.F:54
#5 0x4028f5 in test_ckpt_f
#6 0x40407d in main
at /usr/workspace/haridev/scr/examples/test_ckpt.F:54
at /usr/workspace/haridev/scr/examples/test_ckpt.F:162
#6 0x40407d in main
at /usr/workspace/haridev/scr/examples/test_ckpt.F:162
flux-job: task(s) exited with exit code 2
<end of output>
Test time = 5.70 sec
----------------------------------------------------------
Test Failed.
"parallel_test_ckpt_F_restart" end time: Jun 05 12:04 PDT
"parallel_test_ckpt_F_restart" time elapsed: 00:00:05
----------------------------------------------------------
@gonsie The Fortran tests are failing. I don't know the language well enough to look into the issues. Do you want me to give it a try anyway?
Oh, wait. It does work. Not as part of the full suite, but if I clean up all the directories and re-run just the Fortran tests, they pass :)
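For anyone reproducing this, the clean-and-rerun step might look roughly like the sketch below. It is not taken from the SCR repo: the function names are hypothetical, and the cleanup patterns (`timestep.*` and `.scr` under the `examples` build directory) are assumptions based on the paths in the log above, so other leftover directories may also need removing. `ctest -R` selects tests by name regex, and `test_ckpt_F` should match both the F and F90 variants.

```shell
#!/bin/sh
# Hypothetical helper: remove leftover SCR state from an examples build directory.
# Assumption: the leftovers are the timestep.* checkpoint directories and the
# hidden .scr directory seen in the log paths; this is not a list from SCR itself.
clean_scr_leftovers() {
    dir="$1"
    # Checkpoint directories such as timestep.8 left by earlier tests.
    rm -rf "$dir"/timestep.*
    # The .scr directory that run_test.sh tries to move.
    rm -rf "$dir/.scr"
}

# Hypothetical helper: clean up, then re-run only the Fortran checkpoint tests.
# -R filters tests by regex; --output-on-failure prints the log of failing tests.
rerun_fortran_tests() {
    build_dir="$1"
    clean_scr_leftovers "$build_dir/examples"
    (cd "$build_dir" && ctest -R 'test_ckpt_F' --output-on-failure)
}

# Usage (build tree path taken from the log; adjust for your checkout):
# rerun_fortran_tests /usr/workspace/haridev/scr/build
```

Running `ctest` from the top of the build tree is also an assumption; if the tests are registered only under `examples`, run it from there instead.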
I can confirm all these tests run on Corona.
@gonsie @mcfadden8 Do you want me to test anything else? Both the serial and parallel tests work on Corona.
@hariharan-devarajan Thanks for the testing. Now that #537 is closed, please run the suite one more time.
@gonsie I just tested it and it works.
Thanks!