TeCSAR-UNCC/gem5-SALAM

Error when running bfs with baremetalarm.sh

Closed this issue · 10 comments

Hi guys,

I tried gemm and it works! However, when I run bfs with the command "./baremetalarm.sh -b bfs", an error occurs after a few cycles. I've attached a screenshot of the error output. How can I fix this error? Thanks for any help you might provide!

Screen Shot 2021-01-24 at 9 49 52 PM

Could you run the following and let me know the results:

llvm-config --version

Currently we only support LLVM 3.8.1. I can confirm that two machines correctly ran the bfs benchmark on the current master branch. I was able to replicate the error message, but only by using an incorrect version of LLVM to generate the bfs.ll file. Depending on the version, it could have generated gemm code that functions correctly but bfs code that does not. Also, just a note: we are currently updating the framework to support LLVM 9.0 and creating a more user-friendly configuration mechanism, so continue to check on the progress of gem5-SALAM.
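As a rough sketch, the version check could be scripted so a mismatch is caught before generating any .ll files. The function name check_llvm_version and the variable REQUIRED_LLVM are made up for illustration; only the llvm-config --version command and the 3.8.1 requirement come from this thread.

```shell
#!/bin/sh
# Required version, as stated in this thread.
REQUIRED_LLVM="3.8.1"

# check_llvm_version ACTUAL (hypothetical helper):
# succeeds only if ACTUAL matches the required version exactly.
check_llvm_version() {
    if [ "$1" = "$REQUIRED_LLVM" ]; then
        echo "OK: llvm $1"
    else
        echo "Mismatch: found llvm '$1', expected $REQUIRED_LLVM" >&2
        return 1
    fi
}

# Only query llvm-config if it is actually on PATH.
if command -v llvm-config >/dev/null 2>&1; then
    check_llvm_version "$(llvm-config --version)" || true
fi
```

Note that a matching version only rules out one cause; it does not guarantee the generated bfs.ll is correct.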

Thank you very much for your help!

I ran this command, and the version of LLVM on my machine is also 3.8.1. I've attached a screenshot of the result. I'm also using the current master branch.

Screen Shot 2021-02-05 at 12 43 12 AM

When installing gem5-SALAM, I installed all the necessary dependencies for gem5 and LLVM in a Docker container provided by the official gem5 website (https://www.gem5.org/documentation/general_docs/building), using the following command:

docker pull gcr.io/gem5-test/ubuntu-18.04_all-dependencies

Could this be the reason why I cannot generate the correct bfs.ll file?

Could you provide me with the bfs.ll file that you generated?

Sure, I've attached the bfs.ll that I generated.

bfs.zip

Could you try running your benchmark with the following and then upload the file generated at $M5_PATH/BM_ARM_OUT/bfs/debug-trace.txt:

./baremetalarm.sh -b bfs -p -f LLVMRuntime

-p signals print to file, which redirects all generated outputs and the terminal output to a folder located at $M5_PATH/BM_ARM_OUT/$benchmark/
(Note: Depending on the flags, the debug-trace.txt file can become quite large)
(Note 2: With how quickly yours was breaking I do not think your file will be over the 10MB limit for uploading)

-f takes debug flags. LLVMRuntime is a compound flag; if you're curious what the flags are, check $M5_PATH/src/hwacc/SConscript, where the debug flags are defined. Turning them all on produces quite a bit of output and dramatically extends runtime, but it's useful information.

Sure, I've attached the debug-trace.txt:
debug-trace.txt

I have found the issue and corrected it. Pull down the changes and let me know if it works.

There was a bug in our code involving the name of the first basic block being stored incorrectly. I am curious what OS you are using, or what's different in your case that caused the error to fill in a garbage number, while all of the Linux machines I've used filled in a 0, which happened to be the correct number.

I've tried the latest version of the master branch using the following command. This time I can get the report. However, an error still occurs at the end of the simulation. I've attached a screenshot:
Screen Shot 2021-02-05 at 4 39 10 PM

I'm now using a Docker container running Ubuntu 18.04.5 LTS.

I pushed an update to correct the error at the end of the bfs simulation. If you run across any other benchmarks that throw the same error after the statistics (address 0x7fffffff), its source is in $benchmark/host/main.cpp. At one point we forcefully ended the benchmarks by writing to an out-of-bounds memory address. If this happens, the final lines of main.cpp are likely:
*(char *)(0x7fffffff) = 0;

Which should be changed to:
m5_dump_stats(); m5_exit();
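
To spot other benchmarks that still end this way, something like the following sketch might help. It assumes the benchmarks sit under a single root directory, each with a host/main.cpp; the function name find_forced_exits and the variable BENCH_ROOT are hypothetical and not part of gem5-SALAM.

```shell
#!/bin/sh
# find_forced_exits ROOT (hypothetical helper):
# list every host/main.cpp under ROOT that still contains the
# out-of-bounds address used to force termination.
find_forced_exits() {
    find "$1" -path '*/host/main.cpp' -type f \
        -exec grep -il '0x7fffffff' {} +
}

# BENCH_ROOT is an assumed variable; point it at your benchmarks directory.
if [ -n "${BENCH_ROOT:-}" ]; then
    find_forced_exits "$BENCH_ROOT"
fi
```

Each file it prints is a candidate for the replacement shown above. The match is a plain text search, so a commented-out line would also be reported.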

I'll check the others when I get a chance to ensure they all end gracefully. I'm closing this issue; please contact us again if you need any additional help.

It works now. Thank you so much for your help!