includeos/IncludeOS

Odd Program Failure

tweekley49 opened this issue · 12 comments

image

Can't figure out why I am getting a program failure. Could it be because I am running out of memory? Would that cause the program failure to turn into a kernel panic? I am benchmarking a rather large problem. Oddly enough, on this system it reports a program failure, but on another system it runs for a moment and then gives me a kernel panic. The picture below is the kernel panic. Any thoughts? (It isn't the program itself; it runs completely fine outside the unikernel.)

image

I don't know, that's a weird crash. You could always give it more RAM: just change the "mem" value in vm.json. Also, try updating your branch to my latest simpler_clone branch. No need to run conan for anything.

I will do that and let you know of the results. I think I am on the latest simpler_clone branch, but I will verify that.

image
I am STUMPED. I am able to run the program everywhere else without any problem, but for some reason I get the error above when I run it inside IncludeOS. I made sure qemu had what it needed, so I am perplexed.

It's probably a bug in IncludeOS then: it's calling TKILL on an invalid tid, which is never good. I don't have anything for you to try, but can you show your code? Do you use cooperative threads? Are you using SMP for multiprocessing?

The code I am using is located at github.com/ProfessorWest/splash2-posix and I am using radix in the kernel folder.

The code isn't pretty; we just smashed everything into one file for simplicity. Also, no cooperative threads are being used, and I updated src/musl/futex.cpp as you recommended to stabilize multiprocessing.

I'll take a look at this when I have some time. Definitely an interesting test. It looks like it uses all kinds of pthread features, which might not be tested yet.

Awesome! Do let me know of your findings!

I fixed the problem. Inside vm.json, I had to give it more CPUs and make sure I gave it enough memory too!

Good to hear! Yes, there is no SMP without -smp XXX to Qemu.
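
For reference, a minimal sketch of what such a vm.json change might look like. The "mem" key is the one named earlier in this thread; the "smp" key is an assumption about the vm.json schema (taken here to be what ends up as -smp on the Qemu command line), and both values are purely illustrative.

```json
{
  "mem": 1024,
  "smp": 4
}
```

Whatever the exact key names are, the point is that the VM has to be given both enough memory and more than one CPU, otherwise Qemu starts it with a single core.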

I will attempt to boot it on bare metal next with the benchmark, to see the speedup from qemu to bare metal.

Interestingly enough, it appears the TKILL error is random. Sometimes it happens, sometimes it doesn't. Sadly, specifying SMP and adequate memory was not the fix.

Yep, that's a known problem. I've been trying to fix it for a while, but I'm taking a break now. There is clearly some sharing in the thread code, so the only multiprocessing-safe interface is the SMP interface itself. Threads do work, but only if they can start properly. If you use a thread pool and create the threads on startup, then as long as it doesn't crash at the start, it will remain up forever.
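
To illustrate the thread pool suggestion, here is a minimal sketch in C using only standard POSIX pthread calls. It is not IncludeOS code and has not been tested on IncludeOS; every name in it (pool_t, pool_start, pool_submit, pool_stop, POOL_SIZE, QUEUE_CAP, pool_demo_sum_squares) is hypothetical. The idea is that pthread_create is only ever called in pool_start, so a thread-creation failure shows up once at startup instead of at a random point during the run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 4   /* hypothetical: match the CPU count given to the VM */
#define QUEUE_CAP 64  /* hypothetical queue depth */

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

typedef struct {
    pthread_t threads[POOL_SIZE];
    task_t queue[QUEUE_CAP];        /* ring buffer of queued tasks */
    int head, count;                /* read index and fill level */
    int shutdown;
    pthread_mutex_t lock;
    pthread_cond_t has_work, has_room;
} pool_t;

static void *worker(void *p) {
    pool_t *pool = p;
    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (pool->count == 0 && !pool->shutdown)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        if (pool->count == 0)
            break;                  /* shutting down and nothing left to run */
        task_t t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_cond_signal(&pool->has_room);
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                /* run the task without holding the lock */
        pthread_mutex_lock(&pool->lock);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

/* Creates every thread up front. Returns 0 on success, -1 if any creation
   fails; the caller should treat -1 as fatal and stop at startup. */
static int pool_start(pool_t *pool) {
    pool->head = pool->count = pool->shutdown = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    pthread_cond_init(&pool->has_room, NULL);
    for (int i = 0; i < POOL_SIZE; i++)
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            return -1;
    return 0;
}

/* Queues a task; blocks while the queue is full. No thread is created here. */
static void pool_submit(pool_t *pool, task_fn fn, void *arg) {
    assert(fn != NULL);
    pthread_mutex_lock(&pool->lock);
    while (pool->count == QUEUE_CAP)
        pthread_cond_wait(&pool->has_room, &pool->lock);
    pool->queue[(pool->head + pool->count) % QUEUE_CAP] = (task_t){ fn, arg };
    pool->count++;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
}

/* Lets the workers drain the queue, then joins them and frees resources. */
static void pool_stop(pool_t *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
    pthread_cond_destroy(&pool->has_room);
}

/* Example task: squares the int it points at, in place. */
static void square(void *arg) { int *v = arg; *v = *v * *v; }

/* Demo: squares 1..n (n >= 1) on the pool and returns the sum of the
   squares, or -1 if allocation or pool startup fails. */
static long pool_demo_sum_squares(int n) {
    static pool_t pool;             /* static so workers never outlive it */
    int *vals = malloc((size_t)n * sizeof *vals);
    if (vals == NULL || pool_start(&pool) != 0) { free(vals); return -1; }
    for (int i = 0; i < n; i++) {
        vals[i] = i + 1;
        pool_submit(&pool, square, &vals[i]);
    }
    pool_stop(&pool);               /* join makes the results visible here */
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += vals[i];
    free(vals);
    return sum;
}
```

A benchmark such as radix would call pool_start once at the very beginning, abort if it returns -1, and afterwards hand work to the pool with pool_submit rather than calling pthread_create again. Whether condition variables behave reliably under IncludeOS SMP is not confirmed anywhere in this thread, so treat this as a starting point only.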

So, the problem is during thread creation. I don't have any ideas at the moment.