getian107/PRScs

Is progress printing deferred?

Does all of the logging from PRScs print out at the end of the program?

Background: I am trying to identify the optimal machine type to run PRScs on in order to make it complete in a reasonable amount of time. I have noticed that none of the printouts (e.g., ##### process chromosome 1 ##### and the mcmc loop printouts) happen until the program seems to be complete.

It seems that the print statements aren't being emitted to stdout until the program has completed. My guess is that they're being buffered. In that context, I'm mostly wondering if flush=True should be part of the print statements to help with profiling:

print("Hello, World!", flush=True)

Hi James, yes, I think you are right: when a PRScs job is submitted, the print statements are not emitted to stdout until the program has completed. If you run PRScs interactively, those statements are printed to the screen immediately. Regarding the computational cost, the processing before the MCMC iterations should take no more than a few minutes to complete. If you analyze each chromosome in parallel, short chromosomes usually take ~15 min and longer chromosomes take ~75 min using a single CPU and a few GB of memory.
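
The difference between interactive and submitted runs is consistent with how Python usually buffers stdout: line by line when attached to a terminal, and in larger blocks when redirected to a file or pipe, which is what batch schedulers typically do. A minimal sketch for checking which case applies to a given run (the function name describe_stdout is hypothetical and not part of PRScs):

```python
import sys


def describe_stdout(stream=None):
    """Describe how output written to `stream` is likely to be buffered.

    Hypothetical helper for illustration only; defaults to sys.stdout.
    """
    stream = sys.stdout if stream is None else stream
    if stream.isatty():
        # Attached to a terminal, as in an interactive session.
        return "terminal: output typically appears line by line"
    # Redirected to a file or pipe, as in a submitted job.
    return "file or pipe: output is typically held until the buffer fills or the program exits"


if __name__ == "__main__":
    print(describe_stdout())
```

Running this once interactively and once inside a submitted job should show which branch the job's stdout falls into.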

Thanks! I was profiling 1 chromosome with 2-32 cores, settling on 2 cores since the wall time was about the same in all cases despite 60-80% apparent CPU usage in all cases. (This is just a limit of my profiling tools, not a problem with PRScs.) Thanks again.

I'm going to close this issue because I realized that PRScs still works with Python 2.7, and my solution (flush=True) is a Python 3.x-only solution.
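
For anyone who needs the same effect on both Python 2.7 and 3.x, one option is to call flush() on the stream after printing, since that method exists in both versions. The sketch below is an assumption about how this could be done and is not code from PRScs; the helper name log is hypothetical:

```python
from __future__ import print_function  # makes print a function on Python 2.7

import sys


def log(*args, **kwargs):
    """Print and flush immediately; runs on Python 2.7 and 3.x.

    Hypothetical helper, not part of PRScs. Accepts the same arguments
    as print() except flush, which Python 2.7 does not support.
    """
    stream = kwargs.get("file")
    if stream is None:
        stream = sys.stdout
    print(*args, **kwargs)
    # Push buffered text out now instead of waiting for program exit.
    stream.flush()


if __name__ == "__main__":
    log("##### process chromosome 1 #####")
```

Without touching any code, running the script with python -u, or setting the PYTHONUNBUFFERED environment variable to a non-empty value, also disables stdout buffering on both Python 2 and 3.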