WassimTenachi/PhySO

Can I run PhySO in parallel?

dianmo42 opened this issue · 8 comments

Hi Wassim,

I just played with PhySO and saw huge potential in it.

However, running PhySO on a single thread is a bit slow (about 60-70 s/epoch). How can I run it in parallel to speed it up?

Thanks
Dianmo

Hi @dianmo42,

Thank you ! :)

This is weird: python and pytorch should natively make use of the multiple cores of your CPU.
I have added ballpark expected performance figures to the readme file of the repo, and it sounds like your performance is below average.

Can you give the specs of your CPU and RAM ?
Are you using pytorch CPU or GPU (CPU is actually faster for physo) ?
Can you check in your task manager (or via the htop command on Linux or Mac) that multiple cores are used during your run ?

Wassim

Hi Wassim,

It turns out that I was using the GPU version of pytorch before. However, after reinstalling the CPU version of pytorch, I don't see any performance improvement.

This is my setup, with total RAM of 196G:

=====  Processor composition  =====
Processor name    : Intel(R) Xeon(R) Gold 6226R  
Packages(sockets) : 2
Cores             : 32
Processors(CPUs)  : 32
Cores per package : 16
Threads per core  : 1

I notice that most of the time PhySO runs on a single thread. top reports the following usage for the process:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                       
148559 zhangdm   20   0 2593852 330700  68076 R  99.0  0.2   1:11.13 python

But it does use multiple cores for about 2s in every epoch:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                       
148559 zhangdm   20   0 3698968 865576  73548 S 825.7  0.4   4:19.30 python

Do you have any clue?

Thank you
Dianmo

It could be that you are still using the GPU for pytorch and that the 1-2 cores used are due to the numpy part of physo.
Just to make sure that you are using pytorch with the CPU, can you confirm that the 2nd cell of the jupyter notebook outputs:
cpu
If it is not the case, you can force the usage of the CPU by replacing the content of the 2nd cell of the notebook with:
DEVICE = 'cpu'

If the problem persists once you have made sure that you are using the CPU, please run one of the toy demo codes of pytorch to check whether other pytorch codes give you similar problems.
E.g.: https://github.com/pytorch/examples/blob/main/regression/main.py
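For readers following along, here is a minimal sketch of a device check in plain pytorch. The content of the notebook's 2nd cell is not shown in this thread, so the helper name `pick_device` and its `force_cpu` flag are hypothetical and only illustrate the idea of pinning everything to the CPU.

```python
import torch


def pick_device(force_cpu: bool = True) -> torch.device:
    """Return the CPU device unless a GPU is explicitly wanted and available.

    Hypothetical helper for illustration; it is not part of physo.
    """
    if not force_cpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


DEVICE = pick_device(force_cpu=True)
print(DEVICE)  # prints: cpu

# Tensors created with this device live on the CPU.
x = torch.ones(3, device=DEVICE)
print(x.device)  # prints: cpu
```

If the first print shows anything other than `cpu`, the tensors are being placed on another device and the CPU usage seen in top would not reflect the pytorch workload.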

Just checked and I'm quite sure it's using CPU pytorch.

I have cleaned out all the packages and installed only pytorch, using the command:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cpuonly -c pytorch

Then I tried the demo code https://github.com/pytorch/examples/blob/main/regression/main.py, and it's still running on a single thread.

Now it seems to be a problem with pytorch rather than physo. I have never used CPU pytorch and don't know how to fix this. Do you have any advice?

Thank you very much.

Hi,

I'm really sorry but since this issue is not related to physo I am not sure I will be able to resolve it.

Maybe you can try this to know how many threads are available to pytorch:
torch.get_num_threads()
And this to force the number of threads to match your machine (8 here is just an example value):
torch.set_num_threads(8)

Further documentation: https://pytorch.org/docs/stable/torch.html#parallelism

Same issue. Only one CPU running at 100%

print(torch.get_num_threads())
12
print(torch.cuda.is_available())
False

@jabowery This works for me:

NTHREADS = torch.get_num_threads()
torch.set_num_threads(NTHREADS)
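To check whether setting the thread count actually changes anything, one option is to time the same operation with one thread and with all available threads. This is only a sketch: the function `time_matmul`, the matrix size, and the repeat count are arbitrary illustrative choices, and a matrix product is just a convenient workload that pytorch can parallelise, not what physo runs internally.

```python
import time

import torch


def time_matmul(num_threads: int, size: int = 1500, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of a size x size matmul."""
    torch.set_num_threads(num_threads)
    a = torch.rand(size, size)
    b = torch.rand(size, size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        torch.mm(a, b)
        best = min(best, time.perf_counter() - start)
    return best


# Read the default thread count before changing it.
available = torch.get_num_threads()
single = time_matmul(1)
multi = time_matmul(available)  # also restores the original thread count
print(f"1 thread: {single:.3f}s, {available} threads: {multi:.3f}s")
```

On a machine where multithreading works, the second timing would normally be noticeably smaller than the first; similar timings suggest pytorch is still restricted to one core.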

Thank you for your patience @WassimTenachi
I'm looking forward to making something interesting with physo.

Dianmo

Thank you @dianmo42, no problem ! ;)
Keep me updated if you end up using it for science, I would be happy to hear about it !