Can we use more than one processor when system='multithreaded'
dkzhangchao opened this issue · 33 comments
Hi, Ryan
I can run the package with system='multithreaded' on my computer; however, each task uses only one processor, so the runtime is rather long for some examples:
"NPROC=1 # processors per task"
Can we run the solver in parallel (mpiexec or mpirun) under the multithreaded system? That is, can we set NPROC>1?
Thanks
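For context, the kind of parameters.py block under discussion might look like the sketch below; apart from NPROC, the parameter values shown are placeholders chosen for illustration, not taken from the attached file.

```python
# Hypothetical excerpt from a SeisFlows parameters.py; values are
# placeholders for illustration, not copied from the attached parameters.txt.

WORKFLOW = 'inversion'      # inversion, migration, ...
SOLVER = 'specfem2d'        # forward/adjoint solver
SYSTEM = 'multithreaded'    # run several shots at once on one machine

NTASK = 25                  # number of sources (shots)
NPROC = 1                   # processors per task -- the value in question:
                            # the thread asks whether NPROC > 1 can be used
```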
Hi Chao,
'Multithreaded' is intended for small-scale applications in which each solver instance runs on a single core. Currently it is not possible to use it with solver executables that require more than one core.
That said, it might be possible to add this functionality.
Could you describe what system/cluster you are running on? What workflow are you carrying out (inversion, migration, ...)?
Ryan
UPDATE: Actually, I believe there is a way to add this functionality by modifying only a single line. Since it's such a simple change, I'll go ahead and submit a pull request.
UPDATE: I think it should be ready to go now.
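For readers following along, the kind of one-line change being described might reduce to prefixing the solver executable with the MPI launcher whenever NPROC > 1. The sketch below is purely illustrative; it does not reproduce the actual pull request, and the function and variable names are made up.

```python
import subprocess

def call_solver(executable, nproc=1, mpiexec='mpiexec'):
    """Illustrative only: run a solver binary either serially or under MPI.
    The real SeisFlows code paths differ; the point is that supporting
    NPROC > 1 can amount to prepending the MPI launcher to the command."""
    if nproc > 1:
        command = '{} -n {} {}'.format(mpiexec, nproc, executable)  # parallel run
    else:
        command = executable  # serial run, as with NPROC=1 today
    subprocess.check_call(command, shell=True)

# e.g. call_solver('bin/xspecfem2D', nproc=4)
```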
Hi Ryan,
I also think we could use NPROC>1; in that case we could use mpiexec under the 'multithreaded' system.
However, I tested your latest version today and found a bug when running the 2D checkers example. The error occurs when the combine module is used to sum the individual kernels in base.py (solver).
Maybe it happens when xcombine_sem is used, but I'm not sure--can you check that?
attached is the parameter.py
parameters.txt
Hi Chao, From the traceback it looks like an issue with the utility for smoothing kernels. (To double check, you could try running with SMOOTH=False.)
As a workaround I would suggest commenting out the 'solver.specfem2d.smooth' method entirely, so that the parent class method 'solver.base.smooth', which uses the SPECFEM xcombine_sem utility, will be invoked instead. Does that make sense?
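In code terms, the workaround relies on Python's method resolution: if the child override is removed or commented out, the parent class method is used. The classes below are simplified stand-ins, not the real SeisFlows solver classes.

```python
# Simplified stand-ins for the real SeisFlows solver classes, shown only to
# illustrate the mechanism: comment out the child override and Python falls
# back to the parent class implementation.

class SolverBase(object):
    def smooth(self, path, span):
        # parent-class smoothing (solver.base.smooth in the thread above)
        print('base smooth called on %s with span %s' % (path, span))

class SolverSpecfem2d(SolverBase):
    # def smooth(self, path, span):
    #     ...  # the 2D-specific override implicated by the traceback
    pass

SolverSpecfem2d().smooth('output/kernels', span=20.)  # -> SolverBase.smooth
```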
Hi Chao, Very useful. I'm not sure what's going on with this traceback actually--some type of regression? Let me take a look.
Looks like our cluster here is having issues. I don't think I can debug it immediately, but I will as soon as it is back up.
In the meantime, could you remind me
- the size of your model
- the number and types of material parameters
- what system/cluster you are running on including the number of cores available and the memory per node?
EDIT:
- also, how many cores per solver instance were you using for the last traceback?
I just use the 2D checkers example:
attached is parameters.py
parameters.txt
- checkerboard model: http://tigress-web.princeton.edu/~rmodrak/2dAcoustic/checkers/
- material parameters: just vs
- just my PC, not a cluster
BTW, in bug.log there is a warning, mesh_properties.nproc != PAR.NPROC, because mesh_properties.nproc=1 while PAR.NPROC=4. Could this be causing the problem?
Hi Chao, Did you remesh the model? To change the number of processors from 1 to 4, you would need to generate a new numerical mesh via xmeshfem2D.
I also used the model provided in your examples. You mean that if I want to change the number of processors from 1 to 4, I need to remesh the model using xmeshfem2d, that is, mpiexec -n 4 ./xmeshfem2d, right?
That's right, you'd need to remesh and supply a new model. You can find information on this, I believe, in the SPECFEM2D manual or on its issues page. Good luck!
Hi Chao, You need to create a new model in the form SPECFEM2D is able to read and write. Probably it would be good to start by familiarizing yourself with SPECFEM2D. The manual is a good place to start, and the issues page can be useful if you run into any trouble.
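Concretely, "remesh and supply a new model" means setting the number of mesh partitions in the SPECFEM2D Par_file to match NPROC and regenerating the databases before the solver runs under MPI. The sketch below is only a rough outline under typical SPECFEM2D assumptions; paths, parameter spelling, and whether the mesher itself needs the MPI launcher depend on the SPECFEM2D version, so consult the manual.

```python
import subprocess

NPROC = 4

# 1) In DATA/Par_file, set the number of mesh partitions to NPROC
#    (the parameter may be spelled NPROC or nproc depending on the version).
# 2) Regenerate the mesh/database files, then run the solver under MPI.
#    Whether xmeshfem2D itself needs the MPI launcher depends on the
#    SPECFEM2D version; the solver run is the one that must match NPROC.
subprocess.check_call('bin/xmeshfem2D', shell=True)
subprocess.check_call('mpiexec -n {} bin/xspecfem2D'.format(NPROC), shell=True)
```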
If it's alright I'll go ahead and close soon.
Hi Chao, MPI parallelization is working fine in the 3D case, so I'm not sure what's wrong in the 2D case. Perhaps check xcombine_sem for bugs (SPECFEM2D has never been a funded project, so unfortunately there are bugs). Also, check that xcombine_sem is being invoked with the proper mpiexec wrapper, overloading system.mpiexec if necessary.
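For instance, "overload system.mpiexec" might look roughly like the sketch below. The class layout is a simplified assumption about how a SeisFlows system module exposes its MPI launcher, not a copy of the real code, and the xcombine_sem arguments are placeholders.

```python
# Simplified assumption about how a system class might expose its MPI
# launcher; shown only to illustrate what "overload system.mpiexec" means.

class Multithreaded(object):
    def __init__(self, nproc=1):
        self.nproc = nproc

    def mpiexec(self):
        # prefix prepended to solver/utility commands such as xcombine_sem
        if self.nproc > 1:
            return 'mpiexec -n {} '.format(self.nproc)
        return ''

system = Multithreaded(nproc=4)
print(system.mpiexec() + 'bin/xcombine_sem <kernel_names> <input> <output>')
```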
So you also ran into the same problem on your computer, right? By the way, besides system='multithreaded', if I want to use mpiexec, can I use system='mpi'? In that case, can I set nproc>1?
I mean that all the 2D examples in your guide use nproc=1. Do you have any examples that use nproc>1? I wonder, if nproc is set to 1, how you can use mpiexec (mpiexec -n nproc ./xmeshfem2d)?
Hi Chao, It might help to step back a bit first. You're running an inversion and you want each individual solver instance to run on multiple cores. In 3D this is currently working well for us. Such an approach is not currently implemented in 2D, but it should be fairly straightforward if you are familiar with SPECFEM2D and seisflows.
But let me ask, why do you want to do this for 2D? If your 2D model is so large that you can't fit as many copies of it in the memory available on a single node as you have processors available on that node, then it makes sense to have each solver instance run on multiple cores. If not, I can't think of any significant advantage in terms of speed or efficiency.
Hi Ryan,
Actually, I just want each source to run in parallel (mpiexec -np nproc ./xspecfem).
I always use system='MULTITHREADED', which allows embarrassingly parallel tasks to be carried out several at a time. So for each task we still run serially rather than in parallel. Actually, in the script serial.py, there is a choice:
so I figured that you provide an option to use (mpiexec -n nproc ./xspecfem).
At the same time, I checked mpi.py,
so I am puzzled about how parallelism is achieved for each task, whether with system='MULTITHREADED' or system='MPI'. In your 2D examples NPROC is always set to 1, which in my mind means each task runs serially rather than in parallel, right?
Good question, let me explain the naming convention.
The names of the modules in seisflows/system reflect how parallelization over shots is implemented. For example, system/serial means that shots are carried out one at a time; system/multithreaded means that as many shots are run at a time as the available number of processors allows.
There is no connection here to whether or not individual solver instances run in parallel, only to how parallelization over shots is handled.
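A schematic way to picture the distinction (not SeisFlows source code): the system module decides how many tasks execute at once, while NPROC only affects what happens inside each task.

```python
# Schematic illustration of the naming convention, not SeisFlows source code.
from concurrent.futures import ThreadPoolExecutor

def run_task(shot_id, nproc=1):
    # inside a task, nproc would decide serial vs. MPI execution of the solver
    print('shot %d would run on %d core(s)' % (shot_id, nproc))

shots = range(8)

# system/serial: shots carried out one at a time
for s in shots:
    run_task(s)

# system/multithreaded: as many shots at once as the processor count allows
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run_task, shots))
```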
Hi, Ryan
Thanks, that helps my understanding.
- So all of the modules in seisflows/system just reflect how parallelization over shots is implemented, right?
- For each shot, we can run serially or in parallel, and NPROC determines which: if NPROC=1, ./xspecfem2d is invoked; if NPROC>1, mpirun -np NPROC ./xspecfem2d is invoked. As you know, I failed in the NPROC>1 case under system='MULTITHREADED'.
- Have you tried NPROC>1 (parallel for each task) with any of the modules in seisflows/system? I have only seen the NPROC=1 examples for the 2D case.
- correct
- correct
- we have used NPROC>1 routinely for 3D but not for 2D
Hi Ryan,
Makes sense, thanks for your help.
Hope it was helpful. Good luck!
Hi Ryan,
I can now use nproc>1 under system='MULTITHREADED'. I found there are two places we need to revise to make it work.
1) xcombine_sem
In the examples we use specfem2d-d745c542 (as you said last time, this version has a bug in xcombine_sem): whether you run ./xcombine_sem or mpirun -n 4 ./xcombine_sem, only one processor kernel is generated, proc000000_vs_kernel.bin (no proc000001_vs_kernel.bin, proc000002_vs_kernel.bin, proc000003_vs_kernel.bin). So I downloaded the latest version of specfem2d and used it instead of the old one for mpirun -n 4 ./xcombine_sem. After that I get four kernel files and it works.
2) smooth
In seisflows, you use the function
so if we want to use nproc>1, I revise it like this:
After these two revisions, the code can run in parallel for each task.
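Since the screenshots of the original and revised functions are not reproduced here, the sketch below only illustrates the general shape of such a revision: delegating smoothing to the SPECFEM xsmooth_sem executable under the MPI wrapper instead of smoothing a single-process kernel. All names, default values, and the argument order are assumptions, not the actual SeisFlows or SPECFEM2D interfaces.

```python
import subprocess

def smooth(input_dir, output_dir, kernel='vs_kernel',
           sigma_h=20000., sigma_v=20000., nproc=4, mpiexec='mpiexec'):
    """Illustrative revision: call xsmooth_sem under MPI so that every
    proc*_*.bin kernel file is smoothed, not just proc000000.
    The argument order of xsmooth_sem varies between SPECFEM versions;
    check the utility's usage message before relying on this sketch."""
    command = '{mpi} -n {np} bin/xsmooth_sem {sh} {sv} {k} {i} {o}'.format(
        mpi=mpiexec, np=nproc, sh=sigma_h, sv=sigma_v,
        k=kernel, i=input_dir, o=output_dir)
    subprocess.check_call(command, shell=True)
```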
"Hi Chao, MPI parallelization is working fine the 3D case so I'm not sure what's wrong in the 2D case. Perhaps check the xcombine_sem for bugs (SPECFEM2D has never been a funded project so unfortunately there are bugs ). Also, check the xcombine_sem is being invoked with the proper mpiexec wrapper by overloading system.mpiexec if necessary."
However, after the smooth, I find something strange.
After (mpirun -n nproc ./xcombine_sem), I get a kernel like this:
After (smooth), I get a kernel like this:
It seems the smoothing produces obvious distortion at the interfaces between the processor meshes. Does this mean the smooth method is incorrect? Can you give some suggestions?
Hi Chao,
Thank you for identifying these issues, which arise when the 2D kernel summation and smoothing routines are used with MPI models. I'm unable to address these issues now myself because my PhD defense is within a few weeks. If you wanted to look into it yourself, it would be a matter of debugging and fixing SPECFEM2D's xcombine_sem and xsmooth_sem utilities.
If you want to, feel free to open a new issue either here or in the SPECFEM2D issues page, something along the lines of "SPECFEM2D's xcombine_sem and xsmooth_sem not working for MPI models".
Thanks,
Ryan
Hi Ryan,
Thanks for your suggestion. If xcombine_sem and xsmooth_sem can be fixed, I think we can use MPI. BTW, in your GJI paper I see that you ran some 2D synthetic cases, so you used serial rather than parallel execution for each task, right? I think you would also have met this problem if using (mpirun -np nproc xcombine_sem and xsmooth_sem).
Wish you a good PhD defense.
As I was trying to explain in the September 1 post, I'm a little confused about your ultimate goals. If you're running on a cluster, 2D inversions are quite fast even with one core per solver instance--unless your 2D model is huge, there is no need to parallelize over model regions. On the other hand, if you're running on a desktop or laptop, the number of cores is the limiting factor, so you'll likely see no speedup by parallelizing over model regions.
To answer your question anyway though, I ran those 2D experiments on a cluster, so I used the slurm_sm option for "small" SLURM inversions.
Hi Ryan
OK, let me be specific: I am running on a desktop (which has 16 processors) with system='multithreaded' in parameters.py. In this case, because nproc=1, the serial solvers (./xmeshfem2d and ./xspecfem2d) are invoked, and I find the runs very slow, especially when the source frequency is high.
So my question is: can we set nproc>1 so that the parallel solvers (mpirun -n nproc ./xmeshfem2d and mpirun -n nproc ./xspecfem2d) are invoked for each task and the runs go faster? Does this make sense?
If you're doing an inversion with only 16 cores, using more cores per solver instance doesn't get you much, because it limits the number of shots you can run simultaneously (for example, sixteen 1-core solver instances at once versus only four 4-core instances). That's all I was trying to say.