fmihpc/dccrg

Allow defining neighborhoods with face neighbors only regardless of refinement level

Opened this issue · 11 comments

Vlasiator has a sparse vspace representation stored in each spatial cell, and hence the number of vspace blocks, i.e. the amount of data that needs to be transferred to neighbors, is different in each cell. The current code allocates memory on both the sender and the receiver side for the right number of blocks in a cell before it is transferred. Both the sender and the receiver must therefore know exactly which cells they are communicating with in order to allocate the right vspace size. If we use the current neighborhood definitions, we always have to update all neighboring cells that are of higher refinement, which is 1) hard to implement, 2) wastes bandwidth and 3) unnecessary, since we only need the nearest neighbor.
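To illustrate, here is a hypothetical minimal cell (not the actual Vlasiator SpatialCell): the receive buffer can only be allocated once the incoming block count is known, which is why both ends of every transfer need that count up front.

```cpp
// Hypothetical minimal cell, not the actual Vlasiator SpatialCell.
#include <cstddef>
#include <vector>

constexpr std::size_t VELOCITY_CELLS_PER_BLOCK = 64; // e.g. 4x4x4

struct SparseVCell {
    std::size_t block_count = 0;    // differs from cell to cell
    std::vector<double> block_data; // block_count * 64 values

    // Must be called with the correct count on the receiving side before
    // any block data can be received into block_data.
    void set_block_count(std::size_t n) {
        block_count = n;
        block_data.resize(n * VELOCITY_CELLS_PER_BLOCK);
    }
};
```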

iljah commented

This is on my todo list but is awaiting restructuring of neighbor offset definitions used in neighbor lists and iterators. Hopefully I can implement this in 2019...

About point 1 above, I don't see the problem. In the hybrid PIC code I just query which cells are on the process boundary (https://github.com/iljah/pamhd/blob/master/tests/particle/test.cpp#L735) and then allocate that much space in all copies of remote neighbors that will be receiving data (https://github.com/iljah/pamhd/blob/master/source/particle/solve_dccrg.hpp#L458, called with remote_cells from the previous link), based on prior information of how many particles are incoming. Doesn't Vlasiator do exactly the same thing with blocks? Mesh refinement shouldn't factor into this in any way, assuming you're willing to live with a higher bandwidth requirement due to transferring some data that's not needed.
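Roughly, the two-pass pattern looks like the sketch below. The dccrg method names are from memory and the Cell members (transfer_payload, incoming_count, resize) are placeholders, so treat this as pseudocode rather than working code.

```cpp
// Rough sketch of the count-then-payload pattern, not actual pamhd or
// Vlasiator code; Cell's members are placeholders and the dccrg method
// names should be checked against the current interface.
template<class Cell, class Grid> void update_remote_data(Grid& grid)
{
    // Pass 1: every cell transfers only its small, fixed-size count.
    Cell::transfer_payload = false;
    grid.update_copies_of_remote_neighbors();

    // Allocate space in the local copies of remote neighbors, based on the
    // counts that just arrived.
    for (const auto& cell_id: grid.get_remote_cells_on_process_boundary()) {
        Cell* copy = grid[cell_id];
        copy->resize(copy->incoming_count);
    }

    // Pass 2: transfer the variable-size payload into the allocated space.
    Cell::transfer_payload = true;
    grid.update_copies_of_remote_neighbors();
}
```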

The problem in Vlasiator is related to the fact that we don't actually communicate the blocks of a remote cell, but the blocks that a remote cell has mapped to a local cell. Therefore, like I said earlier, both the sending and receiving rank have to know the number of blocks in the neighbor of the cell being communicated, not in the cell itself. Because of the asymmetric neighborhoods, you can have messages that are sent to multiple cells, which may have different remote neighbors. Figuring out the right number of blocks for the messages that are sent unnecessarily is needlessly complicated and prone to errors; the current code is here https://github.com/tkoskela/vlasiator/blob/6c4ff7ec1d1ba070d209e91397cee6f980824aa0/vlasovsolver/cpu_trans_map_amr.cpp#L1386 but I recently discovered another corner case where it is still incorrect. There is also a risk of running out of memory if we have to communicate too many cells unnecessarily.

iljah commented

Don't know what kind of mapping you mean but that's probably not essential.

both the sending and receiving rank have to know the number of blocks in the neighbor of the cell being communicated

Does this mean that when sending data from one cell to another there's also a third cell's data being sent? This is not how I understood your first message.

So is the problem that cells send different data to other processes depending on which process is receiving, and that figuring out who should send and receive what is much more complicated with a refined mesh?

I begin to feel that meeting to talk about this would be a good idea :)

I'll try my best to explain, though. The problem is that the size of the message being sent depends on the face neighbor of the cell being sent. With a uniform mesh, communications in a size-1 neighborhood only happen between face neighbors, so the face neighbor is always on the receiving process and this is no issue. With a refined mesh, communications also happen between non-face neighbors, and in that case the receiving process needs to locate the face neighbor of the remote cell it is receiving, in order to know the size of the message. This is additionally complicated by the fact that the get_neighbors_of() function can't be called for remote cells, unless that has changed in the new interface?

iljah commented

size of the message being sent depends on the face neighbor of the cell being sent

so the (process owning the) cell receiving this message is not the (owner of the) face neighbor that the message size depends on, so the receiver is a third cell?

uniform mesh, communications in a size-1 neighborhood only happen between face neighbors

In current dccrg language this would require a 0-size neighborhood; I assume you've set up your own neighborhood that includes only face neighbors...

refined mesh ... receiving process needs to locate the face neighbor of the remote cell it is receiving, in order to know the size of the message

Ah, so this is a three-cell problem. In general it cannot be solved, even without AMR, without an extra round of communication because, even with a full size-1 neighborhood where same-size cells sharing only one vertex are neighbors, it is impossible to know all face neighbors of any neighbor: some of its face neighbors will be on the other side and therefore further than one cell away:

| C | N | F |

where cell C cannot know what its neighbor N has on the other side, which in this case is N's face neighbor F. As far as C is concerned, F might as well consist of 8 smaller cells, in which case N would have 4 face neighbors on that side.

Since the above general case would also apply without AMR, I've either misunderstood something or your problem is more specific. Are you writing about face neighbors of N that can only be located in some direction from N relative to C?

get_neighbors_of() function can't be called for remote cells

I think I've changed this already, but note that neighbors_of of remote cells might not contain all cells, only those that the current process knows about. For example, in the above case calling get_neighbors_of(N) from C might not return F because C might not know that F exists. This is part of my plan to remove the dccrg bottleneck of all processes knowing about all cells.

Yes, the direction of the communication is always known.

iljah commented

So how do you handle my example above and how does AMR change that?

iljah commented

As far as I can see, in any case N would have to tell C how much data is incoming before sending it, because C cannot know that in general. And the same holds the other way around, so N would also have to tell F how much data is incoming.

In the example above, let's say we are updating in the +1 direction, i.e. C sends to N and N sends to F. I'm only updating the nearest neighbor, so the message C sends depends on the size of N. But since N receives it, it knows its own size and can allocate its receive buffer correctly.

With AMR, the situation would be something like

 | C1 | C2 | NNNN | F1 | F2 |

where NNNN is a coarse cell and C1, C2, F1, F2 are refined cells. Now the message that C1 sends depends on the size of C2, but it is also received by NNNN, because both C1 and C2 are in its neighborhood. So NNNN has to find out 1) is C1 my face neighbor? [no] 2) if not, who is C1's face neighbor? [C2] 3) what is the size of C2? So it becomes a three-cell problem.
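Spelled out in code, the lookup NNNN would need is roughly the sketch below; the helpers and the block_count member are hypothetical, nothing like this exists in dccrg or Vlasiator as such.

```cpp
// Hypothetical sketch of the three steps above; is_face_neighbor(),
// face_neighbor_in_direction() and block_count do not exist as such in
// dccrg or Vlasiator.
#include <cstddef>
#include <cstdint>

template<class Grid> std::size_t blocks_to_receive(
    const Grid& grid,
    const uint64_t receiver,  // e.g. NNNN
    const uint64_t sender,    // e.g. C1
    const int direction       // update direction, e.g. +x
) {
    // 1) Is the sender my face neighbor? Then the message was sized for me
    //    and my own block count is the answer.
    if (is_face_neighbor(grid, sender, receiver, direction)) {
        return grid[receiver]->block_count;
    }
    // 2) If not, locate the sender's face neighbor in the update direction
    //    (C2 in the example above)...
    const uint64_t face_neighbor
        = face_neighbor_in_direction(grid, sender, direction);
    // 3) ...and use its block count as the size of the incoming message.
    return grid[face_neighbor]->block_count;
}
```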

iljah commented

message that C1 sends depends on the size of C2, but it is also received by NNNN

So only the process owning C2 should receive data from C1, while the process owning NNNN should receive an empty message from C1, and if C2 and NNNN are owned by the same process then it should receive C1's data. I can probably add support for sending data only between face neighbors before the end of this year. If you're in more of a hurry, it should be straightforward to emulate that by using the expanded version of get_mpi_datatype in your cell class: https://github.com/fmihpc/dccrg/blob/master/dccrg.hpp#L172

After every load balance and mesh adaptation, record for each cell e.g. which processes own one or more of its face neighbors, and in get_mpi_datatype() send 0 bytes from that cell to all other processes. Update this information between processes after every adaptation and load balance, so that when receiving, the local copy can also check whether the current (receiving) process should receive regular data (whose size it can calculate locally) or 0 bytes. You can also simplify get_mpi_datatype by making this logic specific to one neighborhood_id. get_face_neighbors() should be all that's needed to implement this on the dccrg side.
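A sketch of what I mean is below, assuming the expanded get_mpi_datatype() signature (cell id, sender, receiver, receiving, neighborhood id); the parameter order is from memory, and the face_neighbor_ranks member and the neighborhood id constant are placeholders you'd fill in yourself.

```cpp
// Sketch only: a cell class whose expanded get_mpi_datatype() sends real data
// to ranks owning at least one of the cell's face neighbors and 0 bytes to
// everyone else. face_neighbor_ranks must be refreshed, and communicated to
// the remote copies, after every load balance and mesh adaptation. The
// parameter order of the expanded signature should be checked against dccrg.hpp.
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>
#include <mpi.h>

constexpr int FACE_NEIGHBORHOOD_ID = 1; // whatever id the face neighborhood gets

struct Cell {
    std::vector<double> block_data;    // resized beforehand on the receiving side
    std::set<int> face_neighbor_ranks; // ranks owning >= 1 face neighbor of this cell

    std::tuple<void*, int, MPI_Datatype> get_mpi_datatype(
        const uint64_t /*cell_id*/,
        const int /*sender*/,
        const int receiver,
        const bool /*receiving*/,
        const int neighborhood_id
    ) {
        // In the face-neighbor neighborhood, transfer 0 bytes unless the
        // receiving rank owns a face neighbor of this cell. The sender is
        // always the cell's owner, so checking the receiver covers both the
        // sending and the receiving side (for the latter, the local copy's
        // face_neighbor_ranks must already be up to date).
        if (neighborhood_id == FACE_NEIGHBORHOOD_ID
            && face_neighbor_ranks.count(receiver) == 0) {
            return std::make_tuple(static_cast<void*>(nullptr), 0, MPI_BYTE);
        }
        // Regular transfer; block_data has already been sized to the
        // expected message length.
        return std::make_tuple(
            static_cast<void*>(block_data.data()),
            static_cast<int>(block_data.size()),
            MPI_DOUBLE
        );
    }
};
```

As described above, when receiving it is the local copy of the remote cell whose get_mpi_datatype() is called, which is why the face-neighbor bookkeeping has to be kept up to date on both sides.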

So only the process owning C2 should receive data from C1, while the process owning NNNN should receive an empty message from C1, and if C2 and NNNN are owned by the same process then it should receive C1's data.

Yes, this would be great! The sooner the better obviously :)