Support data on device without GPU-aware MPI for sf(all)gather(v)
Added SF GPU support
Update pack/unpack routines to do packing/unpacking for all neighbors in at most two routines. One is used to pack data in self-to-self communication; the second is used for remote communication. That way, on GPU, we can use at most two kernels to do packing/unpacking for all neighbors instead of multiple kernels.
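A minimal sketch of the idea behind packing all neighbors in one routine, not the actual PETSc code. It assumes the index lists for all neighbors are stored contiguously, so a single flat loop (or, on GPU, a single kernel) covers every neighbor. The names `pack_all`, `offsets`, `idx`, and `buf` are hypothetical.

```c
#include <stddef.h>

/* Assumed layout (hypothetical): idx[offsets[i]] .. idx[offsets[i+1]-1] are the
   entries of data[] destined for neighbor i, and buf[] uses the same offsets. */
static void pack_all(size_t nneighbors, const size_t *offsets, const size_t *idx,
                     const double *data, double *buf)
{
  /* The per-neighbor slices are contiguous, so one loop over the whole range
     packs every neighbor; no per-neighbor call (or kernel launch) is needed. */
  for (size_t k = offsets[0]; k < offsets[nneighbors]; k++) buf[k] = data[idx[k]];
}
```

Under this assumption, one such routine would handle the self-to-self range and a second one the remote range.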
SF: Partially fix a bug when overlapped SF communications have the same rootdata or leafdata on some ranks. Now we use two keys (rootdata, leafdata) to identify a pending communication. But that is still not enough for cases where communications have the same rootdata and leafdata on some ranks. Currently we error out in these cases. See src/vec/is/sf/examples/tutorials/ex2.c for the various cases we can and cannot handle.
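A minimal sketch of two-key matching for pending communications, assuming they are kept in a simple linked list. This is an illustration only, not the PETSc implementation; `Link`, `find_pending`, and `begin_comm` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical record of one pending communication */
typedef struct Link {
  const void  *rootdata, *leafdata; /* the two keys */
  struct Link *next;
} Link;

/* Return the pending link whose rootdata AND leafdata both match, or NULL */
static Link *find_pending(Link *head, const void *rootdata, const void *leafdata)
{
  for (Link *p = head; p; p = p->next)
    if (p->rootdata == rootdata && p->leafdata == leafdata) return p;
  return NULL;
}

/* Register a new communication. Returns -1 (error) when the same key pair is
   already pending, since the two communications could not be told apart. */
static int begin_comm(Link **head, Link *link, const void *rootdata, const void *leafdata)
{
  if (find_pending(*head, rootdata, leafdata)) return -1;
  link->rootdata = rootdata;
  link->leafdata = leafdata;
  link->next     = *head;
  *head          = link;
  return 0;
}
```

Two overlapped communications that share only one of the two pointers remain distinguishable in this sketch; sharing both is the case that is rejected.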
Add patterned SF graphs and use x as roots and y as leaves in x to y vecscatter