1*59599516SKenneth E. Jansen subroutine commu (global, ilwork, n, code) 2*59599516SKenneth E. Jansenc--------------------------------------------------------------------- 3*59599516SKenneth E. Jansenc 4*59599516SKenneth E. Jansenc This subroutine is responsible for interprocessor communication of 5*59599516SKenneth E. Jansenc the residual and solution vectors. 6*59599516SKenneth E. Jansenc 7*59599516SKenneth E. Jansenc input: 8*59599516SKenneth E. Jansenc global(nshg,n): global vector to be communicated. Note that 9*59599516SKenneth E. Jansenc this vector is local to the processor, (i.e. 10*59599516SKenneth E. Jansenc not distributed across processors) 11*59599516SKenneth E. Jansenc ilwork(nlwork): this is the local interprocessor work array. 12*59599516SKenneth E. Jansenc This array is local to the processor, (i.e. 13*59599516SKenneth E. Jansenc each processor has a unique ilwork array. 14*59599516SKenneth E. Jansenc n: second dimension of the array to be communicated 15*59599516SKenneth E. Jansenc code: = 'in' for communicating with the residual 16*59599516SKenneth E. Jansenc = 'out' for cummunicating the solution 17*59599516SKenneth E. Jansenc 18*59599516SKenneth E. Jansenc--------------------------------------------------------------------- 19*59599516SKenneth E. Jansenc 20*59599516SKenneth E. Jansenc The array ilwork describes the details of the communications. 21*59599516SKenneth E. Jansenc Each communication step (call of this routine) consists of a 22*59599516SKenneth E. Jansenc sequence of "tasks", where a task is defined as a communication 23*59599516SKenneth E. Jansenc between two processors where data is exchanged. This would imply 24*59599516SKenneth E. Jansenc that for a given processor, there will be as many tasks as there 25*59599516SKenneth E. Jansenc are processors with which it must communicate. Details of the 26*59599516SKenneth E. Jansenc ilwork array appear below. 27*59599516SKenneth E. Jansenc 28*59599516SKenneth E. Jansenc--------------------------------------------------------------------- 29*59599516SKenneth E. Jansenc 30*59599516SKenneth E. Jansen include "common.h" 31*59599516SKenneth E. Jansen include "mpif.h" 32*59599516SKenneth E. Jansen include "auxmpi.h" 33*59599516SKenneth E. Jansen integer status(MPI_STATUS_SIZE), ierr 34*59599516SKenneth E. Jansen integer stat(MPI_STATUS_SIZE, 2*maxtask), req(2*maxtask) 35*59599516SKenneth E. Jansen real*8 rDelISend, rDelIRecv, rDelWaitAll 36*59599516SKenneth E. Jansen 37*59599516SKenneth E. Jansen dimension global(nshg,n), 38*59599516SKenneth E. Jansen & rtemp(maxfront*n,maxtask), 39*59599516SKenneth E. Jansen & ilwork(nlwork) 40*59599516SKenneth E. Jansen 41*59599516SKenneth E. Jansen character*3 code 42*59599516SKenneth E. Jansen 43*59599516SKenneth E. Jansen if(impistat2.eq.1) call MPI_BARRIER (MPI_COMM_WORLD, ierr) 44*59599516SKenneth E. Jansen if(impistat.gt.0) rDelIRecv = zero 45*59599516SKenneth E. Jansen if(impistat.gt.0) rDelISend = zero 46*59599516SKenneth E. Jansen if(impistat.gt.0) rDelWaitAll = zero 47*59599516SKenneth E. Jansen 48*59599516SKenneth E. Jansen if (code .ne. 'in ' .and. code .ne. 'out') 49*59599516SKenneth E. Jansen & call error ('commu ','code ',0) 50*59599516SKenneth E. Jansen 51*59599516SKenneth E. Jansen if (n .eq. 1) then ! like a scalar 52*59599516SKenneth E. Jansen kdof = 1 53*59599516SKenneth E. Jansen elseif (n .eq. nsd) then ! like the normal vectors 54*59599516SKenneth E. Jansen kdof = 2 55*59599516SKenneth E. Jansen elseif (n .eq. ndof) then ! res, y, ac, krylov vectors.... 56*59599516SKenneth E. Jansen kdof = 3 57*59599516SKenneth E. Jansen elseif (n .eq. nflow*nflow) then ! bdiag 58*59599516SKenneth E. Jansen kdof = 4 59*59599516SKenneth E. Jansen elseif (n .eq. (nflow-1)*nsd) then ! qres 60*59599516SKenneth E. Jansen kdof = 5 61*59599516SKenneth E. Jansen elseif (n .eq. nflow) then 62*59599516SKenneth E. Jansen kdof = 6 63*59599516SKenneth E. Jansen elseif (n .eq. 24 ) then 64*59599516SKenneth E. Jansen kdof = 7 65*59599516SKenneth E. Jansen elseif (n .eq. 9) then 66*59599516SKenneth E. Jansen kdof = 8 67*59599516SKenneth E. Jansen elseif (n .eq. 11 ) then 68*59599516SKenneth E. Jansen kdof = 9 69*59599516SKenneth E. Jansen elseif (n .eq. 7 ) then 70*59599516SKenneth E. Jansen kdof = 10 71*59599516SKenneth E. Jansen elseif (n .eq. 33 ) then 72*59599516SKenneth E. Jansen kdof = 11 73*59599516SKenneth E. Jansen elseif (n .eq. 22 ) then 74*59599516SKenneth E. Jansen kdof = 12 75*59599516SKenneth E. Jansen elseif (n .eq. 16 ) then 76*59599516SKenneth E. Jansen kdof = 13 77*59599516SKenneth E. Jansen elseif (n .eq. 10 ) then 78*59599516SKenneth E. Jansen kdof = 14 79*59599516SKenneth E. Jansen elseif (n .eq. nflow*nsd ) then !surface tension + qres 80*59599516SKenneth E. Jansen kdof = 15 81*59599516SKenneth E. Jansen else 82*59599516SKenneth E. Jansen call error ('commu ','n ',n) 83*59599516SKenneth E. Jansen endif 84*59599516SKenneth E. Jansen 85*59599516SKenneth E. Jansenc... Note that when adding another kdof to the above set, we must 86*59599516SKenneth E. Jansenc... also make changes in ctypes.f and auxmpi.h 87*59599516SKenneth E. Jansen 88*59599516SKenneth E. Jansenc--------------------------------------------------------------------- 89*59599516SKenneth E. Jansenc ilwork(1): number of tasks 90*59599516SKenneth E. Jansenc 91*59599516SKenneth E. Jansenc The following information is contained in ilwork for each task: 92*59599516SKenneth E. Jansenc itag: tag of the communication 93*59599516SKenneth E. Jansenc iacc: == 0 if task is a send 94*59599516SKenneth E. Jansenc == 1 if task is a recieve 95*59599516SKenneth E. Jansenc iother: rank of processor with which this communication occurs 96*59599516SKenneth E. Jansenc numseg: number of data "segments" to be sent or recieved. A 97*59599516SKenneth E. Jansenc segment is defined as a continuous section of the global 98*59599516SKenneth E. Jansenc vector to be communicated, (i.e. a group of nodes (or, 99*59599516SKenneth E. Jansenc rather, "shape function coefficients") which occur 100*59599516SKenneth E. Jansenc sequentially in the array global(nshg,n)). 101*59599516SKenneth E. Jansenc isbeg: location of the first segment in the array owned by the 102*59599516SKenneth E. Jansenc current processor. 103*59599516SKenneth E. Jansenc 104*59599516SKenneth E. Jansenc The two types of communication are 'in', where the residual is being 105*59599516SKenneth E. Jansenc communicated, and 'out', where the solution is being communicated. 106*59599516SKenneth E. Jansenc Note that when the type is 'out', senders recieve and recievers send. 107*59599516SKenneth E. Jansenc 108*59599516SKenneth E. Jansenc The following comment pertains to a communication of type 'in': 109*59599516SKenneth E. Jansenc 110*59599516SKenneth E. Jansenc If the task is a send, then all of the numseg segments are 111*59599516SKenneth E. Jansenc sent with a single call to MPI_SEND. Where these segments live in 112*59599516SKenneth E. Jansenc the array is built into the array sevsegtype, which is a common 113*59599516SKenneth E. Jansenc array constructed in the subroutine "ctypes.f". In other words, 114*59599516SKenneth E. Jansenc sevsegtype is a data type that describes the indices of the blocks 115*59599516SKenneth E. Jansenc to be sent, in terms of there beginning index, and the length of 116*59599516SKenneth E. Jansenc each segment. Using this, we can make a single send to take care of 117*59599516SKenneth E. Jansenc all the segments for this task. 118*59599516SKenneth E. Jansenc 119*59599516SKenneth E. Jansenc If the task is a recieve, then once the vector is recieved, the 120*59599516SKenneth E. Jansenc recieved segments must be added to the correct locations in the 121*59599516SKenneth E. Jansenc current array. These locations are described in ilwork as the 122*59599516SKenneth E. Jansenc beginning position, then the length of the segment. 123*59599516SKenneth E. Jansenc 124*59599516SKenneth E. Jansenc--------------------------------------------------------------------- 125*59599516SKenneth E. Jansen numtask = ilwork(1) 126*59599516SKenneth E. Jansen 127*59599516SKenneth E. Jansen itkbeg = 1 128*59599516SKenneth E. Jansen m = 0 129*59599516SKenneth E. Jansen idl=0 130*59599516SKenneth E. Jansen 131*59599516SKenneth E. Jansen DO itask = 1, numtask 132*59599516SKenneth E. Jansen m = m + 1 133*59599516SKenneth E. Jansen itag = ilwork (itkbeg + 1) 134*59599516SKenneth E. Jansen iacc = ilwork (itkbeg + 2) 135*59599516SKenneth E. Jansen iother = ilwork (itkbeg + 3) 136*59599516SKenneth E. Jansen numseg = ilwork (itkbeg + 4) 137*59599516SKenneth E. Jansen isgbeg = ilwork (itkbeg + 5) 138*59599516SKenneth E. Jansenc 139*59599516SKenneth E. Jansenc.... if iacc == 0, then this task is a send. 140*59599516SKenneth E. Jansenc slave 141*59599516SKenneth E. Jansenc 142*59599516SKenneth E. Jansen if (iacc .EQ. 0) then 143*59599516SKenneth E. Jansenc 144*59599516SKenneth E. Jansenc.... residual communication 145*59599516SKenneth E. Jansenc 146*59599516SKenneth E. Jansen if (code .eq. 'in ') then 147*59599516SKenneth E. Jansen if(impistat.eq.1) then 148*59599516SKenneth E. Jansen iISend = iISend+1 149*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 150*59599516SKenneth E. Jansen iISendScal = iISendScal+1 151*59599516SKenneth E. Jansen endif 152*59599516SKenneth E. Jansen if(impistat.gt.0) rmpitmr = TMRC() 153*59599516SKenneth E. Jansen call MPI_ISEND(global(isgbeg, 1), 1, sevsegtype(itask,kdof), 154*59599516SKenneth E. Jansen & iother, itag, MPI_COMM_WORLD, req(m), ierr) 155*59599516SKenneth E. Jansen if(impistat.gt.0) rDelISend = TMRC()-rmpitmr 156*59599516SKenneth E. Jansen if(impistat.eq.1) then 157*59599516SKenneth E. Jansen rISend = rISend+rDelISend 158*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 159*59599516SKenneth E. Jansen rISendScal = rISendScal+rDelISend 160*59599516SKenneth E. Jansen endif 161*59599516SKenneth E. Jansen endif 162*59599516SKenneth E. Jansenc 163*59599516SKenneth E. Jansenc.... solution communication 164*59599516SKenneth E. Jansenc 165*59599516SKenneth E. Jansen if (code .eq. 'out') then 166*59599516SKenneth E. Jansen if(impistat.eq.1) then 167*59599516SKenneth E. Jansen iIRecv = iIRecv+1 168*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 169*59599516SKenneth E. Jansen iIRecvScal = iIRecvScal+1 170*59599516SKenneth E. Jansen endif 171*59599516SKenneth E. Jansen if(impistat.gt.0) rmpitmr = TMRC() 172*59599516SKenneth E. Jansen call MPI_IRECV(global(isgbeg, 1), 1, sevsegtype(itask,kdof), 173*59599516SKenneth E. Jansen & iother, itag, MPI_COMM_WORLD, req(m), ierr) 174*59599516SKenneth E. Jansen if(impistat.gt.0) rDelIRecv = TMRC()-rmpitmr 175*59599516SKenneth E. Jansen if(impistat.eq.1) then 176*59599516SKenneth E. Jansen rIRecv = rIRecv+rDelIRecv 177*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 178*59599516SKenneth E. Jansen rIRecvScal = rIRecvScal+rDelIRecv 179*59599516SKenneth E. Jansen endif 180*59599516SKenneth E. Jansen endif 181*59599516SKenneth E. Jansenc 182*59599516SKenneth E. Jansenc.... if iacc == 1, then this task is a recieve. 183*59599516SKenneth E. Jansenc master 184*59599516SKenneth E. Jansenc 185*59599516SKenneth E. Jansen else 186*59599516SKenneth E. Jansen if (code .eq. 'in ') then 187*59599516SKenneth E. Jansenc 188*59599516SKenneth E. Jansenc.... determine the number of total number of nodes involved in this 189*59599516SKenneth E. Jansenc communication (lfront), including all segments 190*59599516SKenneth E. Jansenc 191*59599516SKenneth E. Jansen lfront = 0 192*59599516SKenneth E. Jansen do is = 1,numseg 193*59599516SKenneth E. Jansen lenseg = ilwork (itkbeg + 4 + 2*is) 194*59599516SKenneth E. Jansen lfront = lfront + lenseg 195*59599516SKenneth E. Jansen enddo 196*59599516SKenneth E. Jansenc 197*59599516SKenneth E. Jansenc.... recieve all segments for this task in a single step 198*59599516SKenneth E. Jansenc 199*59599516SKenneth E. Jansen idl=idl+1 ! stands for i Do Later, the number to fix later 200*59599516SKenneth E. Jansen if(impistat.eq.1) then 201*59599516SKenneth E. Jansen iIRecv = iIRecv+1 202*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 203*59599516SKenneth E. Jansen iIRecvScal = iIRecvScal+1 204*59599516SKenneth E. Jansen endif 205*59599516SKenneth E. Jansen if(impistat.gt.0) rmpitmr = TMRC() 206*59599516SKenneth E. Jansen call MPI_IRECV(rtemp(1,idl), lfront*n, MPI_DOUBLE_PRECISION, 207*59599516SKenneth E. Jansen & iother, itag, MPI_COMM_WORLD, req(m), ierr) 208*59599516SKenneth E. Jansen if(impistat.gt.0) rDelIRecv = TMRC()-rmpitmr 209*59599516SKenneth E. Jansen if(impistat.eq.1) then 210*59599516SKenneth E. Jansen rIRecv = rIRecv+rDelIRecv 211*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 212*59599516SKenneth E. Jansen rIRecvScal = rIRecvScal+rDelIRecv 213*59599516SKenneth E. Jansen endif 214*59599516SKenneth E. Jansen endif 215*59599516SKenneth E. Jansen if (code .eq. 'out') then 216*59599516SKenneth E. Jansen if(impistat.eq.1) then 217*59599516SKenneth E. Jansen iISend = iISend+1 218*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 219*59599516SKenneth E. Jansen iISendScal = iISendScal+1 220*59599516SKenneth E. Jansen endif 221*59599516SKenneth E. Jansen if(impistat.gt.0) rmpitmr = TMRC() 222*59599516SKenneth E. Jansen call MPI_ISEND(global(isgbeg, 1), 1, sevsegtype(itask,kdof), 223*59599516SKenneth E. Jansen & iother, itag, MPI_COMM_WORLD, req(m), ierr) 224*59599516SKenneth E. Jansen if(impistat.gt.0) rDelISend = TMRC()-rmpitmr 225*59599516SKenneth E. Jansen if(impistat.eq.1) then 226*59599516SKenneth E. Jansen rISend = rISend+rDelISend 227*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 228*59599516SKenneth E. Jansen rISendScal = rISendScal+rDelISend 229*59599516SKenneth E. Jansen endif 230*59599516SKenneth E. Jansen endif 231*59599516SKenneth E. Jansen endif 232*59599516SKenneth E. Jansen 233*59599516SKenneth E. Jansen itkbeg = itkbeg + 4 + 2*numseg 234*59599516SKenneth E. Jansen 235*59599516SKenneth E. Jansen enddo !! end tasks loop 236*59599516SKenneth E. Jansen 237*59599516SKenneth E. Jansen if(impistat.eq.1) then 238*59599516SKenneth E. Jansen iWaitAll = iWaitAll+1 239*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 240*59599516SKenneth E. Jansen iWaitAllScal = iWaitAllScal+1 241*59599516SKenneth E. Jansen endif 242*59599516SKenneth E. Jansen if(impistat.gt.0) rmpitmr = TMRC() 243*59599516SKenneth E. Jansen call MPI_WAITALL(m, req, stat, ierr) 244*59599516SKenneth E. Jansen if(impistat.gt.0) rDelWaitAll = TMRC()-rmpitmr 245*59599516SKenneth E. Jansen if(impistat.eq.1) then 246*59599516SKenneth E. Jansen rWaitAll = rWaitAll+rDelWaitAll 247*59599516SKenneth E. Jansen rCommu = rCommu+rDelIRecv+rDelISend+rDelWaitAll 248*59599516SKenneth E. Jansen elseif(impistat.eq.2) then 249*59599516SKenneth E. Jansen rWaitAllScal = rWaitAllScal+rDelWaitAll 250*59599516SKenneth E. Jansen rCommuScal = rCommuScal+rDelIRecv+rDelISend+rDelWaitAll 251*59599516SKenneth E. Jansen endif 252*59599516SKenneth E. Jansen 253*59599516SKenneth E. Jansenc 254*59599516SKenneth E. Jansenc Stuff added below is a delayed assembly of that which was communicated 255*59599516SKenneth E. Jansenc above but due to the switch to non-blocking receivves could not be 256*59599516SKenneth E. Jansenc assembled until after the waitall. Only necessary for commu "in" 257*59599516SKenneth E. Jansenc 258*59599516SKenneth E. Jansen 259*59599516SKenneth E. Jansen if(code .eq. 'in ') then 260*59599516SKenneth E. Jansen itkbeg=1 261*59599516SKenneth E. Jansen jdl=0 262*59599516SKenneth E. Jansen do j=1,numtask ! time to do all the segments that needed to be 263*59599516SKenneth E. Jansen ! assembled into the global vector 264*59599516SKenneth E. Jansen 265*59599516SKenneth E. Jansen iacc = ilwork (itkbeg + 2) 266*59599516SKenneth E. Jansen numseg = ilwork (itkbeg + 4) 267*59599516SKenneth E. Jansen isgbeg = ilwork (itkbeg + 5) 268*59599516SKenneth E. Jansen if(iacc.eq.1) then 269*59599516SKenneth E. Jansen jdl=jdl+1 ! keep track of order of rtemp's 270*59599516SKenneth E. Jansenc 271*59599516SKenneth E. Jansenc... add the recieved data to the global array on the current processor. 272*59599516SKenneth E. Jansenc Note that this involves splitting up the chunk of recieved data 273*59599516SKenneth E. Jansenc into its correct segment locations for the current processor. 274*59599516SKenneth E. Jansenc 275*59599516SKenneth E. Jansen itemp = 1 276*59599516SKenneth E. Jansen do idof = 1,n 277*59599516SKenneth E. Jansen do is = 1,numseg 278*59599516SKenneth E. Jansen isgbeg = ilwork (itkbeg + 3 + 2*is) 279*59599516SKenneth E. Jansen lenseg = ilwork (itkbeg + 4 + 2*is) 280*59599516SKenneth E. Jansen isgend = isgbeg + lenseg - 1 281*59599516SKenneth E. Jansen global(isgbeg:isgend,idof) = global(isgbeg:isgend,idof) 282*59599516SKenneth E. Jansen & + rtemp (itemp:itemp+lenseg-1,jdl) 283*59599516SKenneth E. Jansen itemp = itemp + lenseg 284*59599516SKenneth E. Jansen enddo 285*59599516SKenneth E. Jansen enddo 286*59599516SKenneth E. Jansen endif ! end of receive (iacc=1) 287*59599516SKenneth E. Jansen itkbeg = itkbeg + 4 + 2*numseg 288*59599516SKenneth E. Jansen enddo 289*59599516SKenneth E. Jansen endif ! commu "in" 290*59599516SKenneth E. Jansen return 291*59599516SKenneth E. Jansen end 292*59599516SKenneth E. Jansen 293*59599516SKenneth E. Jansen 294*59599516SKenneth E. Jansen 295