MPI communication with different sized datatypes

Posted 2023-02-21


Asked: 2016-09-28 08:05:25

Question:

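Assume that a program is running on xp times yp times zp processes. A Cartesian communicator is used, so the processes can be thought of as arranged in a grid of dimensions (xp,yp,zp). In this program the root process (0) declares and allocates a 3D array Atot, which is going to be filled by the 3D arrays A declared by each process (root included).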

INTEGER, DIMENSION(3) :: Ntot
INTEGER, DIMENSION(3) :: N
INTEGER, DIMENSION(:,:,:), ALLOCATABLE :: Atot
INTEGER, DIMENSION(:,:,:), ALLOCATABLE :: A
:
! the 3 elements of the array N are determined by dividing the corresponding
! element of the array Ntot by the number of processes in that direction,
! taking into account the remainder of the division.
:
IF (myid == 0) THEN ! myid is the process' rank
  ALLOCATE(Atot(Ntot(1),Ntot(2),Ntot(3)))
END IF
ALLOCATE(A(N(1),N(2),N(3)))
A = myid

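Which is the most correct, easy and efficient way to perform the communication? I was thinking about MPI_gather: each process would send the whole array A, which is made up of N(1)*N(2)*N(3) MPI_INTEGERs, and the root process should then receive them into a single MPI derived datatype corresponding to a cube (MPI_type_vector should be used twice recursively, am I right?). Is it possible to do so?

Even if this works, it sounds easy to me only when the number of processes along each direction of the Cartesian communicator evenly divides the corresponding element of Ntot, that is, when the array A has the same dimensions in every process. This is the case when Ntot = (/9,9,9/).

What about the case Ntot = (/10,10,10/)? The MPI derived datatype would have different dimensions in different processes, so would it still be possible to use MPI_gather?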

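EDIT

I do not exclude that MPI_GATHERV could be part of the solution. However, it allows each process to send (and the root process to receive) a different amount of data, that is, a different number of MPI_INTEGERs (in the simple example). In the case I'm dealing with, however, the root process has to receive the data into the 3-dimensional array Atot. To do so, I think it could be useful to define an MPI derived datatype, let's name it smallcube. In this case, each process sends the whole array A, whereas the root process receives 1 datum of type smallcube from each process. The point is that smallcube has different lengths along the three dimensions, depending on its position within the Cartesian grid (assuming the lengths are not evenly divisible by the number of processes along the three dimensions).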
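To make the smallcube idea concrete, here is a minimal sketch (not from the original post) of how the root could build one receive type per rank from that rank's Cartesian coordinates. It assumes the first mod(Ntot,dims) processes along each direction take one extra element, that the process grid is chosen by MPI_Dims_create, and that no direction has more processes than elements (a zero-sized block is not handled). The names smallcube_sketch, base, rem and total are illustrative only.

```fortran
program smallcube_sketch
  use mpi
  implicit none
  integer, dimension(3) :: Ntot, dims, coords, subsizes, starts
  logical, dimension(3) :: periods
  integer :: comm_cart, myid, nproc, ip, d, base, rem, smallcube, ierr
  integer :: total

  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)

  Ntot = [10, 10, 10]     ! global size from the question
  dims = 0                ! assumption: let MPI pick the grid (xp,yp,zp)
  periods = .false.
  call mpi_dims_create(nproc, 3, dims, ierr)
  call mpi_cart_create(mpi_comm_world, 3, dims, periods, .false., &
       comm_cart, ierr)
  call mpi_comm_rank(comm_cart, myid, ierr)

  if (myid == 0) then
     total = 0
     do ip = 0, nproc-1
        call mpi_cart_coords(comm_cart, ip, 3, coords, ierr)
        do d = 1, 3
           base = Ntot(d)/dims(d)
           rem  = mod(Ntot(d), dims(d))
           ! the first rem processes along direction d get one extra element
           subsizes(d) = base
           if (coords(d) < rem) subsizes(d) = base + 1
           ! 0-based offset of this block inside Atot
           starts(d) = coords(d)*base + min(coords(d), rem)
        end do
        call mpi_type_create_subarray(3, Ntot, subsizes, starts, &
             MPI_ORDER_FORTRAN, MPI_INTEGER, smallcube, ierr)
        call mpi_type_commit(smallcube, ierr)
        print '(a,i0,a,3(1x,i0),a,3(1x,i0))', 'rank ', ip, &
             ': subsizes', subsizes, ', starts', starts
        total = total + product(subsizes)
        ! freed at once here; a real code would keep one type per rank
        call mpi_type_free(smallcube, ierr)
     end do
     ! sanity check: the blocks should tile Atot exactly
     if (total /= product(Ntot)) print *, 'blocks do not tile Atot'
  end if

  call mpi_comm_free(comm_cart, ierr)
  call mpi_finalize(ierr)
end program smallcube_sketch
```

Each rank's smallcube then differs only in its subsizes and starts, which is why a single type shared by all ranks in a plain MPI_Gather does not fit the uneven case.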

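Comments:

Correct me if I'm wrong, but I think MPI_GATHERV (note the V) allows a different amount of data from each process. Maybe that is what you are looking for in the last part of your question?

I edited the question :)

This can be achieved by using MPI_ALLTOALLW to emulate the non-existent MPI_SCATTERW. Jonathan Dursi provides another canonical answer, for C, here (it also covers the MPI_ALLTOALLW approach). Hopefully you can understand how it works and translate it to Fortran (doing so should be relatively simple). If nobody else does it before then, I can do it when I have more free time.

Why do you want to do this? If it is to write the data to a file, MPI-IO is almost certainly a better approach. I ask because replicating the whole distributed object is almost always a bad idea, if only on grounds of memory use.

Whether you use collectives such as MPI_Scatterv or MPI-IO, MPI_Type_create_subarray is by far the easiest approach. You would think that using MPI_Type_vector recursively would work, but it is very tricky because of issues with type extents. The IO benchmarking code at archer.ac.uk/training/course-material/2015/12/ParallelIO_Oxford/… shows how to write a 3D array from Fortran using MPI-IO.

Answer 1: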

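As mentioned in the comments, if you really do want to pull all the data onto one processor, then MPI_Type_create_subarray is probably a good approach. Given that I have just used MPI_Type_create_subarray in my own project, I thought I would try to provide a working example answer (note that I have been lax about error checking and about the types I declare).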

program subarrayTest
  use mpi
  implicit none
  integer, parameter :: n1 = 10, n2=20, n3=32
  INTEGER, DIMENSION(3) :: Ntot, N, sizes, subsizes, starts
  INTEGER, DIMENSION(:,:,:), ALLOCATABLE :: Atot, A
  integer :: iproc, nproc, sendSubType, ierr
  integer :: nl1, nl2, nl3 !Local block sizes
  integer :: l1, l2, l3, u1, u2, u3 !Local upper/lower bounds
  integer :: ip, sendRequest
  integer, dimension(:), allocatable :: recvSubTypes, recvRequests
  integer, dimension(:,:,:), allocatable :: boundsArr

  !MPI Setup
  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)
  call mpi_comm_rank(mpi_comm_world, iproc, ierr)

  !Set grid sizes
  Ntot = [n1,n2,n3]
  !For simplicity I'm assuming we only split the last dimension (and it has nproc as a factor)
  !although as long as you can specify l* and u* this should work (and hence nl* = 1+u*-l*)
  if(mod(n3,nproc).ne.0) then
     print*,"Error: n3 must have nproc as a factor."
     call mpi_abort(mpi_comm_world,MPI_ERR_UNKNOWN,ierr)
  endif
  nl1 = n1 ; l1 = 1 ; u1=l1+nl1-1
  nl2 = n2 ; l2 = 1 ; u2=l2+nl2-1
  nl3 = n3/nproc ; l3 = 1+iproc*nl3 ; u3=l3+nl3-1
  N = [nl1,nl2,nl3]

  !Very lazy way to ensure proc 0 knows the upper and lower bounds for all procs
  allocate(boundsArr(2,3,0:nproc-1)) 
  boundsArr=0
  boundsArr(:,1,iproc) = [l1, u1]
  boundsArr(:,2,iproc) = [l2, u2]
  boundsArr(:,3,iproc) = [l3, u3]
  call mpi_allreduce(MPI_IN_PLACE,boundsArr,size(boundsArr),MPI_INTEGER, &
       MPI_SUM, mpi_comm_world, ierr)

  !Allocate and populate local data portion
  IF (iproc == 0) THEN ! iproc is the process' rank
     ALLOCATE(Atot(Ntot(1),Ntot(2),Ntot(3)))
     Atot=-1 !So you can check all elements are set
  END IF
  ALLOCATE(A(N(1),N(2),N(3)))
  A = iproc

  !Now let's create the sub array types
  !First do the send type
  sizes=N !The size of the local array
  !The amount of data in each dimension to send -- here it's the full
  !local data array but in general it could be a small subset
  subsizes=1+[u1,u2,u3]-[l1,l2,l3]

  !These are the lower bounds in each dimension where the sub array
  !starts -- Note MPI assumes 0 indexing here.
  starts = [0,0,0]
  call mpi_type_create_subarray(size(sizes),sizes, subsizes, starts, &
       MPI_ORDER_FORTRAN, MPI_INTEGER, sendSubType, ierr)
  call mpi_type_commit(sendSubType, ierr)

  !Now on proc0 setup each receive type
  if (iproc == 0) then
     allocate(recvSubTypes(0:nproc-1)) !Use 0 indexing for ease
     sizes = Ntot !Size of dest array
     do ip=0,nproc-1
        subsizes=1+boundsArr(2,:,ip)-boundsArr(1,:,ip) !Size of A being sent from proc ip
        starts = boundsArr(1,:,ip) -1
        call mpi_type_create_subarray(size(sizes),sizes, subsizes, starts, &
             MPI_ORDER_FORTRAN, MPI_INTEGER, recvSubTypes(ip), ierr)
        call mpi_type_commit(recvSubTypes(ip), ierr)
     end do
  end if

  !Now let's use non-blocking communications to transfer data
  !First post receives -- tag with source proc id
  if (iproc == 0) then
     allocate(recvRequests(0:nproc-1))
     do ip=0,nproc-1
        call mpi_irecv(Atot,1,recvSubTypes(ip),ip,ip,&
             mpi_comm_world,recvRequests(ip),ierr)
     end do
  end if

  !Now post sends
  call mpi_isend(A,1,sendSubType,0,iproc,mpi_comm_world,&
       sendRequest, ierr)

  !Now wait on receives/sends
  if(iproc == 0) call mpi_waitall(size(recvRequests),recvRequests,&
       MPI_STATUSES_IGNORE,ierr)
  call mpi_wait(sendRequest, MPI_STATUS_IGNORE, ierr)

  if(iproc == 0) print*,Atot
  call mpi_barrier(mpi_comm_world, ierr)

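  !Now free resources: the committed datatypes and the allocated arrays
  call mpi_type_free(sendSubType, ierr)
  if (iproc == 0) then
     do ip=0,nproc-1
        call mpi_type_free(recvSubTypes(ip), ierr)
     end do
     deallocate(recvSubTypes, recvRequests, Atot)
  end if
  deallocate(A, boundsArr)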
  call mpi_finalize(ierr)
end program subarrayTest

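You should be able to compile this with mpif90. You will need to adapt it to set the appropriate local bounds for your case, but hopefully it provides a useful starting point. It does not assume that the local array sizes are the same across processors: as long as the lower and upper bounds (l* and u*) are set correctly, it should work fine. Note that my code above probably does not follow best practice in a number of ways.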
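As one possible way to set l* and u* when the grid size is not a multiple of the process count, the following sketch computes 1-based inclusive bounds for one direction. It is not part of the original answer: the module and routine names (block_decomp, block_bounds) are hypothetical, and it assumes the first mod(ntot,np) processes each take one extra element.

```fortran
module block_decomp
  implicit none
contains
  ! Bounds (1-based, inclusive) of the block owned by the process at
  ! 0-based position coord among np processes sharing ntot elements.
  ! Assumption: the first mod(ntot,np) processes get one extra element.
  pure subroutine block_bounds(ntot, np, coord, lo, hi)
    integer, intent(in)  :: ntot, np, coord
    integer, intent(out) :: lo, hi
    integer :: base, rem
    base = ntot / np
    rem  = mod(ntot, np)
    lo = 1 + coord*base + min(coord, rem)
    hi = lo + base - 1
    if (coord < rem) hi = hi + 1
  end subroutine block_bounds
end module block_decomp

program block_decomp_demo
  use block_decomp
  implicit none
  integer :: coord, lo, hi
  ! Hypothetical case: 10 elements shared by 3 processes in one direction
  do coord = 0, 2
     call block_bounds(10, 3, coord, lo, hi)
     print '(a,i0,a,i0,a,i0)', 'coord ', coord, ': l = ', lo, ', u = ', hi
  end do
end program block_decomp_demo
```

For 10 elements over 3 processes this gives the blocks 1..4, 5..7 and 8..10. In the program above, a call such as block_bounds(n3, nproc, iproc, l3, u3) followed by nl3 = u3-l3+1 could stand in for the even-division formula, and the mod(n3,nproc) check would then no longer be needed.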
