Dealing with complex send recv message within a for loop

Posted 2023-03-27

【Title】Dealing with complex send recv message within a for loop 【Posted】: 2019-03-14 19:06:21 【Question】:

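I am trying to parallelise a biological model in C++ with boost::mpi. It is my first attempt, and I am entirely new to the Boost library (I started from Schaling's book The Boost C++ Libraries). The model consists of grid cells and cohorts of individuals living within each grid cell. The classes are nested, so that a vector of Cohort* belongs to a GridCell. The model runs for 1000 years, and at each time step there is dispersal, so that cohorts of individuals move randomly between grid cells. I want to parallelise the content of the for loop, but not the loop itself, because each time step depends on the state of the previous one.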

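I use world.send() and world.recv() to send the necessary information from one rank to another. Because sometimes there is nothing to send between ranks, I use mpi::status and world.iprobe() to make sure that the code does not hang waiting for a message that was never sent (I followed this tutorial).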

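The first part of my code seems to work fine, but I cannot make sure that all the sent messages have been received before proceeding to the next step of the for loop. In fact, I noticed that some ranks move on to the next time step before the other ranks have had time to send their messages (or at least that is what it looks like from the output).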

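I am not posting the code because it consists of several classes and is quite long. If you are interested, the code is on github. I have written rough pseudocode here. I hope this is enough to understand the problem.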

int main()
{
    // initialise the GridCells and Cohorts living in them

    // depending on the number of cores requested, split the
    // grid cells that are processed by each core evenly, and
    // store the relevant grid cells in a vector of GridCell*

    // start to loop through each time step
    for (int k = 0; k < (burnIn+simTime); k++)
    {
        // calculate the survival and reproduction probabilities
        // for each Cohort and the dispersal probability

        // the dispersing Cohorts are sorted based on the rank of
        // the destination and stored in multiple vector<Cohort*>

        // I send the vector<Cohort*> with
        world.send(…)

        // the receiving rank gets the vector of Cohorts with:
        mpi::status statuses[world.size()];
        for(int st = 0; st < world.size(); st++)
        {
            ....
            if( world.iprobe(st, tagrec) )
                statuses[st] = world.recv(st, tagrec, toreceive[st]);
            // world.iprobe ensures that the code doesn't hang when there
            // are no dispersers
        }
        // do some extra calculations here

        // wait until all messages have been received, and then the time step ends.
        // This is the bit where I am stuck.
        // I've seen examples with wait_all for the non-blocking isend/irecv,
        // but I don't think it is applicable in my case.
        // The problem is that I noticed that some ranks proceed to the next
        // time step before all the other ranks have sent their messages.
    }
}

I compile with

mpic++ -I/$HOME/boost_1_61_0/boost/mpi -std=c++11 -Llibdir \
    -lboost_mpi -lboost_serialization -lboost_locale -o out

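and execute with mpirun -np 5 out, but I would like to be able to run on more cores on an HPC cluster later on (the model will be run at the global scale, and the number of cells may depend on the grid cell size chosen by the user). The installed compiler is g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, with Open MPI 2.1.1.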
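The even split of grid cells across ranks mentioned in the pseudocode can be sketched in plain C++. This is only an illustration under assumptions: it supposes the cells are numbered 0 to n_cells-1 and handed out in contiguous blocks, and the names owned_cells and owner_of are hypothetical, not taken from the poster's repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, last) of grid-cell indices owned by `rank`.
// Assumption: contiguous blocks; the first (n_cells % n_ranks) ranks
// each take one extra cell so the load differs by at most one cell.
std::pair<std::size_t, std::size_t> owned_cells(std::size_t n_cells,
                                                int rank, int n_ranks) {
    assert(n_ranks > 0 && rank >= 0 && rank < n_ranks);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t p = static_cast<std::size_t>(n_ranks);
    const std::size_t base = n_cells / p;
    const std::size_t extra = n_cells % p;
    const std::size_t first = r * base + std::min(r, extra);
    const std::size_t last = first + base + (r < extra ? 1 : 0);
    return std::make_pair(first, last);
}

// Inverse mapping: the rank that owns grid cell `cell`. This is what a
// disperser's destination cell would be translated to before sending.
int owner_of(std::size_t cell, std::size_t n_cells, int n_ranks) {
    assert(n_ranks > 0 && cell < n_cells);
    const std::size_t p = static_cast<std::size_t>(n_ranks);
    const std::size_t base = n_cells / p;
    const std::size_t extra = n_cells % p;
    const std::size_t big = extra * (base + 1);  // cells held by the larger blocks
    if (cell < big) return static_cast<int>(cell / (base + 1));
    return static_cast<int>(extra + (cell - big) / base);
}
```

With 12 cells on 5 ranks, rank 0 would own cells 0 to 2 and rank 4 would own cells 10 and 11. If neighbouring cells exchange dispersers more often than distant ones, a contiguous split like this may keep more movement inside a rank, but that depends on the dispersal kernel.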

【Answer 1】:

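The fact that you have nothing to send is an important piece of information in your scenario. You cannot deduce that fact from the mere absence of a message. The absence of a message only means that nothing has been sent yet.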

Simply sending a zero-sized vector and skipping the probing is the easiest way out.

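Otherwise you would probably have to change your approach radically or implement a very complex speculative execution / rollback mechanism.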

Also note that the linked tutorial uses probe in a very different way.
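The zero-sized-vector suggestion could look roughly like the sketch below. It is not the poster's code. The Cohort fields and the function name exchange_dispersers are hypothetical, and the function is written as a template over the communicator type only so that the sketch compiles without Boost. It is meant to be called with a boost::mpi::communicator and relies on nothing beyond rank(), size(), isend(), recv() and the wait() member of the returned request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's Cohort class.
struct Cohort {
    int id = 0;
    double abundance = 0.0;

    // Boost.Serialization hook; only instantiated when a Cohort is
    // actually sent through Boost.MPI.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id;
        ar & abundance;
    }
};

using Batch = std::vector<Cohort>;

// Exchange dispersers for one time step.
// outbox[r] holds the cohorts leaving for rank r and may be empty.
// Returns inbox, where inbox[r] holds the cohorts that arrived from rank r
// (the entry for the calling rank itself is left empty).
template <class Communicator>
std::vector<Batch> exchange_dispersers(Communicator& world,
                                       const std::vector<Batch>& outbox,
                                       int tag) {
    const int me = world.rank();
    const int n = world.size();
    assert(outbox.size() == static_cast<std::size_t>(n));
    std::vector<Batch> inbox(outbox.size());

    // Post one non-blocking send per peer, even when the batch is empty,
    // so that "nothing to send" is itself communicated.
    using Request = decltype(world.isend(0, tag, outbox[0]));
    std::vector<Request> pending;
    for (int dest = 0; dest < n; ++dest) {
        if (dest != me) pending.push_back(world.isend(dest, tag, outbox[dest]));
    }

    // Exactly one message per peer is on its way, so a blocking recv
    // cannot wait for a message that is never sent; no iprobe is needed.
    for (int src = 0; src < n; ++src) {
        if (src != me) world.recv(src, tag, inbox[src]);
    }

    // outbox must stay untouched until every send has completed.
    for (auto& request : pending) request.wait();
    return inbox;
}
```

In a real build this would be called once per time step as exchange_dispersers(world, outbox, tag) after including <boost/mpi.hpp> and <boost/serialization/vector.hpp>. Because every rank blocks until it has received one batch from each peer, no rank should be able to finish a time step before all its peers have sent theirs for that step, which is the synchronisation the question asks for. The non-blocking sends are a design choice here so that no rank sits in a blocking send while its peers are doing the same.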

【Discussion】:

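Also note that all the ranks call MPI_Send() without any Recv having been posted, which is incorrect with respect to the MPI standard. This might work if the messages are short enough, but it might deadlock if the messages are long enough.

@Zulan Thank you very much for your reply. I tried sending empty vectors and the blocking mechanism now works fine. I am only a bit worried about performance, because the final model I need to parallelise is computationally very expensive (with OpenMP it takes about 10 days to run on 24 cores). If I were to adopt the speculative execution / rollback mechanism you suggested, where could I start? Could you please recommend some resources to learn from?

Start with a thorough performance analysis of your actual execution, to really understand how your application works and where the bottlenecks are. Use a performance analysis tool designed for parallel MPI applications.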
