Open MPI Waitall() Segmentation Fault

Posted 2023-02-14


Originally posted: 2018-04-26 04:16:54

Question:

I'm new to MPI and I'm trying to develop a non-blocking program (using Isend and Irecv). The functionality is very basic (it is purely educational):

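One process (rank 0) is the master and receives messages from the slave processes (ranks 1 to p-1). The master only receives results. Each slave generates an array of N random numbers between 0 and R and then performs some operations on them (again, this is only for educational purposes; the operations are meaningless). The whole procedure (operations + sending the data) is executed M times, just to compare different implementations: blocking and non-blocking.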

I get a segmentation fault in the master process when I call MPI_Waitall():

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000      //Number of times
#define N 2000      //Quantity of random numbers
#define R 1000      //Max value of random numbers

double SumaDeRaices (double*);

int main(int argc, char* argv[]) {
    int         yo;            /* rank of process      */
    int         p;             /* number of processes  */
    int         dest;          /* rank of receiver     */

    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank  */
    MPI_Comm_rank(MPI_COMM_WORLD, &yo);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    MPI_Request  reqs[p-1];
    MPI_Status   stats[p-1];

    if (yo == 0) {
        int i,j;
        double result;
        clock_t inicio,fin;

        inicio = clock();

        for(i = 0; i<M; i++) { //M times
            for(j = 1; j<p; j++) { //for every slave
                MPI_Irecv(&result, sizeof(double), MPI_DOUBLE, j, i, MPI_COMM_WORLD, &reqs[j-1]);
            }
            MPI_Waitall(p-1,reqs,stats); //wait all slaves (SEG_FAULT)
        }
        fin = clock()-inicio;

        printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
    }
    else {
        double* numAleatorios = (double*) malloc( sizeof(double) * ((double) N) ); //array with numbers
        int i,j;
        double resultado;
        dest=0;

        for(i=0; i<M; i++) { //again, M times
            for(j=0; j<N; j++) {
                numAleatorios[j] = rand() % R ;
            }
            resultado = SumaDeRaices(numAleatorios);
            MPI_Isend(&resultado,sizeof(double), MPI_DOUBLE, dest, i, MPI_COMM_WORLD,&reqs[p-1]); //send result to master
        }
    }

    /* Shut down MPI */
    MPI_Finalize();

    exit(0);
} /* main */



double SumaDeRaices (double* valores) {
    int i;
    double sumaTotal = 0.0;

    //Square roots of the values, summed up
    for(i=0; i<N; i++) {
        sumaTotal = sqrt(valores[i]) + sumaTotal;
    }

    return sumaTotal;
}
Comments on the question:

You receive M*p times, but you only send M times. Which number is correct?
@mcsim I send M times from every process (so I send M*p times in total).
Ah. You're right.

Answer 1:

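There are several problems with your code. First, in your Isend you pass &resultado again and again without waiting for the previous non-blocking operation to complete. You are not allowed to reuse a buffer passed to Isend until you have made sure that the operation has completed.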
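To make the buffer-reuse rule concrete, here is a toy, single-process model written only for illustration: it does not call MPI, and `pending_send`, `complete_send`, `reuse_without_wait` and `wait_before_reuse` are hypothetical names. It assumes the worst case the MPI standard allows, namely that the library reads the send buffer only when the operation completes. Many implementations send small messages eagerly, so in practice the bug can stay hidden.

```c
#define MAX_PENDING 16  /* hypothetical cap for this toy model */

/* Toy model (NOT MPI): a pending "send" only remembers the address of its
   buffer; the value is read when the operation completes. */
typedef struct {
    const double *buf;
} pending_send;

/* Completion is the moment the toy "library" actually reads the buffer. */
static double complete_send(pending_send s) {
    return *s.buf;
}

/* Pattern from the question: post n sends from the same variable without
   waiting in between, then complete them. out[i] is what send i carried. */
void reuse_without_wait(int n, double *out) {
    pending_send pending[MAX_PENDING];
    double resultado = 0.0;
    if (n > MAX_PENDING) n = MAX_PENDING;
    for (int i = 0; i < n; i++) {
        resultado = (double)i;         /* overwrite the shared buffer       */
        pending[i].buf = &resultado;   /* every pending send points at it   */
    }
    for (int i = 0; i < n; i++)
        out[i] = complete_send(pending[i]);
}

/* Corrected pattern: complete each send (as MPI_Wait would) before the
   buffer is written again. */
void wait_before_reuse(int n, double *out) {
    double resultado;
    for (int i = 0; i < n; i++) {
        resultado = (double)i;
        pending_send s = { &resultado };
        out[i] = complete_send(s);
    }
}
```

With `reuse_without_wait(4, out)` every entry of `out` is 3.0, the last value written; with `wait_before_reuse(4, out)` entry i is i. In real MPI code the second pattern corresponds to calling `MPI_Wait` on the request (or giving each outstanding send its own buffer) before writing `resultado` again.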

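I suggest you use a plain MPI_Send: unlike a synchronous send (MPI_Ssend), a standard blocking send returns as soon as you are allowed to reuse the buffer.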

Second, there is no need to use message tags. I suggest you set the tag to 0. Performance-wise, it is simply faster.

Third, result should not be a single variable but an array of at least (p-1) elements, one per outstanding receive.

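Fourth, I don't recommend allocating arrays such as the MPI_Request and MPI_Status arrays on the stack unless their size is small and known in advance. Here the size is (p-1), so you are better off allocating these data structures with malloc.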

Fifth, if you don't inspect the statuses, pass MPI_STATUSES_IGNORE.

You should also pass the number of items (1) as the count, not sizeof(double).
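The count argument of the MPI send and receive calls is a number of elements of the given datatype, not a number of bytes. The helpers below are hypothetical illustrations, not MPI calls: they compute how many bytes a receive of `count` elements may write and by how much that overruns the destination buffer, assuming an MPI_DOUBLE element occupies sizeof(double) bytes as on common platforms.

```c
#include <stddef.h>

/* Bytes a receive of `count` elements may write into the user buffer.
   elem_size stands in for the size of the MPI datatype
   (assumed to be sizeof(double) for MPI_DOUBLE). */
size_t recv_bytes(int count, size_t elem_size) {
    return (size_t)count * elem_size;
}

/* Bytes written past the end of a buffer of buf_size bytes (0 if it fits). */
size_t overflow_bytes(int count, size_t elem_size, size_t buf_size) {
    size_t needed = recv_bytes(count, elem_size);
    return needed > buf_size ? needed - buf_size : 0;
}
```

With a count of 1, `overflow_bytes(1, sizeof(double), sizeof(double))` is 0. With the question's count of sizeof(double), and the sender also sending that many doubles, the master's MPI_Irecv may write 8 doubles (64 bytes where a double is 8 bytes) into the single `double result`, overrunning it by 56 bytes and possibly corrupting neighbouring stack data such as `reqs`, which would be one plausible way to end up crashing in MPI_Waitall.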

Of course, the best version is simply to use MPI_Gather.

Also, there is usually no reason not to run the computation on the root node as well.

Here is a slightly rewritten example:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000      //Number of times
#define N 2000      //Quantity of random numbers
#define R 1000      //Max value of random numbers

double SumaDeRaices (double* valores)
{
  int i;
  double sumaTotal = 0.0;

  //Square roots of the values, summed up
  for(i=0; i<N; i++) {
    sumaTotal = sqrt(valores[i]) + sumaTotal;
  }

  return sumaTotal;
}

int main(int argc, char* argv[]) {
  int         yo;            /* rank of process      */
  int         p;             /* number of processes  */

  /* Start up MPI */
  MPI_Init(&argc, &argv);

  /* Find out process rank  */
  MPI_Comm_rank(MPI_COMM_WORLD, &yo);

  /* Find out number of processes */
  MPI_Comm_size(MPI_COMM_WORLD, &p);

  double *result;
  clock_t inicio, fin;
  double *numAleatorios;
  if (yo == 0) {
    inicio = clock();
  }

  numAleatorios = (double*) malloc(sizeof(double) * N); //array with numbers
  result = (double *) malloc(sizeof(double) * p);       //one slot per rank (only filled on the root)

  for(int i = 0; i<M; i++) { //M times
    for(int j=0; j<N; j++) {
      numAleatorios[j] = rand() % R ;
    }
    double local_result = SumaDeRaices(numAleatorios);
    MPI_Gather(&local_result, 1, MPI_DOUBLE, result, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); //collect one result per rank on the root
  }

  if (yo == 0) {
    fin = clock()-inicio;

    printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
  }

  free(numAleatorios);
  free(result);

  /* Shut down MPI */
  MPI_Finalize();
  return 0;
} /* main */

Comments:

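Can you find a source for your disdain for tags? The message envelope has to contain one anyway. Is checking it slow?
@VladimirF I tried to find a reference. The one I remember is "Mitigating MPI Message Matching Misery", which is unfortunately behind a paywall. Let me know if you want more details about the paper. In it they try to solve the more general problem of out-of-order message matching, and AFAIR MPI tags only hinder that process, especially with MPI_ANY_TAG and MPI_ANY_SOURCE.
Being behind a paywall is no reason not to cite a scientific text. I found it with a web search and I do have access to it. I'll take a look.
I could not find such a conclusion in the paper. If it is there, it probably only applies to their novel hashing algorithm.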
