OpenMPI Source Code Analysis 3

Posted Hello woooo


Picking up the open question from the previous post: we said that the try_kill_peers function gets called. It is defined in ompi_mpi_abort.c:

// As the original comment below also says, this mainly kills the processes in the same communicator (excluding the caller itself)
/*
 * Local helper function to build an array of all the procs in a
 * communicator, excluding this process.
 *
 * Killing just the indicated peers must be implemented for
 * MPI_Abort() to work according to the standard language for
 * a 'high-quality' implementation.
 *
 * It would be nifty if we could differentiate between the
 * abort scenarios (but we don't, currently):
 *      - MPI_Abort()
 *      - MPI_ERRORS_ARE_FATAL
 *      - Victim of MPI_Abort()
 */
// The caller passes in the communicator concerned
static void try_kill_peers(ompi_communicator_t *comm,
                           int errcode)
{
    // Part 1: get the number of processes and allocate the ompi_process_name_t array
    int nprocs;
    ompi_process_name_t *procs;

    nprocs = ompi_comm_size(comm);
    /* ompi_comm_remote_size() returns 0 if not an intercomm, so
       this is safe */
    nprocs += ompi_comm_remote_size(comm);

    procs = (ompi_process_name_t*) calloc(nprocs, sizeof(ompi_process_name_t));
    if (NULL == procs) {
        /* quick clean orte and get out */
        ompi_rte_abort(errno, "Abort: unable to alloc memory to kill procs");
    }

    // Part 2: put the processes into the array
    /* put all the local group procs in the abort list */
    int rank, i, count;
    rank = ompi_comm_rank(comm);    // this is where we get our own rank within this communicator (open question 1)
    for (count = i = 0; i < ompi_comm_size(comm); ++i) {
        if (rank == i) {
            /* Don't include this process in the array */
            --nprocs;
        } else {
            assert(count <= nprocs);
            procs[count++] =
                *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
        }
    }

    // Part 3: the remote group's processes go into the array as well
    /* if requested, kill off remote group procs too */
    for (i = 0; i < ompi_comm_remote_size(comm); ++i) {
        assert(count <= nprocs);
        procs[count++] =
            *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
    }

    // Part 4: kill the processes
    if (nprocs > 0) {
        ompi_rte_abort_peers(procs, nprocs, errcode);
    }

    /* We could fall through here if ompi_rte_abort_peers() fails, or
       if (nprocs == 0).  Either way, tidy up and let the caller
       handle it. */
    free(procs);
}
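
The four parts above boil down to one pattern: size an array for everyone, copy every local member except yourself, then append the remote members. Below is a minimal standalone sketch of that pattern, not Open MPI code. The type peer_name_t and the function build_peer_list are hypothetical names made up for illustration; peer_name_t merely stands in for ompi_process_name_t, and the two input arrays stand in for the communicator's local and remote groups.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ompi_process_name_t: a (job, rank-in-job) pair. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} peer_name_t;

/*
 * Hypothetical helper mirroring the shape of try_kill_peers():
 * collect every entry of `local` except index `self_rank`, followed by
 * every entry of `remote` (pass remote_size == 0 when there is no
 * remote group). Returns a calloc'ed array that the caller must free,
 * or NULL if the allocation fails. *out_count receives the number of
 * entries written.
 */
peer_name_t *build_peer_list(const peer_name_t *local, int local_size,
                             int self_rank,
                             const peer_name_t *remote, int remote_size,
                             int *out_count)
{
    /* Upper bound on the result size: it still counts ourselves. */
    int capacity = local_size + remote_size;
    int count = 0;

    *out_count = 0;

    /* calloc(0, ...) is allowed to return NULL, so request at least one slot. */
    peer_name_t *procs = calloc(capacity > 0 ? (size_t)capacity : 1,
                                sizeof *procs);
    if (NULL == procs) {
        return NULL;
    }

    /* Local group: everyone except this process. */
    for (int i = 0; i < local_size; ++i) {
        if (i == self_rank) {
            continue;
        }
        procs[count++] = local[i];
    }

    /* Remote group: everyone. */
    for (int i = 0; i < remote_size; ++i) {
        procs[count++] = remote[i];
    }

    *out_count = count;
    return procs;
}
```

With a local group of three entries, self_rank set to 1 and a remote group of two entries, the result holds four names: local entries 0 and 2, then both remote entries. Unlike the original, this sketch reports an allocation failure to the caller instead of aborting, which keeps it easy to test in isolation.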

At this point, we need to go and look at the definition of ompi_rte_abort_peers(procs, nprocs, errcode).
