Error when running an OpenMPI-based library

【Title】: Error when running OpenMPI based library 【Posted】: 2015-01-10 04:08:09 【Question】:

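I installed the Open MPI library on Ubuntu with the standard apt-get install. I am running Python code that calls the MPI library, and I get the error below. Any idea what the source of the error is? Is Open MPI misconfigured? How can I fix this?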

[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[thebigbang:17162] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)

--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):



     opal_shmem_base_select failed
      --> Returned value -1 instead of OPAL_SUCCESS

--------------------------------------------------------------------------

[thebigbang:17162] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79

--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):



    ompi_mpi_init: orte_init failed
      --> Returned "Error" (-1) instead of "Success" (0)

--------------------------------------------------------------------------

*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[thebigbang:17162] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!

【Comments】:

Did you find a solution? 【Answer 1】:

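The way linking and module loading work here seems to be a bit complicated. The solution is to build Open MPI with --disable-dlopen.

Building with --disable-mca-dso also worked for me. Unfortunately, I don't know how easy this is to do on Ubuntu.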

See http://www.open-mpi.org/faq/?category=building#avoid-dso
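As a rough sketch of what rebuilding with one of those flags might look like when driven from Python: the two configure flags are the ones named above, but the source directory, the install prefix, and the helper names `configure_command` and `build_openmpi` are all assumptions made up for illustration. It assumes you have already unpacked an Open MPI source tree yourself.

```python
import subprocess


def configure_command(prefix, flag="--disable-dlopen"):
    """Return the ./configure invocation as an argument list.

    `prefix` is a hypothetical install location chosen by the caller.
    Only the two flags mentioned in the answer are accepted.
    """
    allowed = ("--disable-dlopen", "--disable-mca-dso")
    if flag not in allowed:
        raise ValueError(f"expected one of {allowed}, got {flag!r}")
    return ["./configure", f"--prefix={prefix}", flag]


def build_openmpi(source_dir, prefix, flag="--disable-dlopen"):
    """Configure, build, and install from an unpacked source tree.

    `source_dir` is assumed to contain Open MPI's configure script.
    Each step raises CalledProcessError if it exits with a non-zero status.
    """
    steps = [configure_command(prefix, flag), ["make"], ["make", "install"]]
    for step in steps:
        subprocess.run(step, cwd=source_dir, check=True)


# Example call (paths are placeholders, not taken from the question):
# build_openmpi("openmpi-src", "/opt/openmpi-static")
```

Installing to a separate prefix keeps the rebuilt copy apart from the packaged one under /usr/lib/openmpi, so whatever calls MPI then has to be pointed at the new prefix.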

I found the solution here:

http://r.789695.n4.nabble.com/Problem-installing-Rmpi-with-Open-MPI-td4641762.html
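If rebuilding is not an option, a workaround that is sometimes suggested for this kind of "missing symbol" failure is to load the MPI shared library with global symbol visibility before the Python MPI module is imported, so that the MCA components can resolve symbols from it. This is not part of the answer above and may not apply to your setup. The helper name `preload_shared_library` and the library file names in the example are assumptions; check which libmpi file your installation actually provides.

```python
import ctypes


def preload_shared_library(names):
    """Try each name in order and load the first one that opens.

    The library is opened with RTLD_GLOBAL so that its symbols are visible
    to shared objects loaded later (for example, plugin components).
    Returns the ctypes handle, or None if none of the names could be loaded.
    """
    for name in names:
        try:
            return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            # Not found or not loadable under this name; try the next one.
            continue
    return None


if __name__ == "__main__":
    # Candidate file names are guesses; adjust them to your system.
    handle = preload_shared_library(["libmpi.so", "libmpi.so.1", "libmpi.so.0"])
    print("MPI library preloaded" if handle else "no MPI library found")
    # Import the Python MPI module only after this point.
```

The call has to run before anything else loads the MPI library, otherwise the library is already mapped with local visibility and the preload has no effect.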
