openMP 的多线程崩溃
Posted
技术标签:
【中文标题】openMP 的多线程崩溃【英文标题】:multithreading crash with openMP 【发布时间】:2018-04-03 12:32:52 【问题描述】:我的功能是对 17000 个点的点云(大约可以波动)进行排序,以提取其中的相关点并将它们存储在一个向量中。一切正常,但速度很慢。所以我正在尝试使用 openMp 来并行化任务,但我遇到了崩溃。
这是不工作的多线程版本:
void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>& relevant_points, std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
#pragma omp parallel for shared( pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot)
for(int i = 0; i < pointcloud.size(); i++)
//int numThread = omp_get_thread_num();
//std::cout << "numThread = " << numThread << std::endl;
// Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2)
// Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi >= cell_max_angle_sensor_rot )
relevant_points.push_back(pointcloud[i]);
else
if (pointcloud[i].pol_sensor_rot.phi >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi <= cell_max_angle_sensor_rot )
relevant_points.push_back(pointcloud[i]);
这里是我在输出中得到的响应:
7fcc93737000-7fcc9374b000 r-xp 00012000 103:05 7078283 /lib/x86_64-linux-gnu/ld-2.23.so
7fcc938a3000-7fcc938f7000 rw-p 00000000 00:00 0
7fcc9391e000-7fcc9392c000 rw-p 00000000 00:00 0
7fcc93937000-7fcc93939000 rw-p 00000000 00:00 0
7fcc93947000-7fcc9394a000 rw-p 00000000 00:00 0
7fcc9394a000-7fcc9394b000 r--p 00025000 103:05 7078283 /lib/x86_64-linux-gnu/ld-2.23.so
7fcc9394b000-7fcc9394c000 rw-p 00026000 103:05 7078283 /lib/x86_64-linux-gnu/ld-2.23.so
7fcc9394c000-7fcc9394d000 rw-p 00000000 00:00 0
7fff20b58000-7fff20b7a000 rw-p 00000000 00:00 0 [stack]
7fff20bb8000-7fff20bbb000 r--p 00000000 00:00 0 [vvar]
7fff20bbb000-7fff20bbd000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[interval_map-1] process has died [pid 12700, exit code -6, cmd /home/catkin_ws/SWC_INTERVAL_MAP/devel/lib/interval_map_test/interval_map_test __name:=interval_map __log:=/home/.ros/log/615acdf0-3714-11e8-bc07-9cebe84f847e/interval_map-1.log].
log file: /home/.ros/log/615acdf0-3714-11e8-bc07-9cebe84f847e/interval_map-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done
目前,我最好的猜测是,有一个等待列表用于查看点云向量内的值或相关点向量上的 push_back,它们越来越大,最终导致堆栈爆炸。 有人对这个问题有任何想法吗?
【问题讨论】:
每个线程都有自己的相关点副本吗?如果不是这样,如果多个线程推回到同一个向量中可能会出现问题。 我刚刚编辑了我的问题,抱歉我放了单线程版本。正如您现在所看到的,related_points 向量在线程之间没有共享,所以我猜 openMp 确实像某种等待列表来执行推回,我仍在搜索它。另一方面,点云向量在所有线程之间共享,但由于我只是读取其中的值,所以我猜它没问题。 变量默认共享,需要显式设为relevant_points
private
omp 并没有像我所知的那样进行任何竞争条件检查,并且相关点是在循环之外声明的,因此默认情况下它是共享的。所以你在多个线程中推回同一个向量,这可能是个问题。
【参考方案1】:
所以,问题是 openMp 的私有子句不适用于参考向量。 这不起作用(调用未定义的引用类型错误):
#pragma omp parallel for private (relevant_points)
我的解决方法是使用 boost 中的 lock_guard。
void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>& relevant_points ,std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
boost::mutex mutex_relevant_points;
omp_set_num_threads (5);
#pragma omp parallel for shared (relevant_points, pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot)
for(int i = 0; i < pointcloud.size(); i++)
//int numThread = omp_get_thread_num();
//std::cout << "numThread = " << numThread << std::endl;
// Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2)
// Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi >= cell_max_angle_sensor_rot )
boost::lock_guard<boost::mutex> lock(mutex_relevant_points);
relevant_points.push_back(pointcloud[i]);
else
if (pointcloud[i].pol_sensor_rot.phi >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi <= cell_max_angle_sensor_rot )
boost::lock_guard<boost::mutex> lock(mutex_relevant_points);
relevant_points.push_back(pointcloud[i]);
您还可以看到我设置了线程数,否则处理点的程序实际上占用了所有线程,而发送所有点的程序没有任何线程可以使用。 我希望它会对某人有所帮助。
【讨论】:
【参考方案2】:实际上,您在这里尝试实现的目标与使用向量作为归约变量的归约非常相似。
如果我想并行化它,我将如何解决它:
void IntervalMapEstimator::extract_relevant_points_multithread( std::vector<Point3D>& relevant_points,
std::vector<Point3D>& pointcloud,
doubleIE cell_min_angle_sensor_rot,
doubleIE cell_max_angle_sensor_rot )
#pragma omp parallel num_threads( 5 )
std::vector<Point3D> local_relevant_points;
#pragma omp for
for ( int i = 0; i < pointcloud.size(); i++ )
// Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0
&& abs( cell_min_angle_sensor_rot ) > M_PI/2 && abs( cell_max_angle_sensor_rot ) > M_PI/2 )
// Point must be smaller than the minimum angle and bigger than the max angle
// (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot
|| pointcloud[i].pol_sensor_rot.phi >= cell_max_angle_sensor_rot )
local_relevant_points.push_back( pointcloud[i] );
else
if ( pointcloud[i].pol_sensor_rot.phi >= cell_min_angle_sensor_rot
&& pointcloud[i].pol_sensor_rot.phi <= cell_max_angle_sensor_rot )
local_relevant_points.push_back( pointcloud[i] );
auto Size = relevant_points.size();
#pragma omp critical
Size += local_relevant_points.size();
#pragma omp barrier
#pragma single
relevant_points.reserve( Size );
#pragma omp critical
for ( int i = 0; i < local_relevant_points.size(); i++ )
relevant_points.push_back( local_relevant_points[i] );
想法是并行进行局部处理,然后将局部结果累加到全局输出中。 我通过计算结果的预期大小添加了一个额外的改进,以便提前保留正确的大小以节省额外冗长的内存分配。
现在,有两个警告:
-
我还没有编译它,所以那里可能有一些错字;
我不确定并行化这会带来多大好处,因为整个过程可能太短且内存受限......但无论如何你都去了。
【讨论】:
感谢您的回答。【参考方案3】:这是我的函数现在的样子:
void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>& relevant_points ,std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
relevant_points.reserve (pointcloud.size ());
#pragma omp parallel for shared (relevant_points, pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot) num_threads(5)
for(int i = 0; i < pointcloud.size(); i++)
//int numThread = omp_get_thread_num();
//std::cout << "numThread = " << numThread << std::endl;
// Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2)
// Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi >= cell_max_angle_sensor_rot )
#pragma omp critical(push_in_relevant_points)
relevant_points.push_back(pointcloud[i]);
else
if (pointcloud[i].pol_sensor_rot.phi >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi <= cell_max_angle_sensor_rot )
#pragma omp critical(push_in_relevant_points)
relevant_points.push_back(pointcloud[i]);
我使用 mop critical 而不是 guard_lock 以获得更高的一致性,并且我在开始时做了一个保留。 因为在我的例子中,related_points 向量永远不会比点云大,而且大多数时候他只包含 50 到 100 个元素。而且我不再使用向量的副本,而是参考。
【讨论】:
以上是关于openMP 的多线程崩溃的主要内容,如果未能解决你的问题,请参考以下文章