Apache Spark: Hangs on Broadcast
Posted 2017-04-01 03:16:56

Question: I'm having a hard time debugging my Spark 1.6.2 application on YARN. It runs in client mode. Essentially it locks up without crashing, and while it is hung the console logs look like this:
17/03/31 20:12:02 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh007.prod.phx3.gdg:47579 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on p3plcdsh011.prod.phx3.gdg:63228 (size: 5.4 KB, free: 511.1 MB)
17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh015.prod.phx3.gdg:9377 (size: 5.4 KB, free: 511.1 MB)
17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh015.prod.phx3.gdg:61897 (size: 5.4 KB, free: 511.1 MB)
17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh002.prod.phx3.gdg:23170 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on p3plcdsh016.prod.phx3.gdg:16649 (size: 5.4 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:55147 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh008.prod.phx3.gdg:7619 (size: 5.4 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:40830 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh011.prod.phx3.gdg:20056 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:47385 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:2063 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh011.prod.phx3.gdg:63228 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:64036 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh016.prod.phx3.gdg:16649 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh013.prod.phx3.gdg:31979 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh013.prod.phx3.gdg:18407 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh004.prod.phx3.gdg:45536 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:50826 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:36247 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:22848 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:9377 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:61897 (size: 26.7 KB, free: 511.1 MB)
17/03/31 20:12:07 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:7619 (size: 26.7 KB, free: 511.1 MB)
In the Spark UI, the hang occurs at a map or filter function.
Has anyone seen this before, or does anyone know how to debug it?
It looks like it may be a memory or disk-space issue, but there is no clear indication. I can try increasing memory to see if that helps, but does anyone have debugging tips?
Thanks
Comments:
What are you broadcasting?
It looks like a fairly large Java object (something backed by a 300 MB uncompressed file)... but it does serialize, otherwise I'd see a crash about serialization. @Vidya, is there a limit on the size of an object that can be serialized, or a way to raise the maximum object size?
Seeing the same issue... my broadcast object is small.

Answer 1:
Merely being serializable is not enough. The problem could be any of several things: your serialization mechanism (Java serialization is poor; Kryo is better; etc.), the memory on your machines, making sure you use the broadcast value rather than the wrapped value, and so on.
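As a minimal sketch of the serializer switch suggested above: in Spark 1.x, Kryo is enabled on the `SparkConf` before the `SparkContext` is created. The configuration keys below are real Spark settings; the application name and the registered class are hypothetical placeholders.

```scala
import org.apache.spark.SparkConf

// Sketch: switch from Java serialization to Kryo (Spark 1.x API).
val conf = new SparkConf()
  .setAppName("broadcast-debug") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Raising the Kryo buffer cap can matter for large broadcast objects.
  .set("spark.kryoserializer.buffer.max", "512m")
// Optionally register the classes you broadcast to shrink the serialized form:
// conf.registerKryoClasses(Array(classOf[MyLookupTable])) // MyLookupTable is hypothetical
```

The context would then be built with `new SparkContext(conf)` as usual.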
There is also the Spark configuration spark.sql.autoBroadcastJoinThreshold:
"Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1, broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command ANALYZE TABLE COMPUTE STATISTICS noscan has been run."
The default is 10 MB serialized.
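A quick sketch of adjusting that threshold at runtime with the Spark 1.6 API; it assumes an existing `SQLContext` named `sqlContext`, and the value is in bytes:

```scala
// Raise the auto-broadcast threshold to 100 MB (value is in bytes).
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100L * 1024 * 1024).toString)
// Or disable automatic broadcast joins entirely while debugging the hang:
// sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
```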
Finally, even if you remove that default limit and have enough memory, you still want the broadcast to be smaller than your largest RDD/DataFrame, which you can check with SizeEstimator:
import org.apache.spark.util.SizeEstimator._
logInfo(estimate(rdd).toString) // estimate returns a Long, so convert before logging
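When spark-core's SizeEstimator is not handy, a rough JDK-only check of the Java-serialized size of a broadcast candidate is possible; this is a sketch, and `payload` is a hypothetical stand-in for whatever object you plan to broadcast:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Measure the Java-serialized size of an object in bytes.
def serializedSizeBytes(obj: AnyRef): Long = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try out.writeObject(obj) finally out.close()
  bytes.size().toLong
}

val payload = Array.fill(1000)(0L) // hypothetical broadcast candidate
println(serializedSizeBytes(payload)) // roughly 8 KB of longs plus stream overhead
```

Note this measures the Java-serialized form, which can differ noticeably from the in-memory footprint SizeEstimator reports.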
Finally, if worse comes to worst, I would consider doing lookups against a lightning-fast cache/data store inside your transformations rather than broadcasting this file.
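The lookup-instead-of-broadcast idea can be sketched as follows. Inside Spark you would call `rdd.mapPartitions(lookupPartition)`; here a partition is modeled as a plain `Iterator`, and `FakeStore` is a hypothetical stand-in for a real external store client (Redis, HBase, memcached, ...):

```scala
// Stand-in for an external key-value store; a real client would open a
// network connection in connect().
object FakeStore {
  private val table = Map("a" -> 1, "b" -> 2)
  def connect(): String => Option[Int] = key => table.get(key)
}

// Per-partition lookup: open one connection per partition, not per record,
// so the connection cost is amortized across the partition's records.
def lookupPartition(records: Iterator[String]): Iterator[(String, Option[Int])] = {
  val client = FakeStore.connect()
  records.map(key => (key, client(key)))
}

println(lookupPartition(Iterator("a", "b", "c")).toList)
// List((a,Some(1)), (b,Some(2)), (c,None))
```

The trade-off is a network round trip per lookup (mitigated by batching or a local cache) in exchange for not holding the whole dataset on every executor.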