Is there a bug when using RDD.cartesian with Spark Streaming?

Posted 2023-04-15

Asked: 2016-12-13 08:39:00

My code:

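from pyspark.streaming.kafka import KafkaUtils

# topics must be a dict mapping topic name to the number of partitions to consume
ks1 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G1', topics={'test': 2})
ks2 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G2', topics={'test': 2})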

d1 = ks1.map(lambda x: x[1]).flatMap(lambda x: list(x)).countByValue()
d2 = ks2.map(lambda x: x[1]).flatMap(lambda x: list(x)).countByValue()

d3 = d1.transformWith(lambda t, x, y: x.cartesian(y), d2)

Then I get this error:

java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD cannot be cast to org.apache.spark.api.java.JavaRDD

P.S. Python 2.7.11 + Spark 2.0.2

Thanks

Answer 1:

Yes, this is a known bug. Here is the JIRA:

https://issues.apache.org/jira/browse/SPARK-17756
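
Until you are on a release that contains the fix, a workaround often suggested for this kind of ClassCastException is to append an identity map after cartesian, so that the RDD handed back to the streaming layer is an ordinary Python-pipelined RDD rather than one backed directly by a JavaPairRDD. That explanation is my assumption about the cause, and I have not verified it on Spark 2.0.2, so treat the following as a minimal sketch. The function name cartesian_as_plain_rdd is made up for illustration.

```python
def cartesian_as_plain_rdd(batch_time, left, right):
    """Pair every element of `left` with every element of `right`.

    The trailing identity map is the assumed workaround: it leaves the data
    unchanged but wraps the cartesian result in a regular mapped RDD before
    it is returned to the streaming layer.
    """
    return left.cartesian(right).map(lambda pair: pair)


# Usage with the DStreams from the question (needs a running StreamingContext):
# d3 = d1.transformWith(cartesian_as_plain_rdd, d2)
```

The extra map costs one more pass over each batch, which should be negligible for small countByValue results like the ones in the question.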
