Spark 2.0: how to convert an RDD of tuples to a DF [duplicate]

Posted 2023-04-15


[Originally posted]: 2017-06-01 03:12:16

[Question]:

I am upgrading one of my projects from Spark 1.6 to Spark 2.0.1. The following code works on Spark 1.6 but not on 2.0.1:

    def count(df: DataFrame): DataFrame = {
      val sqlContext = df.sqlContext
      import sqlContext.implicits._

      df.map { case Row(userId: String, itemId: String, count: Double) =>
        (userId, itemId, count)
      }.toDF("userId", "itemId", "count")
    }

Here is the error message:

Error:(53, 12) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
           ^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
       ^

I tried using df.rdd.map instead of df.map, and then got the following error:

Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
    .toDF("userId", "itemId", "count")
      ^

How do I convert an RDD of tuples to a DataFrame in Spark 2.0?
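For reference, here is a minimal sketch of the two usual Spark 2.x routes from an RDD of tuples to a DataFrame. It assumes Spark 2.x is on the classpath and a SparkSession is available. The local-mode session and the names RddTuplesToDf, viaImplicits and viaCreateDataFrame are hypothetical and exist only for illustration. rdd.toDF is an implicit conversion that only exists once the session's implicits are in scope. spark.createDataFrame instead derives the schema from the tuple type's TypeTag and needs no import.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddTuplesToDf {
  // Option 1: rdd.toDF, which only compiles with the session's implicits in scope.
  def viaImplicits(spark: SparkSession, rdd: RDD[(String, String, Double)]): DataFrame = {
    import spark.implicits._ // provides rddToDatasetHolder and the tuple (Product) encoder
    rdd.toDF("userId", "itemId", "count")
  }

  // Option 2: createDataFrame derives the schema from the tuple's TypeTag; no import needed.
  // The default column names are _1, _2, _3, so they are renamed with toDF.
  def viaCreateDataFrame(spark: SparkSession, rdd: RDD[(String, String, Double)]): DataFrame =
    spark.createDataFrame(rdd).toDF("userId", "itemId", "count")

  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration only (hypothetical app name).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-tuples-to-df")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(Seq(("u1", "i1", 1.0), ("u2", "i2", 2.0)))

    viaImplicits(spark, rdd).show()
    viaCreateDataFrame(spark, rdd).printSchema()

    spark.stop()
  }
}
```

Both variants produce the same three-column DataFrame. Option 2 is handy when you would rather not import the implicits into a given scope.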

[Comments on the question]:

Did you try importing spark.implicits._?

@rogue-one Yes, I tried replacing val sqlContext = df.sqlContext and import sqlContext.implicits._ with val spark = df.sparkSession and import spark.implicits._, but got the same error.

[Answer 1]:

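There is most likely a syntax error somewhere else in your code. Your map function itself appears to be written correctly, because the compiler gets far enough to report a missing implicit argument rather than a parse error:

Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)]. Unspecified value parameter evidence$7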

Your code works fine in my Spark shell; I have tested it.
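The Spark 2.x shell imports spark.implicits._ automatically. To try the same code outside the shell, here is a minimal self-contained sketch of the question's function as a compiled unit. It assumes Spark 2.x on the classpath and a local-mode session. The object name CountExample and the two-row input are hypothetical. As tried in the comments, it takes the implicits from df.sparkSession, which supplies the Encoder[(String, String, Double)] that map requires.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object CountExample {
  // The question's function, written as a standalone unit for Spark 2.x.
  def count(df: DataFrame): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._ // supplies the Encoder[(String, String, Double)] that map needs

    df.map { case Row(userId: String, itemId: String, count: Double) =>
      (userId, itemId, count)
    }.toDF("userId", "itemId", "count")
  }

  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input with the (String, String, Double) shape the pattern match expects.
    val input = Seq(("u1", "i1", 3.0), ("u2", "i2", 5.0)).toDF("userId", "itemId", "count")
    count(input).show()

    spark.stop()
  }
}
```

If this compiles in your build but the project still fails at the same line, that would support the answer's suspicion that the problem lies elsewhere in the surrounding code or build setup.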

