scala.collection.immutable.Iterable[org.apache.spark.sql.Row] to DataFrame? error: overloaded method value createDataFrame with alternatives
Posted: 2017-10-12 10:23:46
Question: I have some sql.Row objects that I want to convert to a DataFrame in Spark 1.6.x.
My rows look like this:
events: scala.collection.immutable.Iterable[org.apache.spark.sql.Row] = List([14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)], [14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)])
Printed out:
events.foreach(println)
[14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)]
[14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)]
So I created a schema for the data:
val schema = StructType(Array(
StructField("trackId", IntegerType, true),
StructField("location", StringType, true),
StructField("videoId", IntegerType, true),
StructField("id", StringType, true),
StructField("sequence", IntegerType, true),
StructField("time", StringType, true),
StructField("type", ArrayType(StringType), true)
))
Then I tried to create the DataFrame with:
val df = sqlContext.createDataFrame(events, schema)
But I get the following error:
error: overloaded method value createDataFrame with alternatives:
(data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (scala.collection.immutable.Iterable[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)
I don't know why this happens. Is it because the underlying data in the Row has no type information?
Any help is much appreciated.
Answer 1: You have to parallelize:
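import org.apache.spark.SparkContext
val sc: SparkContext = ???  // your existing SparkContext
// parallelize takes a Seq, so convert the Iterable first
val df = sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)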
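The compiler rejects the original call because none of the overloads listed in the error takes a Scala Iterable: each one expects an RDD, a JavaRDD, or a java.util.List. The sketch below shows both routes that accept Row data with a schema. It assumes Spark 1.6.x on the classpath and a local master; the object name, the app name, and the shortened two-column schema are made up for illustration.

```scala
import scala.collection.JavaConverters._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

// Hypothetical stand-alone example; names are illustrative only.
object IterableToDataFrame {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rows-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Shortened schema: only the first two columns from the question.
    val schema = StructType(Array(
      StructField("trackId", IntegerType, true),
      StructField("location", StringType, true)
    ))

    val events: scala.collection.immutable.Iterable[Row] =
      List(Row(14183197, "Browse"), Row(14183197, "Browse"))

    // Route 1: the RDD[Row] overload. parallelize needs a Seq.
    val viaRdd = sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)

    // Route 2: the java.util.List[Row] overload, no RDD built by hand.
    val viaList = sqlContext.createDataFrame(events.toList.asJava, schema)

    viaRdd.show()
    viaList.show()
    sc.stop()
  }
}
```

Either route should produce the same rows. For a small local collection the java.util.List route avoids building an RDD explicitly, while parallelize is the more common idiom.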
Comments:
Thanks, but now I get a conversion error, java.lang.ClassCastException: scala.math.BigInt cannot be cast to java.lang.Integer, and I don't know why, since I haven't declared that type anywhere.
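A likely reading of this exception is that the Row objects hold scala.math.BigInt values, while an IntegerType column expects java.lang.Integer at runtime. A Row carries no type information of its own, so the mismatch only appears when Spark evaluates the data. Where the BigInt values come from is not stated in the thread; a JSON parser is one common source, but that is an assumption. The helper below is hypothetical and not part of Spark. It converts BigInt values to Long so they can be paired with LongType columns.

```scala
// Hypothetical helper (not a Spark API): replace values Spark SQL cannot
// store directly with types it can.
object RowValues {
  def normalize(value: Any): Any = value match {
    case b: BigInt =>
      // A BigInt fits in a Long exactly when it needs at most 63 bits plus sign.
      require(b.bitLength <= 63, s"value $b does not fit in a Long")
      b.toLong // pair with LongType (not IntegerType) in the schema
    case other => other // leave strings, lists, etc. untouched
  }
}
```

With Spark on the classpath, each row could then be rebuilt with Row.fromSeq(row.toSeq.map(RowValues.normalize)) before calling createDataFrame, with the affected schema fields changed from IntegerType to LongType. Values in the sample such as 8702170626376335 and 527780275219 exceed the Int range, so LongType (or a string conversion, to match the StringType declared for id and time) is needed for those columns in any case.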