spark dataFrame withColumn
Posted abcdwxc
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了spark dataFrame withColumn相关的知识,希望对你有一定的参考价值。
说明:withColumn用于在原有DF新增一列
1. 初始化sqlContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
2.导入sqlContext隐式转换
import sqlContext.implicits._
3. 创建DataFrames
val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")
4. 显示内容
df.show()
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
5. 为原有df新加一列
df.withColumn("id2", monotonically_increasing_id()+1)
6. 显示添加列后的内容
res6.show()
+----+-------+---+
| age| name|id2|
+----+-------+---+
|null|Michael| 1|
| 30| Andy| 2|
| 19| Justin| 3|
+----+-------+---+
完成的过程如下:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = [email protected]
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")
2018-06-25 18:55:30 WARN ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2018-06-25 18:55:30 WARN ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
2018-06-25 18:55:32 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> df.show()
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
scala> df.withColumn("id2", monotonically_increasing_id()+1)
res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string ... 1 more field]
scala> res6.show()
+----+-------+---+
| age| name|id2|
+----+-------+---+
|null|Michael| 1|
| 30| Andy| 2|
| 19| Justin| 3|
+----+-------+---+
以上是关于spark dataFrame withColumn的主要内容,如果未能解决你的问题,请参考以下文章
[Spark][Python][DataFrame][SQL]Spark对DataFrame直接执行SQL处理的例子