Common Spark transformations: keys, values, and mapValues
Posted zzhangyuhang
1.keys
Purpose:
Returns a new RDD containing the key of every key-value pair.
Example
val list = List("hadoop", "spark", "hive", "spark")
val rdd = sc.parallelize(list)
// Turn each word into a (word, 1) pair
val pairRdd = rdd.map(x => (x, 1))
// Print only the keys
pairRdd.keys.collect.foreach(println)
Result
hadoop
spark
hive
spark
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[142] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[143] at map at command-3434610298353610:3
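Note that the output above lists spark twice: keys returns one entry per pair and does not deduplicate. The following is a minimal sketch of that behaviour, assuming a local SparkSession (the master URL and the app name are illustrative; in spark-shell or a notebook, sc is already defined). It also shows that projecting the first tuple element by hand gives the same elements, and that distinct can be chained when unique keys are wanted.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("keys-sketch").getOrCreate()
val sc = spark.sparkContext

val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

// keys keeps one entry per pair, so "spark" appears twice
val allKeys = pairRdd.keys.collect().toList

// Projecting the first element of each tuple yields the same elements
val viaMap = pairRdd.map(_._1).collect().toList

// distinct removes duplicates; its output order is not guaranteed, so sort before comparing
val uniqueKeys = pairRdd.keys.distinct().collect().sorted.toList
```

Here uniqueKeys is sorted only to make the comparison deterministic; distinct itself makes no ordering promise.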
2.values
Purpose:
Returns a new RDD containing the value of every key-value pair.
Example
val list = List("hadoop", "spark", "hive", "spark")
val rdd = sc.parallelize(list)
// Turn each word into a (word, 1) pair
val pairRdd = rdd.map(x => (x, 1))
// Print only the values
pairRdd.values.collect.foreach(println)
Result
1
1
1
1
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[145] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[146] at map at command-3434610298353610:3
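Because values drops the keys and returns an ordinary RDD, the usual RDD operations can be applied to it directly. A small sketch, again assuming a local SparkSession created only for illustration: since every value here is 1, reducing the values with addition gives the number of pairs.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("values-sketch").getOrCreate()
val sc = spark.sparkContext

val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

// values drops the keys and keeps one entry per pair
val allValues = pairRdd.values.collect().toList

// Projecting the second element of each tuple yields the same elements
val viaMap = pairRdd.map(_._2).collect().toList

// Every value is 1, so adding them up counts the pairs
val total = pairRdd.values.reduce(_ + _)
```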
3.mapValues(func)
Purpose:
Applies a function to the value of every key-value pair and returns the resulting pairs; the keys are left unchanged.
Example
val list = List("hadoop", "spark", "hive", "spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x, 1))
// Add 1 to every value; the keys stay the same
pairRdd.mapValues(_ + 1).collect.foreach(println)
Result
(hadoop,2)
(spark,2)
(hive,2)
(spark,2)
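The same pairs could be produced with a plain map over the tuples, but there is one practical difference: since mapValues cannot change the keys, Spark keeps the RDD's partitioner, while map discards it. The sketch below illustrates this under some assumptions: a local SparkSession, and a HashPartitioner with 2 partitions chosen arbitrarily for the example.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

// Assumption: a local session created for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("mapValues-sketch").getOrCreate()
val sc = spark.sparkContext

// Partition the pairs by key; the partition count (2) is arbitrary
val partitioned = sc.parallelize(List("hadoop", "spark", "hive", "spark"))
  .map(x => (x, 1))
  .partitionBy(new HashPartitioner(2))

// Both transformations add 1 to every value
val viaMapValues = partitioned.mapValues(_ + 1)
val viaMap = partitioned.map { case (k, v) => (k, v + 1) }

// Sort before comparing, because partitionBy may reorder the pairs
val fromMapValues = viaMapValues.collect().sorted.toList
val fromMap = viaMap.collect().sorted.toList

// mapValues keeps the partitioner; map drops it, since it could have changed the keys
val keepsPartitioner = viaMapValues.partitioner == partitioned.partitioner
val mapDropsPartitioner = viaMap.partitioner.isEmpty
```

Keeping the partitioner matters when a key-based operation such as reduceByKey or join follows, because Spark can then avoid another shuffle of data that is already partitioned by key.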