Spark2 DataFrame数据框常用操作之分析函数--排名函数row_number,rank,dense_rank,percent_rank
Posted 智能先行者
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Spark2 DataFrame数据框常用操作之分析函数--排名函数row_number,rank,dense_rank,percent_rank相关的知识,希望对你有一定的参考价值。
select gender,
age,
row_number() over(partition by gender order by age) as rowNumber,
rank() over(partition by gender order by age) as ranks,
dense_rank() over(partition by gender order by age) as denseRank,
percent_rank() over(partition by gender order by age) as percentRank
from Affairs
val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate() // For implicit conversions like converting RDDs to DataFrames import spark.implicits._ val dataList: List[(Double, String, Double, Double, String, Double, Double, Double, Double)] = List( (0, "male", 37, 10, "no", 3, 18, 7, 4), (0, "female", 27, 4, "no", 4, 14, 6, 4), (0, "female", 32, 15, "yes", 1, 12, 1, 4), (0, "male", 57, 15, "yes", 5, 18, 6, 5), (0, "male", 22, 0.75, "no", 2, 17, 6, 3), (0, "female", 32, 1.5, "no", 2, 17, 5, 5), (0, "female", 22, 0.75, "no", 2, 12, 1, 3), (0, "male", 57, 15, "yes", 2, 14, 4, 4), (0, "female", 32, 15, "yes", 4, 16, 1, 2), (0, "male", 22, 1.5, "no", 4, 14, 4, 5), (0, "male", 37, 15, "yes", 2, 20, 7, 2), (0, "male", 27, 4, "yes", 4, 18, 6, 4), (0, "male", 47, 15, "yes", 5, 17, 6, 4), (0, "female", 22, 1.5, "no", 2, 17, 5, 4), (0, "female", 27, 4, "no", 4, 14, 5, 4), (0, "female", 37, 15, "yes", 1, 17, 5, 5), (0, "female", 37, 15, "yes", 2, 18, 4, 3), (0, "female", 22, 0.75, "no", 3, 16, 5, 4), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 27, 10, "yes", 2, 14, 1, 5), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 27, 10, "yes", 4, 16, 5, 4), (0, "female", 32, 10, "yes", 3, 14, 1, 5), (0, "male", 37, 4, "yes", 2, 20, 6, 4)) val data = dataList.toDF("affairs", "gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating") data.printSchema() // 创建视图 data.createOrReplaceTempView("Affairs") val s1="row_number() over(partition by gender order by age) as rowNumber," val s2="rank() over(partition by gender order by age) as ranks," val s3="dense_rank() over(partition by gender order by age) as denseRank," val s4="percent_rank() over(partition by gender order by age) as percentRank" val df8=spark.sql("select gender,age,"+s1+s2+s3+s4+" from Affairs") df8.show(50) +------+----+---------+-----+---------+------------------+ |gender| age|rowNumber|ranks|denseRank| percentRank| +------+----+---------+-----+---------+------------------+ |female|22.0| 1| 1| 1| 0.0| |female|22.0| 2| 1| 1| 0.0| |female|22.0| 3| 1| 1| 0.0| |female|22.0| 4| 1| 1| 0.0| |female|22.0| 5| 1| 1| 0.0| |female|22.0| 6| 1| 1| 0.0| |female|27.0| 7| 7| 2| 0.4| |female|27.0| 8| 7| 2| 0.4| |female|27.0| 9| 7| 2| 0.4| |female|27.0| 10| 7| 2| 0.4| |female|32.0| 11| 11| 3|0.6666666666666666| |female|32.0| 12| 11| 3|0.6666666666666666| |female|32.0| 13| 11| 3|0.6666666666666666| |female|32.0| 14| 11| 3|0.6666666666666666| |female|37.0| 15| 15| 4|0.9333333333333333| |female|37.0| 16| 15| 4|0.9333333333333333| | male|22.0| 1| 1| 1| 0.0| | male|22.0| 2| 1| 1| 0.0| | male|27.0| 3| 3| 2| 0.25| | male|37.0| 4| 4| 3| 0.375| | male|37.0| 5| 4| 3| 0.375| | male|37.0| 6| 4| 3| 0.375| | male|47.0| 7| 7| 4| 0.75| | male|57.0| 8| 8| 5| 0.875| | male|57.0| 9| 8| 5| 0.875| +------+----+---------+-----+---------+------------------+
以上是关于Spark2 DataFrame数据框常用操作之分析函数--排名函数row_number,rank,dense_rank,percent_rank的主要内容,如果未能解决你的问题,请参考以下文章
Spark2 DataFrame数据框常用操作之分析函数--排名函数row_number,rank,dense_rank,percent_rank
Spark2 DataFrame数据框常用操作之统计指标:mean均值,variance方差,stddev标准差,corr(Pearson相关系数),skewness偏度,kurtosis峰度((代码
Spark2 加载保存文件,数据文件转换成数据框dataframe