[Spark][Python] A Small Spark Join Example


$ hdfs dfs -cat people.json

{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}

$ hdfs dfs -cat pcodes.json

{"pcode":"10036","city":"New York","state":"NY"}
{"pcode:"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}
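
The second record of pcodes.json is missing the closing quote after pcode, so it is not valid JSON. A minimal check, using Python's standard json module rather than Spark (the names pcodes_lines and is_valid_json are illustrative, not part of the original session), shows which lines parse:

```python
import json

# Lines copied verbatim from the pcodes.json listing above.
pcodes_lines = [
    '{"pcode":"10036","city":"New York","state":"NY"}',
    '{"pcode:"87501","city":"Santa Fe","state":"NM"}',
    '{"pcode":"94304","city":"Palo Alto","state":"CA"}',
    '{"pcode":"94104","city":"San Francisco","state":"CA"}',
]

def is_valid_json(line):
    """Return True if the line parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(line)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

validity = [is_valid_json(line) for line in pcodes_lines]
print(validity)
```

In its default permissive mode, Spark's JSON reader keeps such unparsable lines in a _corrupt_record column instead of failing, which is why that column shows up in the joined result.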

from pyspark.sql import HiveContext

# sc: an existing SparkContext (for example, the one the pyspark shell provides)
sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("people.json")
pcodesDF = sqlContext.read.json("pcodes.json")

# Inner join on the shared "pcode" column
mydf001 = peopleDF.join(pcodesDF, "pcode")

mydf001.limit(5).show()

+-----+----+-------+----+---------------+-------------+-----+
|pcode| age|   name|pcoe|_corrupt_record|         city|state|
+-----+----+-------+----+---------------+-------------+-----+
|94304|null|  Alice|null|           null|    Palo Alto|   CA|
|94304|  30|Brayden|null|           null|    Palo Alto|   CA|
|94104|null|Etienne|null|           null|San Francisco|   CA|
+-----+----+-------+----+---------------+-------------+-----+
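
Only three of the five people appear because join with a single column name performs an inner join, and rows whose key is missing never match. Carla's record misspells the key as pcoe (hence the extra pcoe column), and Diana has no pcode at all. The following is a plain-Python sketch of that matching logic, not Spark's implementation; inner_join is a hypothetical helper, and the malformed 87501 line is left out because it does not parse.

```python
# Records copied from people.json above.
people = [
    {"name": "Alice", "pcode": "94304"},
    {"name": "Brayden", "age": 30, "pcode": "94304"},
    {"name": "Carla", "age": 19, "pcoe": "10036"},  # key misspelled in the source data
    {"name": "Diana", "age": 46},                   # no pcode at all
    {"name": "Etienne", "pcode": "94104"},
]

# Only the records of pcodes.json that are valid JSON.
pcodes = [
    {"pcode": "10036", "city": "New York", "state": "NY"},
    {"pcode": "94304", "city": "Palo Alto", "state": "CA"},
    {"pcode": "94104", "city": "San Francisco", "state": "CA"},
]

def inner_join(left, right, key):
    """Pair rows whose `key` values are present and equal (hypothetical helper)."""
    # Index the right side by key, skipping rows that lack the key.
    index = {}
    for row in right:
        if row.get(key) is not None:
            index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # A missing key yields None, which is never in the index, so no match.
        for match in index.get(row.get(key), []):
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined

result = inner_join(people, pcodes, "pcode")
matched_names = [row["name"] for row in result]
print(matched_names)
```

The 10036 (New York) record finds no partner either, since the only person who would have matched it is Carla, whose key is misspelled.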
