Spark SQL Series (1): Environment Setup
Scala package: https://pan.baidu.com/s/17f8AiS2n_g5kiQhxf7XIlA
Hadoop package: https://pan.baidu.com/s/1YNM2_eTV8Zf_2SxamfMrSQ
Spark package: https://pan.baidu.com/s/17mf2_DMiNy7OdlFwygekhg
IDE package: https://pan.baidu.com/s/1caaKufvSuHBX1xEFXvCwPw
1: JDK Environment Setup
What matters here are the two install paths (the JDK path and the JRE path) and three environment variables: JAVA_HOME, PATH, and CLASSPATH.
On my machine: JDK path (D:\JAVA\JDK), JRE path (D:\JAVA\JRE)
JAVA_HOME (D:\JAVA\JDK)
PATH (%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;)
CLASSPATH (.;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar;)
Verify with: java -version
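The same information that `java -version` prints is also available from inside the JVM via system properties, which is handy for double-checking which Java a program actually runs on when PATH and JAVA_HOME disagree. A minimal sketch, nothing Spark-specific assumed:

```scala
object JavaEnvCheck {
  def main(args: Array[String]): Unit = {
    // java.version and java.home reflect the JVM actually used at runtime.
    println(System.getProperty("java.version"))
    println(System.getProperty("java.home"))
  }
}
```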
2: Scala Environment Setup
After downloading, just unzip it. Only one environment variable needs to be configured.
Add the variable and append to PATH: SCALA_HOME (D:\JAVA\scala), PATH (%SCALA_HOME%\bin;)
Verify with: scala -version
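The Scala library version can likewise be read from code through the standard `scala.util.Properties`, a quick sanity check that the installed Scala is the one your program sees:

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // versionString returns something like "version 2.11.12"
    println(scala.util.Properties.versionString)
  }
}
```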
3: Hadoop Environment Setup
After downloading, just unzip it. Two environment variables need to be configured.
HADOOP_HOME (D:\JAVA\hadoop), PATH (%HADOOP_HOME%\bin;)
The contents of the bin directory must be replaced with the Windows versions, since we are running the Windows build. The package in the link above has already had them replaced.
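If you prefer not to set HADOOP_HOME globally, the same location can be supplied from code via the `hadoop.home.dir` system property before any Spark session is created. A hedged sketch; `D:\JAVA\hadoop` is just this article's example path:

```scala
object HadoopHomeSetup {
  def main(args: Array[String]): Unit = {
    // hadoop.home.dir tells Hadoop's Windows shims where to find bin\winutils.exe.
    // It must be set before the first SparkSession/SparkContext is created.
    System.setProperty("hadoop.home.dir", "D:\\JAVA\\hadoop")
    println(System.getProperty("hadoop.home.dir"))
  }
}
```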
4: Runtime Jars
Add everything in the jars directory of the unpacked Spark distribution to the IDE's classpath, and you are set.
5: IDE — Just Unzip and Go
After opening it, create a new Scala project. The steps are exactly the same as in a Java IDE, so they are not described in detail here.
6: Spark SQL — Hello, World
Again: add the contents of the jars directory under the unpacked Spark path to the IDE's classpath.
import java.util.Arrays
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
object WordCount {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder().master("local").appName("AppName").getOrCreate()
    val javasc = new JavaSparkContext(sparkSession.sparkContext)

    // Three JSON strings; each one becomes a row of the DataFrame.
    val nameRDD = javasc.parallelize(Arrays.asList(
      "{'name':'wangwu','age':'18','vip':'t'}",
      "{'name':'sunliu','age':'19','vip':'t'}",
      "{'name':'zhangsan','age':'18','vip':'f'}"))
    val namedf = sparkSession.read.json(nameRDD)

    namedf.select(col("name")).show(100)
  }
}
Hello, World Explained
sparkSession plays the same role as SparkContext: it is the entry point for working with data.
SparkSession.builder() — the familiar Java-style builder (factory) pattern.
master("local") — restricts the run to local mode.
appName("AppName") — give your application its own name, so you can find your job when many are running.
getOrCreate() — the standard closing call: it returns the existing SparkSession if one is already active, otherwise it creates a new one.
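The getOrCreate() semantics can be seen directly: a second builder call hands back the same session instead of creating another one. A small sketch, assuming the Spark jars from section 4 are on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object GetOrCreateDemo {
  def main(args: Array[String]): Unit = {
    val first = SparkSession.builder().master("local").appName("AppName").getOrCreate()
    // The second builder finds the active session and returns it;
    // its own appName setting is ignored.
    val second = SparkSession.builder().appName("SomethingElse").getOrCreate()
    println(first eq second) // same object: the session is reused
    first.stop()
  }
}
```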
Initializing SparkSession for Production
What? Does SparkSession initialization really differ between local and production mode?
Not really. The previous snippet simply passed master as local, which means local mode. For a real deployment you must not hard-code this; remove it, and the master is supplied at submit time instead.
val sparkSession = SparkSession.builder().appName("AppName").getOrCreate()
JavaSparkContext, nameRDD, namedf, select
javaSparkContext — nothing deep going on here; it is only used to turn the JSON strings into an RDD.
namedf — the DataFrame built from the RDD; conceptually much like an RDD in spark-core, except the JSON strings have now become a Spark SQL table.
select — DataFrame operations mirror SQL; select is just a simple query.
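To make the "DataFrame operations mirror SQL" point concrete, the same query can be written both ways. A hedged sketch reusing this article's three JSON rows; it assumes local mode and the Spark jars on the classpath (the view name `people` is made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DataFrameVsSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("DataFrameVsSql").getOrCreate()
    import spark.implicits._

    val namedf = spark.read.json(Seq(
      "{'name':'wangwu','age':'18','vip':'t'}",
      "{'name':'sunliu','age':'19','vip':'t'}",
      "{'name':'zhangsan','age':'18','vip':'f'}").toDS())

    // DataFrame API: SELECT name ... WHERE vip = 't'
    namedf.select(col("name")).where(col("vip") === "t").show()

    // The identical query through SQL against a temporary view.
    namedf.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE vip = 't'").show()

    spark.stop()
  }
}
```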