sparkSQL???cache???????????????
??????sparkSQL?????????cache????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
??????
????????????
???????????????spark????????????????????????????????????????????????????????????????????????????????????????????????action,????????????action??????????????????????????????????????????????????????????????????????????????????????????action?????????????????????action????????????job????????????action?????????????????????????????????????????????????????????action??????????????????????????????????????????action????????????????????????????????????
????????????
Data in table test1:
1,2018-07-01 10:10:03
2,2018-07-01 11:12:04
??????????????????
// `fun` and `fun2` are the author's row transformations; their definitions are not shown.
val odsData = spark.sql("""
select *
from default.test1
where time < "2018-07-02"
""")
val targetData = odsData.map(fun _)
targetData.createOrReplaceTempView("data1")
// First Action
spark.sql("""
insert overwrite table default.test2
select *
from data1
""")
val targetData1 = odsData.map(fun2 _) // a second transformation derived from the same odsData
targetData1.createOrReplaceTempView("data2")
// Second Action
spark.sql("""
insert into table default.test2
select *
from data2
""")
Between the first Action and the second Action, one more row is inserted into test1: 3,2018-07-01 13:12:04
??????????????????Action?????????????????????1???2???????????????????????????Action????????????????????????Action?????????
??????????????????????????????3,2018-07-01 13???12???04
?????????test2????????????????????????????????????
Expected result (note that the second action is insert, not insert overwrite):
1,2018-07-01 10:10:03
2,2018-07-01 11:12:04
1,2018-07-01 10:10:03
2,2018-07-01 11:12:04
Actual result:
1,2018-07-01 10:10:03
2,2018-07-01 11:12:04
1,2018-07-01 10:10:03
2,2018-07-01 11:12:04
3,2018-07-01 13:12:04
????????????
???????????????????????????????????????????????????????????????spark??????????????????????????????????????????spark???lazy????????????????????????action????????????????????????????????????????????????application???????????????action???????????????aciton??????????????????????????????rdd?????????????????????odsData?????????????????????action???????????????????????????????????????rdd??????????????????????????????odsData?????????????????????aciton??????????????????????????????????????????????????????????????????????????????odsData???????????????????????????????????????????????????????????????action????????????????????????????????????odsData?????????RDD???????????????????????????????????????????????????spark???????????????????????????????????????????????????job???stage???rdd???task??????????????????????????????job???????????????????????????????????????job???stage?????????????????????????????????????????????????????????????????????
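The recomputation seen in the experiment above can be mimicked without Spark. The following is only a plain-Scala analogy, not Spark code: a mutable buffer stands in for the table default.test1, a lazy collection view stands in for odsData, and copying the view into a second buffer stands in for each insert. The object name LazyRecomputeSketch and the buffers are made up for this illustration. Like a DataFrame, the view is a recipe over its source rather than a snapshot, so every "action" re-reads the source.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyRecomputeSketch {
  // Stand-in for the table default.test1: (id, time) rows.
  val test1: ArrayBuffer[(Int, String)] = ArrayBuffer(
    (1, "2018-07-01 10:10:03"),
    (2, "2018-07-01 11:12:04")
  )

  // Stand-in for odsData: a lazy view over test1, evaluated only when consumed.
  val odsData = test1.view.filter { case (_, time) => time < "2018-07-02" }

  // Stand-in for the table default.test2.
  val test2: ArrayBuffer[(Int, String)] = ArrayBuffer.empty

  def main(args: Array[String]): Unit = {
    // First "action": overwrite test2 with whatever odsData yields right now.
    test2.clear()
    test2 ++= odsData

    // A new row lands in test1 between the two actions.
    test1 += ((3, "2018-07-01 13:12:04"))

    // Second "action": append. The view is evaluated again from test1,
    // so the new row is picked up.
    test2 ++= odsData

    test2.foreach { case (id, time) => println(s"$id,$time") }
  }
}
```

Running it prints five rows, ending with 3,2018-07-01 13:12:04, which is the same shape as the actual result of the Spark experiment: the second action does not reuse what the first action read.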
cache?????????
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????cache????????????????????????Action???????????????odsData?????????????????????????????????????????????????????????????????????????????????action?????????????????????????????????????????????????????????too young to sample??????????????????????????????????????????????????????????????????????????????
Data in table test:
1 2017-01-01 01:00:00 2016-05-04 9999-12-31
2 2017-01-01 02:00:00 2016-01-01 9999-12-31
?????????
val curentData = spark.sql(
"""
|select
|*
|from default.test
""".stripMargin)
curentData.cache() // cache the query result
curentData.createOrReplaceTempView("dwData")
// First Action
spark.sql(
"""
|INSERT OVERWRITE TABLE default.test1
|SELECT
| *
|FROM dwData
""".stripMargin)
// Second Action: overwrites the data in table test
spark.sql(
"""
|INSERT OVERWRITE TABLE default.test
|SELECT
| 1,
| "2017",
| "2018",
| "2018"
|FROM default.test
""".stripMargin)
// Third Action: the same statement as the first Action, run again with cache() in place
spark.sql(
"""
|INSERT OVERWRITE TABLE default.test1
|SELECT
| *
|FROM dwData
""".stripMargin)
Resulting data in test1:
Expected result:
1 2017-01-01 01:00:00 2016-05-04 9999-12-31
2 2017-01-01 02:00:00 2016-01-01 9999-12-31
Actual result:
1 2017 2018 2018
1 2017 2018 2018
????????????
?????????????????????????????????????????????cache????????????????????????????????????????????????Action???????????????????????????cache???????????????????????????????????????????????????
?????????Action??????????????????
?????????Action????????????
?????????????????????????????????????????????cache?????????????????????job??????????????????job??????????????????????????????????????????????????????
??????????????????????????????cache??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????job??????rdd??????job???cache??????????????????
????????????
??????????????????????????????????????????action?????????????????????????????????rdd????????????????????????????????????????????????
?????????
????????????Action??????????????????????????????????????????????????????
??????????????????Action????????????????????????????????????????????????????????????odsData
?????????????????????????????????????????????????????????????????????