spark读写es

Posted 2021-04-27 振振有CI

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了spark读写es相关的知识，希望对你有一定的参考价值。

spark读写es

下载jar包：elasticsearch-spark-20_2.11-7.4.0.jar
启动pyspark

pyspark --jars /path/to/elasticsearch-spark-20_2.11-7.4.0.jar

读取es

df = spark.read \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "192.168.200.25:9300,192.168.200.26:9300,192.168.200.27:9300") \
    .option("es.resource", "index_wcz") \
    .load()

（例子都是es.nodes全部写，实测只写一个也可。es7.*删除了type，index_wcz/student）

结果显示

age info name

21 tom tom

11 student abc
写es

age	info	name
21	tom	tom
11	student	abc

df.write.format("org.elasticsearch.spark.sql").option("es.nodes","192.168.200.25:9300").mode("overwrite").save("test_save")
# 如无，会自动创建test_save的index

验证数据

http://192.168.200.25:9300/test_save 
# 或curl
curl -XGET "192.168.200.25:9300/test_save"

vscode 的restapi插件
（比postman方便，另chrome有postman插件）

文件类型.http，如es.http

  POST http://192.168.200.26:9300/_xpack/sql?format=txt 

  User-Agent: rest-client

  Accept-Language: en-GB,en-US;q=0.8,en;q=0.6,zh-CN;q=0.4

  Content-Type: application/json

  

  {

    "query": "select a.name,sum(a.age) from index_wcz a where a.name='abc' group by a.name"

  }

实测，可以单表，但不能关联，报错：Queries with JOIN are not yet supported

以上是关于spark读写es的主要内容，如果未能解决你的问题，请参考以下文章

如何使用scala+spark读写hbase？

Spark main entry

在这个 spark 代码片段中 ordering.by 是啥意思？

ES7-Es8 js代码片段

python+spark程序代码片段

spark sql读写hive的过程