Reindexing from a remote and a local cluster in Elasticsearch


I have an "index_a" on a remote Elasticsearch cluster that looks like this:


{
   _index: "index_a",
   _type: "_doc",
   _id: "1",
   _score: 1,
   _source: {
      customer_id: "1234",
      customer_name: "spider",
      message: "does what ever"
   }
},
{
   _index: "index_a",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
}

I also have an "index_a" (yes, the same name!) on the current Elasticsearch cluster, the one I am running _reindex on, which looks like this:


{
   _index: "index_a",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
},
{
   _index: "index_a",
   _type: "_doc",
   _id: "3",
   _score: 1,
   _source: {
      customer_id: "9876",
      customer_name: "coronavirus",
      message: "stay safe and at home"
   }
}

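As you can see, the two indices share a duplicate document (_id 2), but each one also holds data the other lacks, and I want to keep all of it!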
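The overlap described above can be illustrated with a small plain-Python model. This is only a sketch of "write each document by _id, later writes overwrite earlier ones", which is how the question describes _reindex behaving; it does not talk to Elasticsearch, and the helper name `merge_by_id` is made up for illustration.

```python
def merge_by_id(*sources):
    """Apply each source in order; a later document with the same _id
    overwrites an earlier one (a model of write/overwrite by doc id)."""
    merged = {}
    for docs in sources:
        for doc in docs:
            merged[doc["_id"]] = doc["_source"]
    return merged


# Documents copied from the two "index_a" listings in the question.
remote_index_a = [
    {"_id": "1", "_source": {"customer_id": "1234", "customer_name": "spider",
                             "message": "does what ever"}},
    {"_id": "2", "_source": {"customer_id": "3333", "customer_name": "pig",
                             "message": "spider-pid does"}},
]
local_index_a = [
    {"_id": "2", "_source": {"customer_id": "3333", "customer_name": "pig",
                             "message": "spider-pid does"}},
    {"_id": "3", "_source": {"customer_id": "9876", "customer_name": "coronavirus",
                             "message": "stay safe and at home"}},
]

index_b = merge_by_id(remote_index_a, local_index_a)
print(sorted(index_b))  # ['1', '2', '3']
```

The duplicate _id 2 collapses into a single entry, so the merged result holds three documents.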

What I ultimately want on the current Elasticsearch cluster is this index_b:


{
   _index: "index_b",
   _type: "_doc",
   _id: "1",
   _score: 1,
   _source: {
      customer_id: "1234",
      customer_name: "spider",
      message: "does what ever"
   }
},
{
   _index: "index_b",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
},
{
   _index: "index_b",
   _type: "_doc",
   _id: "3",
   _score: 1,
   _source: {
      customer_id: "9876",
      customer_name: "coronavirus",
      message: "stay safe and at home"
   }
}

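I know for a fact that I can do this with two _reindex requests: the first reindexes index_a from the remote cluster into index_b on the current cluster, and the second reindexes index_a on the current cluster into index_b on the same cluster. With large amounts of data, though, running two _reindex requests is very wasteful, because each request essentially walks through every doc id one by one and writes or overwrites it. Trying to do this in a single _reindex request, I have already tried: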

POST http://current_cluster/_reindex

{
   "source": {
      "remote": {
         "host": "http://remote_cluster/"
      },
      "index": ["index_a-from-remote", "index_a-of-current"]
   },
   "dest": {
      "index": "index_b"
   }
}

(The index names are renamed here to make the request easier to follow.)
The response says that there is no "index_a-of-current" on the remote cluster, which makes sense: a _reindex request built this way only fetches indices from the remote Elasticsearch cluster.
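For comparison, the two-request approach described earlier could be written roughly as follows. This is a sketch that only builds and prints the two request bodies; it sends nothing. The function names are made up for illustration, and the host is the placeholder from the question (a real `host` value needs a scheme, host and port, and the remote must be allowed via `reindex.remote.whitelist` on the current cluster).

```python
import json

# Placeholder from the question; a real value needs scheme, host and port.
REMOTE_HOST = "http://remote_cluster/"


def remote_to_index_b(remote_host=REMOTE_HOST):
    """Body for request 1: remote index_a -> index_b on the current cluster."""
    return {
        "source": {"remote": {"host": remote_host}, "index": "index_a"},
        "dest": {"index": "index_b"},
    }


def local_to_index_b():
    """Body for request 2: local index_a -> index_b on the current cluster."""
    return {
        "source": {"index": "index_a"},
        "dest": {"index": "index_b"},
    }


# Each body would be sent as: POST http://current_cluster/_reindex
for body in (remote_to_index_b(), local_to_index_b()):
    print(json.dumps(body, indent=2))
```

Only the first body carries a "remote" block, which is why a single request that mixes both sources under one "remote" block looks for every listed index on the remote cluster.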

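So my question is:

Is there a way to run a single _reindex request that takes "index_a" from the remote cluster and "index_a" from the current cluster, and reindexes both into "index_b" on the current cluster?

I would be glad for any suggestions on this, because I have tried many other variations of the request and read the Reindex API documentation without finding an answer. Thanks for the help!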


Answer
With cross-cluster search, you may be able to do what you want.
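The answer gives no details, so the following is only a guess at what it might look like. It assumes the remote cluster is registered on the current cluster under a hypothetical alias `remote_cluster` with a hypothetical seed address `remote_host:9300`, and that _reindex on your Elasticsearch version accepts the `<alias>:<index>` syntax in `source.index`; that last point is an assumption to verify before relying on it. The code only builds the two request bodies and sends nothing.

```python
import json


def remote_cluster_settings(alias="remote_cluster", seed="remote_host:9300"):
    """Body for PUT _cluster/settings that registers a remote cluster.
    The alias and seed address are hypothetical placeholders."""
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": [seed]}}}}}


def single_reindex_body(alias="remote_cluster"):
    """Body for POST _reindex that reads index_a from both clusters.
    ASSUMPTION: _reindex accepts cross-cluster '<alias>:<index>' names."""
    return {
        "source": {"index": [f"{alias}:index_a", "index_a"]},
        "dest": {"index": "index_b"},
    }


print(json.dumps(remote_cluster_settings(), indent=2))
print(json.dumps(single_reindex_body(), indent=2))
```

This sketch does not control which copy of a duplicated _id (such as _id 2) ends up in index_b; in the question's data the two copies are identical, so it would not matter there.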
