Using Elasticsearch reindex
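Upgrades within the same major version (e.g. 6.1.x -> 6.8.x or 7.1.x -> 7.8.x): indices stay read/write compatible, so no reindex is needed
Upgrades across major versions (e.g. 6.1.x -> 7.1.x): indices may not be read/write compatible and may have to be rebuilt; in particular, an index created two major versions earlier cannot be used until it is reindexed
Cluster migration: the indexing service stays online while the data is migrated ahead of time
Changing the shard count, either up or down
Changes to field types, field attributes, or the document object structure
Frequently updated indices that have accumulated a lot of fragmented garbage
Reindex rebuilds an index by creating a new one and leaves the original index in place. The source index must have "_source" enabled.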
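URL parameters:
refresh: whether the destination index is refreshed right away
wait_for_active_shards: how many shard copies must be active before the operation proceeds
scroll: how long the scroll snapshot (search context) is kept alive
slices: number of parallel slices for the reindex task (ideally equal to the shard count)
max_docs: maximum number of documents to process
requests_per_second: throttle on documents per second; the default is -1 (unlimited). For production reindexing, 500-1000 is recommended to control the rebuild speed and avoid a sudden spike in cluster I/O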
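As an illustration of how the URL parameters above combine, here is a minimal Python sketch that only builds the request URL. The helper name `reindex_url`, the address `http://localhost:9200`, and the chosen values are assumptions for the example, not taken from the original setup.

```python
from urllib.parse import urlencode


def reindex_url(base_url, **params):
    """Build the _reindex endpoint URL with optional query parameters."""
    # Booleans are sent as lowercase "true"/"false" in the query string.
    encoded = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    query = urlencode(encoded)
    return f"{base_url}/_reindex" + (f"?{query}" if query else "")


url = reindex_url(
    "http://localhost:9200",   # placeholder address (assumption)
    slices=12,                 # one slice per shard of a 12-shard source index
    requests_per_second=500,   # throttle inside the 500-1000 range suggested above
    scroll="10m",              # illustrative scroll keep-alive
    refresh=True,              # refresh the destination index
)
print(url)
```

The sketch only assembles a string; sending it is left to whatever HTTP client is already in use.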
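Request body parameters:
conflicts: how version conflicts are handled (proceed or abort)
source: configuration of the source index
dest: configuration of the new index
script: script that transforms documents from the source index as they are written to the new index
routing: route documents to a specific shard
Multi-index: reindex from several source indices at once
Source fields: restrict which fields are reindexed
Field rename: rename index fields
remote: reindex from a remote cluster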
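The body options listed above can be put together roughly as follows. This is a sketch under assumptions: the function name `build_reindex_body` and the index and field names are hypothetical, and the rename script follows the `ctx._source.remove(...)` pattern from the Elasticsearch reindex documentation, so it assumes simple field names.

```python
import json


def build_reindex_body(source_index, dest_index, fields=None, rename=None,
                       conflicts="abort", routing=None):
    """Assemble a _reindex request body from the options listed above."""
    body = {
        "conflicts": conflicts,              # "abort" stops on a version conflict, "proceed" continues
        "source": {"index": source_index},   # a list of names reindexes several indices at once
        "dest": {"index": dest_index},
    }
    if fields:
        # Copy only the listed source fields.
        body["source"]["_source"] = list(fields)
    if routing is not None:
        # "keep", "discard", or "=<value>" to force a routing value.
        body["dest"]["routing"] = routing
    if rename:
        old, new = rename
        # Move the value from the old field name to the new one.
        body["script"] = {
            "lang": "painless",
            "source": f'ctx._source.{new} = ctx._source.remove("{old}")',
        }
    return body


body = build_reindex_body(
    ["logs-2021-10", "logs-2021-11"],        # hypothetical source indices
    "logs-merged",                           # hypothetical destination index
    fields=["servCode", "callTime"],
    rename=("callTime", "durationMs"),
    conflicts="proceed",
)
print(json.dumps(body, indent=2))
```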
Common Elasticsearch operations
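Cluster setup and configuration (version 7.4.1)
The cluster consists of three machines: a, b, and c
a:
Edit config/elasticsearch.yml on machine a; after the changes it looks like this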
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Mark this node as master-eligible
node.master: true
# Node name
node.name: node-1
#
#discovery.zen.minimum_master_nodes: 3
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
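# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
# Elasticsearch data directory; create it manually and grant permissions on it
path.data: /opt/soft/data
#
# Path to log files:
#
# Elasticsearch log directory; create it manually and grant permissions on it
path.logs: /opt/soft/log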
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Bind to all of this machine's IP addresses
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
# Port for external HTTP access
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# List every node in the cluster; 9300 is the port the nodes use to talk to each other
discovery.seed_hosts: ["10.209.5.87:9300","10.209.5.88:9300","10.209.5.89:9300"]
#discovery.zen.ping.unicast.hosts: ["10.209.5.79","10.209.5.80","10.209.5.78"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Master-eligible nodes used to bootstrap the cluster
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
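# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
# These two lines enable cross-origin requests (CORS)
http.cors.enabled: true
http.cors.allow-origin: "*"
# Whitelist of remote hosts for reindex-based data sync and migration; without it,
# reindex from a remote host cannot be used. It lets this node pull data from the
# nodes listed here, which are usually nodes of another cluster.
reindex.remote.whitelist: ["10.209.5.84:9200","10.209.5.78:9200","10.209.1.48:9200","10.209.1.35:5200","10.47.187.45:5200","10.47.195.38:5200"]
# Directories for cold-archive data. An archived directory can be compressed and
# moved to another machine. Create the directories manually and grant permissions on them.
path.repo: ["/opt/soft/es_backups/backups", "/opt/soft/es_backups/longterm_backups"]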
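b:
elasticsearch.yml on machine b
Everything else is the same; change the following
# Commented out (node.master defaults to true, so this node is still master-eligible)
#node.master: true
# Node name
node.name: node-2
c:
elasticsearch.yml on machine c
Everything else is the same; change the following
# Commented out (node.master defaults to true, so this node is still master-eligible)
#node.master: true
# Node name
node.name: node-3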
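Set the heap size on every machine (example for 64 GB of RAM)
Edit config/jvm.options; the heap must not exceed 31g and should preferably stay at or below 50% of the machine's memory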
-Xms30g
-Xmx30g
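The sizing rule above (at most half of RAM, never above 31g) can be written down as a small calculation. This is only a sketch of that rule; the function names are made up for the example. For a 64 GB machine it yields 31g, and the 30g used above is simply a slightly more conservative choice under the same cap.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of the machine's RAM, capped at 31 GB, per the rule above."""
    return min(cap_gb, total_ram_gb // 2)


def jvm_heap_options(total_ram_gb):
    """Return the two jvm.options lines for the computed heap size."""
    heap = recommended_heap_gb(total_ram_gb)
    # Xms and Xmx get the same value so the heap is not resized at runtime.
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_options(64))
print(jvm_heap_options(16))
```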
The IK analyzer can optionally be installed
The plugin version must match the Elasticsearch version
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.1/elasticsearch-analysis-ik-7.4.1.zip
Linux tuning
Disable swap so that memory paging does not degrade performance
swapoff -a
vim /etc/security/limits.conf
# Append to the end of the file
* soft nofile 65535
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
vim /etc/sysctl.conf
vm.max_map_count=262145
# Reload the configuration
sysctl -p
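Before starting a node it can help to confirm that the kernel setting really is in the file. The sketch below parses sysctl.conf-style text and checks `vm.max_map_count` against 262144, the minimum that Elasticsearch's bootstrap check requires. The helper names are hypothetical, and the text is passed in as a string so the example runs without touching /etc.

```python
def parse_sysctl(text):
    """Parse key=value lines from sysctl.conf-style text, ignoring comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def max_map_count_ok(text, minimum=262144):
    """True if vm.max_map_count is set and at least the required minimum."""
    value = parse_sysctl(text).get("vm.max_map_count")
    return value is not None and value.isdigit() and int(value) >= minimum


print(max_map_count_ok("vm.max_map_count=262145"))
print(max_map_count_ok("vm.max_map_count=65530"))
```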
Elasticsearch refuses to start as root
# Add a user
useradd esuser
# Switch to that user
su esuser
Start command:
Make sure the firewall allows ports 9200 and 9300
Run this command from the extracted installation directory
./bin/elasticsearch -d
Creating and tuning indices and mappings
Create the index es_persist_3
URL
PUT http://ip:port/es_persist_3
JSON
{
  "settings": {
    "number_of_shards": "12",
    "number_of_replicas": "1",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "60s",
    "index.translog.flush_threshold_size": "1024mb"
  }
}
Create the mapping for es_persist_3
URL
POST http://ip:port/es_persist_3/_mapping
JSON
{
  "properties": {
    "servCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "httpMethod": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "type": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "servVersionProxyType": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "exceptionStack": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "exceptionTime": { "type": "date" },
    "@version": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "host": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "pAppName": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "id": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "receiveSize": { "type": "long" },
    "authType": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "externalTime": { "type": "long" },
    "cAppName": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "returnSize": { "type": "long" },
    "authCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "statusDesc": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "platformTime": { "type": "long" },
    "servName": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "componentPort": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "esbId": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "responseSize": { "type": "long" },
    "message": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "logTime": { "type": "date" },
    "tags": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "receiveTime": { "type": "long" },
    "@timestamp": { "type": "date" },
    "messageList": {
      "properties": {
        "sizeX": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
        "serialNumber": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
        "header": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
        "time": { "type": "long" },
        "body": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
        "type": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } }
      }
    },
    "url": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "componentHost": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "cAppCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "fromIp": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "complete": { "type": "boolean" },
    "requestSize": { "type": "long" },
    "logtime": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "callTime": { "type": "long" },
    "pAppCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "statusCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } }
  }
}
Create the index es_persist_4
URL
PUT http://ip:port/es_persist_4
JSON
{
  "settings": {
    "number_of_shards": "2",
    "number_of_replicas": "1",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "30s",
    "index.translog.flush_threshold_size": "248mb"
  }
}
Create the mapping for es_persist_4
URL
POST http://ip:port/es_persist_4/_mapping
JSON
{
  "properties": {
    "servCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "componentHost": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "exceptionCount": { "type": "long" },
    "sumCallTime": { "type": "long" },
    "maxCallTime": { "type": "long" },
    "cAppCode": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
    "minCallTime": { "type": "long" },
    "startTime": { "type": "long" },
    "endTime": { "type": "long" },
    "sumFlowSize": { "type": "long" },
    "totalCount": { "type": "long" },
    "servVersionProxyType": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } }
  }
}
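Common Elasticsearch commands
Delete an index; this physically removes all of the index's data
URL
DELETE http://ip:port/<index name>
Close an index; it still takes up disk space, but it cannot be read or written while closed
URL
POST http://ip:port/<index name>/_close
Open an index; it takes up disk space and, once open, can be read and written normally
URL
POST http://ip:port/<index name>/_open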
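The three commands above differ only in HTTP method and URL suffix, which a small lookup table captures. This is an illustrative sketch; `index_operation` and the address are assumed names, and nothing is sent over the network.

```python
from urllib.parse import quote

# Operation name -> (HTTP method, URL suffix), matching the commands above.
INDEX_OPERATIONS = {
    "delete": ("DELETE", ""),
    "close": ("POST", "/_close"),
    "open": ("POST", "/_open"),
}


def index_operation(base_url, index, operation):
    """Return the (method, url) pair for a delete, close, or open request."""
    method, suffix = INDEX_OPERATIONS[operation]
    # Percent-encode the index name so unusual characters cannot alter the path.
    return method, f"{base_url}/{quote(index, safe='')}{suffix}"


for op in ("close", "open", "delete"):
    print(index_operation("http://localhost:9200", "es_persist_3", op))
```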
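Cross-cluster data migration
Migrating with reindex
Cluster b sends the request and pulls data from cluster a into cluster b (cluster b's configuration file must whitelist cluster a; see the configuration file in the cluster setup section)
query selects the data to copy; the example below fetches one month of data, and omitting it copies everything
"version_type": "internal" means that documents in the destination with the same id are overwritten
size is the batch size; too large a value may cause errors, and too small a value makes the job slow
wait_for_completion=false runs the operation asynchronously in the background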
POST http://bip:bport/_reindex?wait_for_completion=false
{
  "source": {
    "index": "index name on a",
    "remote": {
      "host": "http://aip:aport"
    },
    "size": 1000,
    "query": {
      "range": {
        "receiveTime": {
          "gte": 1635696000000,
          "lt": 1638287999000
        }
      }
    }
  },
  "dest": {
    "index": "index name on b",
    "version_type": "internal"
  }
}
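Cancelling a reindex
Use this to stop a reindex that has not finished: documents already copied stay in the destination, and the remaining work is abandoned
POST _tasks/node_id:task_id/_cancel
Checking reindex progress (shows node_id:task_id, task counts, and so on)
GET _tasks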
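To turn the task listing into a progress figure, the created, updated, and deleted counters in a reindex task's status can be summed and compared with total. The sketch below assumes the response has the nodes -> tasks -> status layout that the Elasticsearch documentation shows for `GET _tasks?detailed=true&actions=*reindex`; the sample response is made up for illustration.

```python
def reindex_progress(tasks_response):
    """Map each reindex task id to the fraction of documents processed so far."""
    progress = {}
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if not task.get("action", "").endswith("reindex"):
                continue  # skip tasks that are not reindex jobs
            status = task.get("status", {})
            total = status.get("total", 0)
            done = sum(status.get(key, 0) for key in ("created", "updated", "deleted"))
            progress[task_id] = done / total if total else 0.0
    return progress


# Hypothetical response, trimmed to the fields the function reads.
sample = {
    "nodes": {
        "abc": {
            "tasks": {
                "abc:1": {
                    "action": "indices:data/write/reindex",
                    "status": {"total": 1000, "created": 200, "updated": 50, "deleted": 0},
                },
                "abc:2": {"action": "cluster:monitor/tasks/lists"},
            }
        }
    }
}
print(reindex_progress(sample))
```

The task id keys returned here are in the node_id:task_id form that the cancel command expects.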