Basic ELK Setup

Posted 2020-10-27


Architecture

[Figure: architecture diagram, taken from http://www.cnblogs.com/delgyd/p/elk.html#3656833]

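Architecture walkthrough: the architecture reads from left to right and has five layers. (This article merges the layers from the third one down and does not run an Elasticsearch cluster.)

Layer 1: data collection

On the far left is the cluster of business servers. Filebeat is installed on them to collect logs, and it sends the collected logs to the two Logstash services.

Layer 2: data processing and data buffering

The Logstash services format the logs they receive and store them in the local Kafka broker + ZooKeeper cluster.

Layer 3: data forwarding

This standalone Logstash node pulls data from the Kafka broker cluster in real time and forwards it to the ES DataNodes.

Layer 4: persistent storage

The ES DataNodes write the data they receive to disk and build the indices.

Layer 5: search and display

ES Master + Kibana coordinate the ES cluster, handle search requests, and display the data.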

Filebeat

Version

filebeat-5.5.2-1.x86_64

Configuration

#vim filebeat.yml

filebeat.modules:
filebeat.prospectors:
- input_type: log
  paths: # log paths to read; one path per project here. List several, or use * to match.
    - /usr/local/nginx1.6/logs/sso.so.duia.com.log
  include_lines: [ ]
  multiline: # merge multi-line entries: a line that does not start with [ is appended to the previous line
    pattern: '^\['
    negate: true
    match: after
  document_type: sso-so # sets the type, which Logstash references and which ends up in the Elasticsearch index name
  tail_files: true
output.kafka: # ship events to Kafka
  enabled: true
  hosts: ["172.16.101.76:9092"]
  topic: nginx # topic to publish to; if several Logstash instances consume it, partitions must be defined
  compression: snappy
  max_message_bytes: 1000000
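The multiline settings above (pattern '^\[', negate: true, match: after) can be hard to picture. The following is a minimal Python sketch of that grouping rule, not Filebeat's actual implementation; the function name merge_multiline and the sample lines are made up for illustration.

```python
import re


def merge_multiline(lines, pattern=r"^\["):
    """Group lines the way multiline with negate: true, match: after does."""
    starts_event = re.compile(pattern)
    events = []
    for line in lines:
        if starts_event.search(line) or not events:
            # A line matching the pattern (or the very first line) opens a new event.
            events.append(line)
        else:
            # A non-matching line is appended after the event before it.
            events[-1] += "\n" + line
    return events


# Hypothetical log lines: one entry with a stack trace, then a normal entry.
sample = [
    "[27/Oct/2020:10:00:00 +0800] request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Demo.run(Demo.java:42)",
    "[27/Oct/2020:10:00:01 +0800] request ok",
]
for event in merge_multiline(sample):
    print(repr(event))
```

The four input lines come out as two events: the stack trace stays attached to the entry that starts with [.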

Start

nohup /usr/local/filebeat/filebeat -e -c /usr/local/filebeat/logs.yml -d "publish" &>> /data/logs/filebeat.log &

Zookeeper

Version

zookeeper-3.4.9.tar.gz

Configuration

#vim zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
# data directory
dataDir=/data/zookeeper
clientPort=2181
server.1=172.16.101.76:12888:13888
server.2=172.16.101.175:12888:13888
server.3=172.16.101.172:12888:13888

cat /data/zookeeper/myid

The other ZooKeeper nodes use the same configuration file; only myid differs. Each node's myid must equal the N of its own server.N line.
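To make the relationship between zoo.cfg and myid concrete, here is a small Python sketch that looks up which number belongs in a node's myid file. The helper name myid_for is hypothetical; the server lines are the ones from the configuration above.

```python
import re

ZOO_CFG = """\
server.1=172.16.101.76:12888:13888
server.2=172.16.101.175:12888:13888
server.3=172.16.101.172:12888:13888
"""


def myid_for(cfg_text, host):
    """Return the N of the server.N line whose address is host, or None."""
    for line in cfg_text.splitlines():
        m = re.match(r"\s*server\.(\d+)\s*=\s*([^:\s]+):", line)
        if m and m.group(2) == host:
            return int(m.group(1))
    return None


# The value printed here is what /data/zookeeper/myid should contain on that host.
print(myid_for(ZOO_CFG, "172.16.101.175"))
```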

Start

/usr/local/elk/zookeeper/bin/zkServer.sh start

Kafka

Version

kafka_2.12-0.10.2.0.tgz

Configuration

# vim server.properties

broker.id=1
port = 9092
# listen address
host.name = 172.16.101.76
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
# where Kafka stores its log (data) files
log.dirs=/data/logs/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
# ZooKeeper connection address
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=6000
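Kafka loads server.properties as a Java properties file, where # starts a comment only at the beginning of a line, so a trailing "#..." after a value becomes part of that value. The sketch below is a deliberately simplified Python imitation of that rule (it only handles key=value lines); parse_properties and the sample text are illustrative, not part of Kafka.

```python
def parse_properties(text):
    """Simplified java.util.Properties-style parsing (key=value lines only)."""
    props = {}
    for raw in text.splitlines():
        line = raw.lstrip()
        # Only a line that starts with # or ! is treated as a comment.
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Whitespace around the separator is dropped; the rest of the line is the value.
            props[key.strip()] = value.lstrip()
    return props


sample = """\
# listen address
host.name = 172.16.101.76
log.dirs=/data/logs/kafka #not a comment
"""
props = parse_properties(sample)
print(props["host.name"])
print(props["log.dirs"])
```

The second value keeps the trailing text, which is why comments in this kind of file are safest on their own lines.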

Start

bin/kafka-server-start.sh config/server.properties &

Logstash

Version

logstash-5.2.2.tar.gz

Configuration

input {
  kafka {
    bootstrap_servers => "172.16.101.76:9092"
    topics => ["nginx"]
    codec => "json"
    decorate_events => true
  }
}

input {
  kafka {
    bootstrap_servers => "172.16.101.76:9092"
    topics => ["tomcat"]
    codec => "json"
    decorate_events => true
  }
}

filter {
  #nginx
  if [type] == "nginx-access.log" {
    grok {
      match => {
        "message" => "\[%{HTTPDATE:timestamp}\] %{IPV4:client_ip} \"%{USER:forward}\" %{USER:user} %{IPORHOST:host} \"%{WORD:method} %{URIPATHPARAM:valume} %{URIPROTO:http}/%{NUMBER:http_version}\" %{QS:request_body} %{NUMBER:status:int} \"(?:%{IPORHOST:urlname} %{POSINT:urlport})\" %{NUMBER:request_time} %{IPV4:upstream_host}:%{NUMBER:upstream_port} %{NUMBER:reponse_time} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}"
      }
      remove_field => ["message"]
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      database => "/data/GeoIP/GeoLite2-City.mmdb"
      add_field => ["location" , "%{[geoip][latitude]}, %{[geoip][longitude]}"]
    }
    date {
      match => ["timestamp" , "dd/MMM/YYYY:HH:mm:ss Z"]
      target => "@timestamp"
      remove_field => ["timestamp"]
    }
  }
  if [type] == "catalina.out" {
    grok {
      match => {
        "message" => "%{COMMONAPACHELOG}"
      }
      remove_field => ["message"]
    }
  }
}

output {
  if "_grokparsefailure" in [tags] {
    file {
      path => "/data/logs/grokparsefailure-%{[type]}-%{+YYYY.MM}.log"
    }
  }
  elasticsearch {
    hosts => ["172.16.101.76:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
    template_overwrite => true
  }
}
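The date filter and the index setting in this configuration work together: the nginx timestamp is parsed into @timestamp, and the %{+YYYY.MM.dd} part of the index name is formatted from @timestamp in UTC. The Python sketch below mimics that behavior under those assumptions; to_utc and index_name are illustrative names, not Logstash APIs.

```python
from datetime import datetime, timezone


def to_utc(nginx_time):
    """Parse an nginx-style timestamp such as 22/Dec/2017:07:30:00 +0800."""
    local = datetime.strptime(nginx_time, "%d/%b/%Y:%H:%M:%S %z")
    return local.astimezone(timezone.utc)


def index_name(event_type, timestamp):
    """Mimic index => "%{type}-%{+YYYY.MM.dd}" using the UTC timestamp."""
    return f"{event_type}-{timestamp:%Y.%m.%d}"


ts = to_utc("22/Dec/2017:07:30:00 +0800")
print(ts.isoformat())
print(index_name("nginx", ts))
```

An event logged early in the morning in UTC+8 therefore lands in the previous day's index, which is worth remembering when deleting indices by date.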

Start

/usr/local/elk/logstash/bin/logstash -f /usr/local/elk/logstash/config/logs.yml &

Elasticsearch

Version

elasticsearch-5.2.2.tar.gz

Configuration

# grep -vE "^$|^#" elasticsearch.yml

cluster.name: Mo

node.name: node01

node.attr.rack: r1

path.data: /data/elasticsearch

path.logs: /data/logs/elasticsearch

bootstrap.memory_lock: false

network.host: 172.16.101.76

http.port: 9200

discovery.zen.ping.unicast.hosts: ["172.16.101.76","172.16.101.172"]

discovery.zen.minimum_master_nodes: 1

gateway.recover_after_nodes: 1

action.destructive_requires_name: true

bootstrap.system_call_filter: false

thread_pool.index.queue_size: 500

thread_pool.bulk.queue_size: 1000

indices.recovery.max_bytes_per_sec: 100mb

http.cors.enabled: true

http.cors.allow-origin: "*"

# grep -vE "^$|^#" jvm.options

-Xms6g

-Xmx6g

-XX:+UseConcMarkSweepGC

-XX:CMSInitiatingOccupancyFraction=75

-XX:+UseCMSInitiatingOccupancyOnly

-XX:+DisableExplicitGC

-XX:+AlwaysPreTouch

-server

-Xss1m

-Djava.awt.headless=true

-Dfile.encoding=UTF-8

-Djna.nosys=true

-Djdk.io.permissionsUseCanonicalPath=true

-Dio.netty.noUnsafe=true

-Dio.netty.noKeySetOptimization=true

-Dio.netty.recycler.maxCapacityPerThread=0

-Dlog4j.shutdownHookEnabled=false

-Dlog4j2.disable.jmx=true

-Dlog4j.skipJansi=true

-XX:+HeapDumpOnOutOfMemoryError

For guidance on changing these configuration files, see the official documentation:

https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/dont-touch-these-settings.html

Start

bin/elasticsearch -d

Kibana

Version

kibana-5.2.2-linux-x86_64.tar.gz

Configuration

# grep -vE "^$|^#" kibana.yml

server.port: 5601

server.host: "172.16.101.76"

elasticsearch.url: "http://172.16.101.76:9200"

elasticsearch.pingTimeout: 1500

elasticsearch.requestTimeout: 30000

elasticsearch.requestHeadersWhitelist: [ authorization ]

pid.file: /usr/local/kibana/kibana.pid

logging.dest: /data/logs/kibana/kibana.log

Start

bin/kibana &

Nginx

Configuration

upstream kibana {
    server 172.16.101.76:5601 max_fails=3 fail_timeout=30s;
}

server {
    listen 8080;
    server_name localhost;

    location / {
        proxy_pass http://kibana/;
        index index.html index.htm;
        #auth
        #auth_basic "kibana Private";
        #auth_basic_user_file /etc/nginx/.htpasswd;
    }
}

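Notes

Logstash reads the data from Kafka, structures it with the grok patterns, and writes the result to Elasticsearch.

Problems encountered:

1. Grok had no effect; debugging Logstash

output {
  stdout {
    codec => rubydebug
  }
}

With this debug output, the events shown were not the JSON structured by grok. After some experimenting, setting codec => "json" in the input made the output come out as grok-structured JSON. Filebeat publishes each event to Kafka as a JSON document, so without the json codec the whole document, not the original log line, ends up in message.

2. Log lines written with the same log_format did not all match

Grok does not have to match the entire string exactly, and information that is not needed can be left out of the pattern.

3. Debugging grok

http://grokdebug.herokuapp.com/?#

This site is an online grok debugger; accessing it required a proxy at first.

4. The Logstash output can build the index name dynamically, or it can write to a fixed index.

5. Logstash can use if conditionals to distinguish the data sources defined in input and to choose the output and index.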
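Points 1 and 2 can be illustrated together. The Python sketch below is only an analogy for what Logstash does: json.loads stands in for codec => "json", and an unanchored regular-expression search stands in for grok, which likewise does not need to consume the whole line. The pattern LINE is a hypothetical, much shorter expression than the grok pattern in this article, and the sample event is made up.

```python
import json
import re

# Hypothetical short pattern: only the leading timestamp and client IP are captured.
LINE = re.compile(r"\[(?P<timestamp>[^\]]+)\] (?P<client_ip>\d+\.\d+\.\d+\.\d+)")


def parse_event(kafka_value):
    """Decode the JSON envelope, then extract fields from its message."""
    event = json.loads(kafka_value)        # the step codec => "json" performs
    match = LINE.search(event["message"])  # unanchored search, like grok
    if match is None:
        event.setdefault("tags", []).append("_grokparsefailure")
    else:
        event.update(match.groupdict())
    return event


raw = json.dumps({
    "type": "nginx-access.log",
    "message": "[22/Dec/2017:07:30:00 +0800] 10.0.0.8 extra fields that are ignored",
})
print(parse_event(raw))
```

Skipping the decode step would leave the pattern running against the JSON text itself instead of the log line, which matches the symptom described in point 1.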

Elasticsearch

List nodes

curl '172.16.101.76:9200/_cat/nodes?v'

Check cluster health

curl '172.16.101.76:9200/_cat/health?v'

Clear caches

curl -XPOST 'http://127.0.0.1:9200/logstash-*/_cache/clear'

List indices

curl -s 'http://172.16.101.76:9200/_cat/indices?v'

Check Elasticsearch thread pool statistics

curl -XGET 'http://xxxx:9200/_nodes/stats/thread_pool?pretty'

Delete indices (the * wildcard deletes every index; Elasticsearch rejects the wildcard when action.destructive_requires_name is true)

curl -XDELETE 'http://172.16.101.76:9200/*'

Delete all indices for a given date

#curl -s 'http://172.16.101.76:9200/_cat/indices?v' | sort | awk '{print $3}' > del_index.txt

#for i in `grep 2017.12.22 del_index.txt` ;do curl -XDELETE "http://172.16.101.76:9200/${i}" && sleep 10 ;done

List all Elasticsearch templates

curl -XGET localhost:9200/_template | python -m json.tool

Show index mappings

curl -XGET 'http://127.0.0.1:9200/*/_mapping/'

Delete Elasticsearch index templates

curl -XDELETE 'localhost:9200/_template/*'

Add a custom template

curl -XPUT localhost:9200/_template/nginx -d@template.json

template.json
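The same date-based cleanup can be sketched in Python. This is an illustrative sketch, not a tested operations script: select_indices parses the text that _cat/indices?v returns, delete_index sends the DELETE request through urllib, and the uuid values in the sample output are invented.

```python
import urllib.request


def select_indices(cat_output, date):
    """Pick index names containing date from `_cat/indices?v` text output."""
    rows = [line.split() for line in cat_output.splitlines() if line.strip()]
    column = rows[0].index("index")  # locate the column from the header row
    return sorted(
        row[column] for row in rows[1:]
        if len(row) > column and date in row[column]
    )


def delete_index(base_url, name):
    """Send DELETE <base_url>/<name> and return the HTTP status code."""
    request = urllib.request.Request(f"{base_url}/{name}", method="DELETE")
    with urllib.request.urlopen(request) as response:
        return response.status


# Made-up sample of the _cat/indices?v layout (uuid values are placeholders).
sample = """\
health status index            uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   nginx-2017.12.22 aaaa 5   1   10         0            1mb        1mb
green  open   nginx-2017.12.23 bbbb 5   1   10         0            1mb        1mb
"""
print(select_indices(sample, "2017.12.22"))
```

In practice the names returned by select_indices would be passed one by one to delete_index("http://172.16.101.76:9200", name), with a pause between calls as in the shell loop.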

{
  "aliases": {},
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": false
      },
      "dynamic_templates": [
        {
          "message_field": {
            "mapping": {
              "norms": false,
              "type": "text"
            },
            "match_mapping_type": "string",
            "path_match": "message"
          }
        },
        {
          "string_fields": {
            "mapping": {
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              },
              "norms": false,
              "type": "text"
            },
            "match": "*",
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "include_in_all": false,
          "type": "date"
        },
        "@version": {
          "include_in_all": false,
          "type": "keyword"
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "latitude": {
              "type": "half_float"
            },
            "location": {
              "type": "geo_point"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        },
        "request_body": {
          "ignore_above": 32766,
          "index": "no",
          "type": "keyword"
        }
      }
    }
  },
  "order": 0,
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "template": "nginx-*",
  "version": 50001
}

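After the nginx logs were ingested, no regions showed up on the map. The Elasticsearch template revealed the cause: the location field under geoip was not mapped as geo_point, because the default template (which matches logstash-*) was not being applied to the nginx indices. The fix was to change the default template's name so that it matches the nginx index naming, delete the existing indices and their history, and rebuild the indices. After that, the map displayed the regions correctly.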
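Before uploading a template, it can help to confirm that geoip.location really is mapped as geo_point. The following Python sketch does that check on a template loaded as a dictionary; geoip_location_type is a hypothetical helper, and the inline template is a trimmed stand-in for template.json.

```python
def geoip_location_type(template, doc_type="_default_"):
    """Return the mapped type of geoip.location in a 5.x-style template dict."""
    properties = template.get("mappings", {}).get(doc_type, {}).get("properties", {})
    location = properties.get("geoip", {}).get("properties", {}).get("location", {})
    return location.get("type")


# Trimmed stand-in for template.json; only the parts the check needs.
template = {
    "template": "nginx-*",
    "mappings": {
        "_default_": {
            "properties": {
                "geoip": {"properties": {"location": {"type": "geo_point"}}}
            }
        }
    },
}
print(geoip_location_type(template))
```

With a real file, the dictionary would come from json.load on template.json; a result other than geo_point means the map visualization will not work for indices created from that template.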

References:

http://www.cnblogs.com/delgyd/p/elk.html#3656833

https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/dont-touch-these-settings.html

https://www.iyunv.com/forum.php?mod=viewthread&tid=404905&extra=page%3D1%26filter%3Dtypeid%26typeid%3D677

http://blog.csdn.net/zhaoyangjian724/article/details/52337402
