Deploying Loki Log Collection in Single-Process Mode
Posted by 罗显明 (personal technical blog)
I. Loki Features
II. Loki Components
III. Loki Architecture Diagram
IV. Deploying Loki
V. Comparison with EFK
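I. Loki Features
1.1 Label-based indexing
Loki builds its index around log labels instead of full-text indexing every log line the way Elasticsearch does.
1.2 Multi-tenancy
Tenants are separated by tenant ID. When multi-tenancy is disabled, all data belongs to a single default tenant named fake.
1.3 Deployment modes
1.3.1 Single-process mode
All components run in one process. This suits test environments and small production environments.
1.3.2 Scalable microservices mode
Each component runs separately and can be scaled horizontally.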
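II. Loki Components
1. Distributor
The Distributor handles log writes from clients: it receives the log data, splits it into batches, and sends the batches to Ingesters in parallel.
The Distributor talks to the Ingesters over gRPC.
2. Hashing
The Distributor uses consistent hashing together with a configurable replication factor to decide which Ingester instances should receive a given piece of log data.
The hash is computed from the log labels and the tenant ID.
A hash ring stored in Consul implements the consistent hashing. Every Ingester registers itself in Consul with the set of tokens it owns; the Distributor finds the token that best matches the hash of the log data and sends the data to that token's owner.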
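The token lookup described above can be sketched in a few lines of Python. This is only a toy illustration of consistent hashing, not Loki's actual ring code: the class name `HashRing`, the SHA-256 based hash, and the 64 tokens per Ingester are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring (illustrative hash choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring: each ingester registers several tokens."""

    def __init__(self, tokens_per_ingester: int = 64):
        self.tokens_per_ingester = tokens_per_ingester
        self._tokens = []  # sorted list of (token, ingester name)

    def register(self, ingester: str) -> None:
        for i in range(self.tokens_per_ingester):
            bisect.insort(self._tokens, (_hash(f"{ingester}-{i}"), ingester))

    def lookup(self, tenant_id: str, labels: dict, replication_factor: int = 1) -> list:
        """Return the ingesters owning the first tokens at or after the stream hash."""
        # The stream key combines tenant ID and sorted labels, so label order is irrelevant.
        stream_key = tenant_id + "/" + ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        start = bisect.bisect_left(self._tokens, (_hash(stream_key), ""))
        distinct = len({name for _, name in self._tokens})
        wanted = min(replication_factor, distinct)
        owners = []
        i = start
        while len(owners) < wanted:
            _, name = self._tokens[i % len(self._tokens)]  # wrap around the ring
            if name not in owners:
                owners.append(name)
            i += 1
        return owners
```

With three registered ingesters, `ring.lookup("fake", {"service_name": "var-log-messages"})` returns the same single owner on every call, and adding an ingester only moves the streams whose hashes fall next to its new tokens.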
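3. Ingester
The Ingester writes log data to the persistent backend (for example S3 or OSS).
The Ingester keeps the log lines of each stream in ascending timestamp order; a line that arrives out of order is rejected with an error.
Logs from each unique label set are built into "chunks" in memory and then flushed to the backing store.
If the Ingester process crashes, chunks that were still in memory and had not yet been flushed to disk are lost.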
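The ordering rule and the in-memory chunks can be pictured with the following sketch. It is a simplified model under stated assumptions, not Loki's implementation: `StreamBuffer`, `OutOfOrderError`, and the explicit `flush` call are hypothetical names, and the real Ingester flushes based on chunk age, idle time, and size.

```python
class OutOfOrderError(ValueError):
    """Raised when a line is older than the last accepted line of its stream."""


class StreamBuffer:
    """Toy model of an ingester: one in-memory chunk per unique label set."""

    def __init__(self):
        self.last_ts = {}  # stream key -> last accepted timestamp (ns)
        self.chunks = {}   # stream key -> list of (timestamp, line)

    @staticmethod
    def stream_key(labels: dict) -> tuple:
        return tuple(sorted(labels.items()))

    def append(self, labels: dict, ts_ns: int, line: str) -> None:
        key = self.stream_key(labels)
        last = self.last_ts.get(key)
        if last is not None and ts_ns < last:
            raise OutOfOrderError(f"entry at {ts_ns} is older than {last}")
        self.last_ts[key] = ts_ns
        self.chunks.setdefault(key, []).append((ts_ns, line))

    def flush(self, labels: dict) -> list:
        """Hand the chunk over for persistence; until then it exists only in memory."""
        return self.chunks.pop(self.stream_key(labels), [])
```

Anything still held in `chunks` when the process dies is gone, which is the data-loss case mentioned above.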
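4. Querier
The Querier handles queries written in LogQL.
It first queries the in-memory data of all Ingesters and then loads data from the backend store.
5. Chunk Store
The chunk store is Loki's long-term data store; it supports interactive queries and sustained writes.
It consists of:
5.1 An index for the chunks
5.2 A key-value store for the chunk data itself
Note: the chunk store is not a separate service. It is a library embedded in the services that need to access Loki data: the Querier and the Ingester.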
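III. Loki Architecture Diagram
Write path:
1.1 The Distributor receives log data, splits it into batches, and sends them to the Ingesters in parallel.
1.2 The Ingester receives the batches from the Distributor, buffers them in memory, and periodically flushes them to the persistent chunk store.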
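To make the write path concrete, here is a minimal sketch of what a client hands to the Distributor. The `streams` / `stream` / `values` JSON shape with nanosecond timestamps as strings, the `/loki/api/v1/push` endpoint, and the `X-Scope-OrgID` tenant header belong to Loki's HTTP API; the helper names `build_push_payload` and `push` are made up for this example, and in practice promtail does this work.

```python
import json
import time
import urllib.request


def build_push_payload(labels: dict, lines: list, now_ns: int = None) -> dict:
    """Build one stream with ascending nanosecond timestamps, one per line."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(url: str, payload: dict, tenant_id: str = None) -> int:
    """POST the payload to Loki and return the HTTP status code."""
    headers = {"Content-Type": "application/json"}
    if tenant_id:
        headers["X-Scope-OrgID"] = tenant_id  # only needed when auth_enabled is true
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A call such as `push("http://127.0.0.1:3100/loki/api/v1/push", build_push_payload({"env": "test"}, ["hello"]))` assumes a Loki instance listening on the port used later in this post.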
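Read path:
The Querier sends the query to the Ingesters, which look up the requested chunks through the chunk index. Data that is no longer in memory is fetched from the persistent chunk store and returned.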
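IV. Deploying Loki
This deployment uses single-process mode and reuses the approach published by "阿果阿郭"; it is intended for study and review only. Original article: k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩
The official documentation describes several other installation methods; consult it as needed.
The deployment steps are as follows: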
4.1 Install supervisor
Install supervisor
yum install epel-release -y
yum install supervisor -y
Raise the locked-memory, process, and open-file limits
sed -i '/forking/a LimitNOFILE=65536' /usr/lib/systemd/system/supervisord.service
sed -i '/forking/a LimitNPROC=65536' /usr/lib/systemd/system/supervisord.service
sed -i '/forking/a LimitMEMLOCK=infinity' /usr/lib/systemd/system/supervisord.service
Start the service
# reload systemd so that the edited unit file takes effect
systemctl daemon-reload
systemctl start supervisord.service
4.2 Install Loki
Upload the loki-linux-amd64.zip archive to the /data/loki directory
Extract it
cd /data/loki
unzip loki-linux-amd64.zip
Check the version
./loki-linux-amd64 --version
Manage Loki with systemd
cat <<EOF > /usr/lib/systemd/system/loki.service
[Unit]
Description=loki.service
After=rc-local.service nss-user-lookup.target

[Service]
Type=simple
LimitMEMLOCK=infinity
LimitNPROC=65536
LimitNOFILE=65536
WorkingDirectory=/data/loki
ExecStart=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml

[Install]
WantedBy=multi-user.target
EOF
Manage Loki with supervisord (run Loki under either systemd or supervisord, not both; the remaining steps use supervisord)
cat <<EOF> /etc/supervisord.d/loki.ini
[program:loki]
command=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml
autorestart=true
autostart=true
stderr_logfile=/tmp/loki_err.log
stdout_logfile=/tmp/loki_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/loki
EOF
Write the Loki configuration file
cat <<EOF> /data/loki/loki-local-config.yaml
auth_enabled: false  # whether to enable authentication; it applies to multi-tenancy, and a single tenant is used here

server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 0

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 10485760
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

# storage configuration
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache  # index cache location
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks  # chunk location

compactor:
  working_directory: /data/loki/boltdb-shipper-compactor  # compactor working directory
  shared_store: filesystem

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 200
  # ingestion_burst_size_mb: 400
  # max_streams_per_user: 0
  # max_chunks_per_query: 20000000
  # max_query_parallelism: 140
  # max_query_series: 5000
  # cardinality_limit: 1000000
  # max_streams_matchers_per_query: 10000

chunk_store_config:
  max_look_back_period: 0s

# data retention period
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

ruler:
  storage:
    type: local
    local:
      directory: /data/loki/rules
  rule_path: /data/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF
Start Loki
supervisorctl status
supervisorctl update
supervisorctl status
4.3 Install Promtail
mkdir -p /data/promtail/{bin,config,logs}
cd /data/promtail/bin
curl -O -L "https://github.com/grafana/loki/releases/download/v2.3.0/promtail-linux-amd64.zip"
unzip "promtail-linux-amd64.zip"
chmod a+x "promtail-linux-amd64"
Configuration file
cat << EOF > /data/promtail/config/promtail.conf
server:  # promtail's own server settings
  http_listen_address: 0.0.0.0
  http_listen_port: 19080
  grpc_listen_port: 0

positions:
  filename: ./logs/loki_positions.yaml
  ignore_invalid_yaml: true

clients:  # address of the Loki server
  - url: http://127.0.0.1:3100/loki/api/v1/push

scrape_configs:
  - job_name: service_log
    file_sd_configs:  # logs to scrape, discovered through files
      - files:
          - ./config/*.yaml
        refresh_interval: 1m
EOF
Configure supervisor to manage Promtail
cat << EOF > /etc/supervisord.d/promtail.ini
[program:promtail]
command=/data/promtail/bin/promtail-linux-amd64 -config.expand-env=true -config.file=/data/promtail/config/promtail.conf
autorestart=true
autostart=true
stderr_logfile=/tmp/promtail_err.log
stdout_logfile=/tmp/promtail_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/promtail/
EOF
Define which logs to collect
cat << EOF > /data/promtail/config/varlogmessage.yaml
- targets:
    - localhost
  labels:
    __path__: /var/log/messages
    env: ENV
    hostname: BINDIP
    service_name: var-log-messages
    log_type: var-log-messages
- targets:
    - localhost
  labels:
    __path__: /var/log/secure
    env: ENV
    hostname: BINDIP
    service_name: var-log-secure
    log_type: var-log-secure
EOF
Note: ENV and BINDIP in the file above are placeholders, used like jinja2 template variables; the commands below substitute real values for them
ENV=test
BINDIP=192.168.161.118
sed -i "s/ENV/$ENV/g" /data/promtail/config/varlogmessage.yaml
sed -i "s/BINDIP/$BINDIP/g" /data/promtail/config/varlogmessage.yaml
Start Promtail
supervisorctl status
supervisorctl update
supervisorctl status
Verify that Loki is receiving logs
curl 127.0.0.1:3100/loki/api/v1/labels
curl 127.0.0.1:3100/loki/api/v1/label/service_name/values
curl 127.0.0.1:3100/loki/api/v1/label/filename/values
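Beyond listing labels, the same HTTP API can return the log lines themselves. The sketch below only builds the request URL for Loki's `query_range` endpoint (parameters `query`, `start`, `end`, `limit`); the helper names `selector` and `query_range_url` are assumptions for illustration, and label values are not escaped, so it is meant for simple values like the ones configured above.

```python
from urllib.parse import urlencode


def selector(**labels: str) -> str:
    """Build a LogQL stream selector such as {env="test",service_name="x"}."""
    inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return "{" + inner + "}"


def query_range_url(base: str, logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Return the URL of a range query against a Loki server at `base`."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"
```

For example, `query_range_url("http://127.0.0.1:3100", selector(service_name="var-log-messages"), start_ns, end_ns)` yields a URL that can be passed to curl, with `start_ns` and `end_ns` given as Unix timestamps in nanoseconds.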
4.4 Install Grafana
1.1 Download the Grafana binary package
Official download:
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.14.linux-amd64.tar.gz
Recommended mirror for downloads within China (version 8.5.9, which the commands below assume):
wget https://mirrors.huaweicloud.com/grafana/8.5.9/grafana-enterprise-8.5.9.linux-amd64.tar.gz
tar xf grafana-enterprise-8.5.9.linux-amd64.tar.gz -C /data
cd /data/
mv grafana-8.5.9/ grafana
Configure supervisor to manage Grafana
cat <<EOF> /etc/supervisord.d/grafana.ini
[program:grafana]
command=/data/grafana/bin/grafana-server web
autorestart=true
autostart=true
stderr_logfile=/tmp/grafana_err.log
stdout_logfile=/tmp/grafana_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/grafana
EOF
Start Grafana
supervisorctl status
supervisorctl update
supervisorctl status
Add the Loki data source
View Loki data through Explore
Import the following Grafana Loki dashboard to view the data
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 8,
  "iteration": 1655978337467,
  "links": [],
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "$ENV",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 5,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 4,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "rightSide": true,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "8.1.5",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname)",
          "hide": true,
          "legendFormat": "",
          "queryType": "randomWalk",
          "refId": "A"
        },
        {
          "expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname,filename)",
          "hide": false,
          "legendFormat": "{{hostname}}/{{filename}}",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Log volume",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:319",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:320",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "datasource": "$ENV",
      "description": "",
      "gridPos": {
        "h": 21,
        "w": 24,
        "x": 0,
        "y": 5
      },
      "id": 2,
      "options": {
        "dedupStrategy": "exact",
        "enableLogDetails": false,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": true
      },
      "pluginVersion": "7.4.3",
      "targets": [
        {
          "expr": "{service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"} |~ \"(?i)$log_level\"",
          "maxLines": 1000,
          "queryType": "randomWalk",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Logs",
      "transparent": true,
      "type": "logs"
    }
  ],
  "refresh": false,
  "schemaVersion": 30,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "crm-cd",
          "value": "crm-cd"
        },
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Environment",
        "multi": false,
        "name": "ENV",
        "options": [],
        "query": "loki",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "neo-pharma-service",
          "value": "neo-pharma-service"
        },
        "datasource": "$ENV",
        "definition": "label_values({service_name=~\".+\"},service_name)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Service name",
        "multi": false,
        "name": "app_name",
        "options": [],
        "query": "label_values({service_name=~\".+\"},service_name)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "allValue": null,
        "current": {
          "selected": false,
          "text": "/logs/gc.log",
          "value": "/logs/gc.log"
        },
        "datasource": "$ENV",
        "definition": "label_values({service_name=\"$app_name\"}, filename)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Log file",
        "multi": false,
        "name": "log_type",
        "options": [],
        "query": "label_values({service_name=\"$app_name\"}, filename)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "allValue": ".*",
        "current": {
          "selected": true,
          "text": "neo-pharma-service-7c87d876d5-js77h",
          "value": "neo-pharma-service-7c87d876d5-js77h"
        },
        "datasource": "$ENV",
        "definition": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Hostname",
        "multi": false,
        "name": "hostname",
        "options": [],
        "query": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": "(^\\\\S|^\\\\s)",
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "description": "Type a search keyword directly to filter",
        "error": null,
        "hide": 0,
        "includeAll": true,
        "label": "Keyword filter",
        "multi": false,
        "name": "log_level",
        "options": [
          {
            "selected": true,
            "text": "All",
            "value": "$__all"
          },
          {
            "selected": false,
            "text": "warning",
            "value": "warning"
          },
          {
            "selected": false,
            "text": "unknown",
            "value": "unknown"
          },
          {
            "selected": false,
            "text": "info",
            "value": "info"
          },
          {
            "selected": false,
            "text": "error",
            "value": "error"
          },
          {
            "selected": false,
            "text": "type a keyword to search",
            "value": "type a keyword to search"
          }
        ],
        "query": "warning,unknown,info,error,type a keyword to search",
        "queryValue": "",
        "skipUrlSync": false,
        "type": "custom"
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Log Center",
  "uid": "NlV_8QD7k",
  "version": 21
}
Result (screenshot)
4.5 Install Alertmanager
cd /data
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 alertmanager
Configure supervisor to manage Alertmanager
cat <<EOF> /etc/supervisord.d/alertmanager.ini
[program:alertmanager]
command=/data/alertmanager/alertmanager
autorestart=true
autostart=true
stderr_logfile=/tmp/alertmanager_err.log
stdout_logfile=/tmp/alertmanager_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/alertmanager
EOF
Write the Alertmanager configuration file
cat <<EOF> /data/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:465'      # SMTP server address
  smtp_from: '4506259@qq.com'            # sender address
  smtp_auth_username: '4507259@qq.com'   # mailbox user
  smtp_auth_password: 'gbrqbrcace'       # mailbox password
  smtp_require_tls: false

templates:
  - '/usr/local/alertmanager/template/*.tmpl'

route:
  group_by: ["instance"]   # labels to group alerts by
  group_wait: 30s          # after an alert arrives, wait 30 seconds for more alerts and send them together
  group_interval: 5m       # interval between notifications for a group
  repeat_interval: 3h      # interval before a still-firing alert is sent again
  receiver: mail           # default receiver; required, and must match a receiver name below

receivers:
  - name: 'mail'           # receiver name
    email_configs:
      - to: '187171160@163.com'   # recipient
        send_resolved: true       # also notify when the alert resolves
EOF
Configure alert rules
# create the rules directory of the default tenant fake if it does not exist yet
mkdir -p /data/loki/rules/fake
cat <<'EOF'> /data/loki/rules/fake/rules.yaml
groups:
  - name: service OutOfMemoryError
    rules:
      # keyword monitoring
      - alert: loki check words java.lang.OutOfMemoryError
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "java.lang.OutOfMemoryError" [5m]) > 0)
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.env }} {{ $labels.hostname }} file {{ $labels.filename }} has {{ $value }} error'
          summary: java.lang.OutOfMemoryError
      # performance alert based on Java program logs
      - alert: loki java full gc count check
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "Full GC (Allocation" [5m]) > 5)
        labels:
          severity: warning
        annotations:
          description: '{{ $labels.env }} {{ $labels.hostname }} {{ $labels.filename }} {{ $value }}'
          summary: java full gc count check
      # example of alerting on a regular-expression match
      - alert: dbperform slowlog sql slow query
        expr: 'sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |~ "time: [1-9]\\d{4}," [5m]) > 5)'
        labels:
          severity: warning
        annotations:
          description: '{{ $labels.env }} {{ $labels.hostname }} file {{ $labels.filename }} has {{ $value }} error'
          summary: sql slowlog
EOF
Test the alert
echo 'The String object java.lang.OutOfMemoryError is used to represent and manipulate a sequence of characters.' >> /var/log/messages
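Why one appended line is enough to trigger the first rule can be seen by mimicking its expression by hand. The sketch below is a rough approximation of `count_over_time(... |= "keyword" [5m]) > 0` on a list of timestamped lines; it is not how the Loki ruler evaluates rules, and the function names are hypothetical.

```python
def count_over_time(entries, needle: str, now: float, window_s: float = 300) -> int:
    """Count lines containing `needle` whose timestamp lies in (now - window_s, now]."""
    return sum(
        1 for ts, line in entries
        if now - window_s < ts <= now and needle in line
    )


def should_fire(entries, needle: str, now: float, threshold: int = 0) -> bool:
    """Mirror the `> threshold` comparison at the end of the rule expression."""
    return count_over_time(entries, needle, now) > threshold
```

With a threshold of 0, a single matching line inside the last five minutes makes the condition true; the Full GC and slow-query rules use a threshold of 5 and therefore need at least six matches.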
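V. Comparison with EFK
EFK:
1.1 Elasticsearch stores data on disk as unstructured JSON objects. Every key of every object, and the content of every key, is indexed. Data can then be queried with a query defined as a JSON object (the Query DSL) or with the Lucene query language.
1.2 EFK uses fluentd as its log collector.
Loki:
1.1 In single-process mode, log data is stored on local disk; in scalable microservices mode, it is stored in cloud storage. Logs are tagged with labels and only the labels are indexed, so the index is smaller and the cost is lower.
1.2 Loki uses promtail as its log collector. Promtail discovers log files stored on disk, associates them with labels, and forwards them to Loki.
Promtail can run as a sidecar to collect a Pod's logs, read logs from specified files, and follow system logs.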
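The cost difference between the two indexing strategies can be illustrated with a toy comparison. This is a deliberately simplified model, not how Elasticsearch or Loki store data: `full_text_index`, `label_index`, and `grep_with_labels` are hypothetical helpers, and each "document" is just a pair of labels and a log line.

```python
import re
from collections import defaultdict


def full_text_index(docs):
    """Index every token of every line (Elasticsearch-style, greatly simplified)."""
    idx = defaultdict(set)
    for doc_id, (_labels, line) in enumerate(docs):
        for token in re.findall(r"\w+", line.lower()):
            idx[token].add(doc_id)
    return idx


def label_index(docs):
    """Index only the label pairs (Loki-style, greatly simplified)."""
    idx = defaultdict(set)
    for doc_id, (labels, _line) in enumerate(docs):
        for pair in labels.items():
            idx[pair].add(doc_id)
    return idx


def grep_with_labels(docs, idx, labels: dict, needle: str):
    """Narrow down by labels through the index, then scan the remaining lines."""
    candidates = set(range(len(docs)))
    for pair in labels.items():
        candidates &= idx.get(pair, set())
    return sorted(i for i in candidates if needle in docs[i][1])
```

The label index grows with the number of distinct label values rather than with the vocabulary of the logs, at the price of scanning line content at query time.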
References:
k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩
Getting started | Grafana Loki documentation
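Deploying the Loki log collection system
Introduction to Loki
Loki consists of the following three parts:
loki is the main server, responsible for storing logs and processing queries.
promtail is the agent, responsible for collecting logs and sending them to loki.
Grafana provides the UI.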
I. Deploying promtail
Download: https://github.com/grafana/loki/releases/download/v2.4.1/promtail-linux-amd64.zip
1. Extract the archive, then download and edit the official configuration template
mkdir /data/promtail
# assumes the zip archive has been placed in /data/promtail
cd /data/promtail
unzip promtail-linux-amd64.zip
wget https://raw.githubusercontent.com/grafana/loki/main/clients/cmd/promtail/promtail-local-config.yaml
vim promtail-local-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://192.156.71.125:3100/loki/api/v1/push  # change the host to the address of the Loki server

scrape_configs:
  - job_name: application  # job name, user-defined
    static_configs:
      - targets:  # for several applications or paths, copy from this line down and adjust the labels
          - localhost
        labels:
          job: tomcat  # kind of log being collected
          project: tjhlwjg  # project name, user-defined
          host: 192.156.71.125  # the local IP is recommended, which makes filtering easier
          __path__: /data/tomcat_tjjg/logs/catalina.out  # path of the tomcat log
2. Start promtail
cd /data/promtail
nohup ./promtail-linux-amd64 --config.file=promtail-local-config.yaml &
II. Deploying loki
Download: https://github.com/grafana/loki/releases/download/v2.4.1/loki-linux-amd64.zip
1. Extract the archive, then download and edit the official configuration template
mkdir /data/loki
# assumes the zip archive has been placed in /data/loki
cd /data/loki
unzip loki-linux-amd64.zip
wget https://raw.githubusercontent.com/grafana/loki/master/cmd/loki/loki-local-config.yaml
vim loki-local-config.yaml  # alertmanager is not needed here, so the ruler section is commented out
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

# ruler:
#   alertmanager_url: http://localhost:9093

# The settings below were added; without them, large log volumes cause errors
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 30  # per-tenant ingestion rate limit in MB per second; the default is 4
  ingestion_burst_size_mb: 15  # per-tenant ingestion burst size in MB; the default is 6
2. Start loki
cd /data/loki
nohup ./loki-linux-amd64 --config.file=loki-local-config.yaml &
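III. Configuring Grafana
1. Add a Loki data source
2. Enter the IP and port (3100) of the Loki server, leave the other settings at their defaults, then click Save & Test
3. In Explore, select Loki; logs can then be filtered by the custom labels
4. The Loki log page is displayed (screenshot)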