72 - Cloud-Native Monitoring: Monitoring Docker with Prometheus
Prometheus architecture
https://github.com/prometheus/prometheus
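Introduction to cAdvisor
cAdvisor (Container Advisor) is an open-source container monitoring tool from Google. It runs as a daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it records resource isolation parameters, historical resource usage, histograms of complete historical resource usage, and network statistics. It gathers information on every container running on a machine, and it also provides a basic query UI and an HTTP interface so that other components, such as Prometheus, can scrape the data.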
https://github.com/kubernetes/kubernetes/pull/65707
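As a rough illustration of the HTTP interface described above, the sketch below lists the distinct metric names in a Prometheus text-format payload such as the one cAdvisor serves at /metrics. The function name list_metric_names is hypothetical, and the host and port in the usage comment are assumptions borrowed from the case study later in this article.

```shell
#!/bin/sh
# list_metric_names: read Prometheus text-format metrics on stdin and print
# each distinct metric name once, sorted. Comment lines (# HELP / # TYPE)
# and blank lines are skipped.
list_metric_names() {
    # Strip everything from the first "{" or space onward, leaving the name.
    awk '!/^#/ && NF { sub(/[{ ].*/, "", $0); print }' | sort -u
}

# Usage against a live cAdvisor (address is an assumption):
#   curl -s http://192.168.11.201:8080/metrics | list_metric_names
```

Piping the output through grep '^container_' narrows the list to the per-container series.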
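Case study: monitoring containers with Prometheus and visualizing the data in Grafana
- Deploy Prometheus from the binary release
# Download the binary package and extract it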
https://github.com/prometheus/prometheus/releases
[root@ubuntu2204 ~]#cat install_prometheus.sh
#!/bin/bash
#
PROMETHEUS_VERSION=2.42.0
PROMETHEUS_FILE="prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz"
PROMETHEUS_URL="https://github.com/prometheus/prometheus/releases/download/v$PROMETHEUS_VERSION/$PROMETHEUS_FILE"
INSTALL_DIR=/usr/local
HOST=`hostname -I|awk '{print $1}'`
. /etc/os-release
msg_error() {
    echo -e "\033[1;31m$1\033[0m"
}
msg_info() {
    echo -e "\033[1;32m$1\033[0m"
}
msg_warn() {
    echo -e "\033[1;33m$1\033[0m"
}
# Print a message followed by a colored [ OK ] / [FAILED] / [WARNING] tag
color () {
    RES_COL=60
    MOVE_TO_COL="echo -en \\033[${RES_COL}G"
    SETCOLOR_SUCCESS="echo -en \\033[1;32m"
    SETCOLOR_FAILURE="echo -en \\033[1;31m"
    SETCOLOR_WARNING="echo -en \\033[1;33m"
    SETCOLOR_NORMAL="echo -en \E[0m"
    echo -n "$1" && $MOVE_TO_COL
    echo -n "["
    if [ $2 = "success" -o $2 = "0" ] ;then
        ${SETCOLOR_SUCCESS}
        echo -n $"  OK  "
    elif [ $2 = "failure" -o $2 = "1" ] ;then
        ${SETCOLOR_FAILURE}
        echo -n $"FAILED"
    else
        ${SETCOLOR_WARNING}
        echo -n $"WARNING"
    fi
    ${SETCOLOR_NORMAL}
    echo -n "]"
    echo
}
install_prometheus () {
    if [ ! -f $PROMETHEUS_FILE ] ;then
        # Group the failure branch so that exit runs only when wget fails
        wget $PROMETHEUS_URL || { color "Download failed!" 1 ; exit 1 ; }
    fi
    [ -d $INSTALL_DIR ] || mkdir -p $INSTALL_DIR
    tar xf $PROMETHEUS_FILE -C $INSTALL_DIR
    cd $INSTALL_DIR && ln -s prometheus-$PROMETHEUS_VERSION.linux-amd64 prometheus
    mkdir -p $INSTALL_DIR/prometheus/{bin,conf,data}
    cd $INSTALL_DIR/prometheus && mv prometheus promtool bin/ ; mv prometheus.yml conf/;
    groupadd -r prometheus
    useradd -r -g prometheus -s /sbin/nologin prometheus
    chown -R prometheus:prometheus $INSTALL_DIR/prometheus/
    cat > /etc/profile.d/prometheus.sh <<EOF
export PROMETHEUS_HOME=$INSTALL_DIR/prometheus
export PATH=\$PROMETHEUS_HOME/bin:\$PATH
EOF
}
prometheus_service () {
    # \$MAINPID is escaped so that systemd, not this script, expands it
    cat > /lib/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
User=prometheus
Group=prometheus
WorkingDirectory=$INSTALL_DIR/prometheus
ExecStart=$INSTALL_DIR/prometheus/bin/prometheus --config.file=$INSTALL_DIR/prometheus/conf/prometheus.yml --web.enable-lifecycle
ExecReload=/bin/kill -HUP \$MAINPID
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
EOF
    systemctl daemon-reload
    systemctl enable --now prometheus.service
}
start_prometheus() {
    systemctl is-active prometheus
    if [ $? -eq 0 ];then
        echo
        color "Prometheus installation complete!" 0
        echo "-------------------------------------------------------------------"
        echo -e "Access URL: \c"
        msg_info "http://$HOST:9090/"
    else
        color "Prometheus installation failed!" 1
        exit 1
    fi
}
install_prometheus
prometheus_service
start_prometheus
[root@ubuntu2204 ~]#bash install_prometheus.sh
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /lib/systemd/system/prometheus.service.
active
Prometheus installation complete! [ OK ]
-------------------------------------------------------------------
Access URL: http://192.168.11.200:9090/
[root@ubuntu2204 ~]#cat /lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
User=prometheus
Group=prometheus
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/bin/prometheus --config.file=/usr/local/prometheus/conf/prometheus.yml --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
[root@ubuntu2204 ~]#ll /usr/local/prometheus
lrwxrwxrwx 1 root root 29 Mar 5 06:36 /usr/local/prometheus -> prometheus-2.42.0.linux-amd64/
[root@ubuntu2204 ~]#ll /usr/local/prometheus/
total 44
drwxr-xr-x 7 prometheus prometheus 4096 Mar 5 06:36 ./
drwxr-xr-x 11 root root 4096 Mar 5 06:36 ../
-rw-r--r-- 1 prometheus prometheus 11357 Feb 1 08:23 LICENSE
-rw-r--r-- 1 prometheus prometheus 3773 Feb 1 08:23 NOTICE
drwxr-xr-x 2 prometheus prometheus 4096 Mar 5 06:36 bin/
drwxr-xr-x 2 prometheus prometheus 4096 Mar 5 06:36 conf/
drwxr-xr-x 2 prometheus prometheus 4096 Feb 1 08:23 console_libraries/
drwxr-xr-x 2 prometheus prometheus 4096 Feb 1 08:23 consoles/
drwxr-xr-x 4 prometheus prometheus 4096 Mar 5 06:36 data/
[root@ubuntu2204 ~]#tree -h /usr/local/prometheus
[ 29] /usr/local/prometheus
├── [ 11K] LICENSE
├── [3.7K] NOTICE
├── [4.0K] bin
│ ├── [113M] prometheus
│ └── [105M] promtool
├── [4.0K] conf
│ └── [ 934] prometheus.yml
├── [4.0K] console_libraries
│ ├── [2.8K] menu.lib
│ └── [6.0K] prom.lib
├── [4.0K] consoles
│ ├── [ 616] index.html.example
│ ├── [2.6K] node-cpu.html
│ ├── [3.4K] node-disk.html
│ ├── [5.6K] node-overview.html
│ ├── [1.4K] node.html
│ ├── [4.0K] prometheus-overview.html
│ └── [1.3K] prometheus.html
└── [4.0K] data
├── [4.0K] chunks_head
├── [ 0] lock
├── [ 20K] queries.active
└── [4.0K] wal
└── [ 41K] 00000000
7 directories, 17 files
# Default configuration file contents
[root@ubuntu2204 ~]#cat /usr/local/prometheus/conf/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global evaluation_interval.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to /metrics
    # scheme defaults to http.
    static_configs:
      - targets: ["localhost:9090"]
[root@ubuntu2204 ~]#vim /usr/local/prometheus/conf/prometheus.yml
[root@ubuntu2204 ~]#cat /usr/local/prometheus/conf/prometheus.yml |grep targets
...
- targets: ["192.168.11.200:9090"]
[root@ubuntu2204 ~]#curl -X POST http://prometheus.mooreyxia.org:9090/-/reload
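The reload request above is accepted only because the service was started with --web.enable-lifecycle. A minimal sketch of a safer routine, assuming promtool sits in the bin/ directory created by the install script: validate the file with promtool check config first, and reload only if the check passes. The function name and the PROMTOOL and CURL override variables are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Paths are assumptions based on the install script above; override as needed.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/bin/promtool}"
CURL="${CURL:-curl}"

# reload_prometheus CONFIG_FILE BASE_URL
# Validate the configuration, then ask the running server to reload it.
reload_prometheus() {
    config="$1"
    base_url="$2"
    if ! "$PROMTOOL" check config "$config"; then
        echo "config check failed, not reloading" >&2
        return 1
    fi
    # -f makes curl exit non-zero when the server answers with an HTTP error
    "$CURL" -fsS -X POST "$base_url/-/reload"
}

# Usage (addresses taken from this article):
#   reload_prometheus /usr/local/prometheus/conf/prometheus.yml http://192.168.11.200:9090
```

Checking first matters because a reload with an invalid file is rejected and the server keeps running on the old configuration, which is easy to miss.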
- Verify the Prometheus deployment
[root@ubuntu2204 ~]#systemctl status prometheus
● prometheus.service - Prometheus Server
Loaded: loaded (/lib/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2023-03-05 06:36:08 UTC; 5min ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 901 (prometheus)
Tasks: 7 (limit: 4575)
Memory: 20.3M
CPU: 355ms
CGroup: /system.slice/prometheus.service
└─901 /usr/local/prometheus/bin/prometheus --config.file=/usr/local/prometheus/conf/prometheus.yml --web.enable-lifecycle
...
[root@ubuntu2204 ~]#ss -nltp|grep prometheus
LISTEN 0 4096 *:9090 *:* users:(("prometheus",pid=901,fd=7))
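Besides systemctl and ss, Prometheus exposes a readiness endpoint at /-/ready. Since systemd may report the unit as active a moment before the HTTP server answers, a scripted check can poll that endpoint instead. This is a sketch under stated assumptions: the function name, retry count, and delay are hypothetical choices.

```shell
#!/bin/sh
CURL="${CURL:-curl}"

# wait_for_prometheus BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Poll /-/ready until it returns success or the attempts run out.
wait_for_prometheus() {
    base_url="$1"
    attempts="${2:-10}"
    delay="${3:-1}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f turns a non-2xx response into a non-zero exit status
        if "$CURL" -fsS -o /dev/null "$base_url/-/ready"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   wait_for_prometheus http://192.168.11.200:9090 && echo "Prometheus is ready"
```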
- Prepare the Docker environment on the host whose containers will be monitored
[root@web01 ~]#apt -y install docker.io
[root@web01 ~]#echo '{"registry-mirrors": ["https://registry.docker-cn.com", "http://hub-mirror.c.163.com", "https://docker.mirrors.ustc.edu.cn"]}' > /etc/docker/daemon.json
[root@web01 ~]#cat /etc/docker/daemon.json
{"registry-mirrors": ["https://registry.docker-cn.com", "http://hub-mirror.c.163.com", "https://docker.mirrors.ustc.edu.cn"]}
[root@web01 ~]#systemctl restart docker
[root@web01 ~]#docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.12
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: default
cgroupns
Kernel Version: 5.15.0-60-generic
Operating System: Ubuntu 22.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.929GiB
Name: web01
ID: LIVI:HG27:7SOS:66L3:HAV7:7BYT:NVYF:ANWI:CATO:GSRC:Q32W:GXHL
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://registry.docker-cn.com/
http://hub-mirror.c.163.com/
https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
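- Build and install cAdvisor from source (to run it in Docker instead, see the next step)
https://github.com/google/cadvisor/blob/master/docs/development/build.md
# Build and install cAdvisor
# Install Go; version 1.14 or later is required. Note: do not use a version above 1.18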
wget https://studygolang.com/dl/golang/go1.17.6.linux-amd64.tar.gz
tar -xf go1.17.6.linux-amd64.tar.gz -C /usr/local/
# Single quotes keep $PATH and $GOROOT unexpanded until go.sh is sourced
echo 'export GOROOT=/usr/local/go' > /etc/profile.d/go.sh
echo 'export PATH=$PATH:$GOROOT/bin' >> /etc/profile.d/go.sh
source /etc/profile.d/go.sh
go version
# Get the source code
wget https://github.com/google/cadvisor/archive/refs/tags/v0.39.3.tar.gz
tar xf v0.39.3.tar.gz -C /usr/local/
ln -s /usr/local/cadvisor-0.39.3 /usr/local/cadvisor
cd /usr/local/cadvisor
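# Fetch the dependencies (set the module proxy first so the download goes through it)
go env -w GOPROXY=https://goproxy.cn
go get -d github.com/google/cadvisor
# Inspect the Go environment variables
go env
# Ubuntu
apt -y install gcc make libpfm4 libpfm4-dev jq
# RHEL family
yum -y install gcc make
# Build cAdvisor
make build
# Verify the result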
./cadvisor --help
Usage of ./cadvisor:
-add_dir_header
If true, adds the file directory to the header of the log messages
-allow_dynamic_housekeeping
Whether to allow the housekeeping interval to be dynamic (default true)
-alsologtostderr
log to standard error as well as files
-application_metrics_count_limit int
Max number of application metrics to store (per container) (default 100)
.....
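# Start cAdvisor
./cadvisor -port=8080 &>>/var/log/cadvisor.log &
# Open server:8080 in a browser to view cAdvisor's default UI
- Install cAdvisor with Docker
# Docker Hub does not carry the latest cAdvisor release (which the author attributes to market competition); to use the latest version, pull it from Google's registry (gcr.io)
# Configure a proxy for Docker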
[root@web01 ~]#cat set_docker_proxy.sh
#!/bin/bash
#
PROXY_SERVER_IP=proxy-ip    # placeholder: replace with your proxy server's IP
PROXY_PORT=port             # placeholder: replace with your proxy port
# Print a message followed by a colored [ OK ] / [FAILED] / [WARNING] tag
color () {
    RES_COL=60
    MOVE_TO_COL="echo -en \\033[${RES_COL}G"
    SETCOLOR_SUCCESS="echo -en \\033[1;32m"
    SETCOLOR_FAILURE="echo -en \\033[1;31m"
    SETCOLOR_WARNING="echo -en \\033[1;33m"
    SETCOLOR_NORMAL="echo -en \E[0m"
    echo -n "$1" && $MOVE_TO_COL
    echo -n "["
    if [ $2 = "success" -o $2 = "0" ] ;then
        ${SETCOLOR_SUCCESS}
        echo -n $"  OK  "
    elif [ $2 = "failure" -o $2 = "1" ] ;then
        ${SETCOLOR_FAILURE}
        echo -n $"FAILED"
    else
        ${SETCOLOR_WARNING}
        echo -n $"WARNING"
    fi
    ${SETCOLOR_NORMAL}
    echo -n "]"
    echo
}
start () {
    [ -d /etc/systemd/system/docker.service.d ] || mkdir -p /etc/systemd/system/docker.service.d
    # Overwrite rather than append, so repeated runs do not duplicate the section
    cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://$PROXY_SERVER_IP:$PROXY_PORT/"
Environment="HTTPS_PROXY=http://$PROXY_SERVER_IP:$PROXY_PORT/"
Environment="NO_PROXY=127.0.0.0/8,172.17.0.0/16,10.0.0.0/24,10.244.0.0/16,192.168.0.0/16,mooreyxia.org,cluster.local"
EOF
    systemctl daemon-reload
    systemctl restart docker.service
    systemctl is-active docker.service &> /dev/null
    if [ $? -eq 0 ] ;then
        color "Docker proxy configured!" 0
    else
        color "Docker proxy configuration failed!" 1
        exit 1
    fi
}
stop () {
    rm -f /etc/systemd/system/docker.service.d/http-proxy.conf
    systemctl daemon-reload
    systemctl restart docker.service
    systemctl is-active docker.service &> /dev/null
    if [ $? -eq 0 ] ;then
        color "Docker proxy removed!" 0
    else
        color "Docker proxy removal failed!" 1
        exit 1
    fi
}
usage () {
    echo "Usage: $(basename $0) start|stop"
    exit 1
}
case $1 in
start)
    start
    ;;
stop)
    stop
    ;;
*)
    usage
    ;;
esac
[root@web01 ~]#bash set_docker_proxy.sh start
Docker proxy configured! [ OK ]
# Pull the image
[root@web01 ~]#docker pull gcr.io/cadvisor/cadvisor:v0.47.0
v0.47.0: Pulling from cadvisor/cadvisor
ca7dd9ec2225: Pull complete
436581a8b0d5: Pull complete
a9dc403e7252: Pull complete
34fd9f2502b6: Pull complete
f4de17785cac: Pull complete
Digest: sha256:adc29827d88730174181e9fe221938323baa6ba8c5734c2ec52aa2e86a0c303e
Status: Downloaded newer image for gcr.io/cadvisor/cadvisor:v0.47.0
gcr.io/cadvisor/cadvisor:v0.47.0
[root@web01 ~]#docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/cadvisor/cadvisor v0.47.0 b2a3c8cd6153 8 weeks ago 87.3MB
# Start the container
[root@web01 ~]#VERSION=v0.47.0
[root@web01 ~]#docker run \
> --volume=/:/rootfs:ro \
> --volume=/var/run:/var/run:ro \
> --volume=/sys:/sys:ro \
> --volume=/var/lib/docker/:/var/lib/docker:ro \
> --volume=/dev/disk/:/dev/disk:ro \
> --publish=8080:8080 \
> --detach=true \
> --name=cadvisor \
> --privileged \
> --device=/dev/kmsg \
> gcr.io/cadvisor/cadvisor:$VERSION
fbe246d9e87cf8fa3884aef7082b074aa738f43aed991e9db38c332a032e3e92
[root@web01 ~]#docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fbe246d9e87c gcr.io/cadvisor/cadvisor:v0.47.0 "/usr/bin/cadvisor -…" 52 seconds ago Up 50 seconds (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
# Test
[root@web01 ~]#docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f84735835457 904b8cb13b93 "/docker-entrypoint.…" 6 minutes ago Up 6 minutes 80/tcp inspiring_ellis
fbe246d9e87c gcr.io/cadvisor/cadvisor:v0.47.0 "/usr/bin/cadvisor -…" 25 minutes ago Up 3 minutes (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
- Configure Prometheus to scrape cAdvisor
# Edit the Prometheus configuration file and add the cAdvisor host as a scrape target
[root@ubuntu2204 ~]#cat /usr/local/prometheus/conf/prometheus.yml
...
  - job_name: "metrics_from_docker"
    static_configs:
      - targets: # add the hosts running cAdvisor
          - "192.168.11.201:8080"
[root@ubuntu2204 ~]#systemctl restart prometheus
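When several Docker hosts run cAdvisor, writing the job by hand gets repetitive. The sketch below prints a static scrape job for any number of targets, indented so it can be pasted under scrape_configs:. The function name render_scrape_job is hypothetical, and the second address in the usage comment is an invented example host.

```shell
#!/bin/sh
# render_scrape_job JOB_NAME TARGET...
# Print a static_configs scrape job listing every target given.
render_scrape_job() {
    job="$1"
    shift
    printf '  - job_name: "%s"\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets:\n'
    for target in "$@"; do
        printf '          - "%s"\n' "$target"
    done
}

# Usage (192.168.11.202 is a made-up second host):
#   render_scrape_job metrics_from_docker 192.168.11.201:8080 192.168.11.202:8080
```

The output is only text; it still has to be merged into prometheus.yml before the service is restarted or reloaded.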
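Wait a few seconds, then check the monitoring targets in the browser.
The metrics are generated automatically (in production, metric collection is usually customized).
Run a simple query on one metric, using PromQL, the query language of the Prometheus time-series database.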
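The same kind of query can be issued from the command line through the Prometheus HTTP API at /api/v1/query. The sketch below is one possible wrapper, not the query used in the original screenshots: the function name, the CURL override variable, and the choice of expression are assumptions. The expression relies on container_cpu_usage_seconds_total and the name label that cAdvisor attaches to Docker containers.

```shell
#!/bin/sh
CURL="${CURL:-curl}"

# promql_query BASE_URL EXPR
# Run an instant query and print the JSON response.
promql_query() {
    base_url="$1"
    expr="$2"
    # -G sends the url-encoded data as a query string on a GET request
    "$CURL" -sS -G "$base_url/api/v1/query" --data-urlencode "query=$expr"
}

# Per-container CPU usage rate over the last 5 minutes.
# The name!="" matcher keeps only series that carry a container name.
CPU_RATE_EXPR='sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'

# Usage:
#   promql_query http://192.168.11.200:9090 "$CPU_RATE_EXPR"
```

Pasting the expression alone into the query box of the Prometheus web UI gives the same result as a graph or table.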
- Install Grafana
Package download links:
https://grafana.com/grafana/download
https://mirrors.tuna.tsinghua.edu.cn/grafana/
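# Install on Ubuntu
[root@ubuntu2204 ~]#sudo apt-get install -y adduser libfontconfig1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
adduser is already the newest version (3.118ubuntu5).
adduser set to manually installed.
libfontconfig1 is already the newest version (2.13.1-4.2ubuntu5).
libfontconfig1 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 46 not upgraded.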
[root@ubuntu2204 ~]#wget https://mirrors.tuna.tsinghua.edu.cn/grafana/apt/pool/main/g/grafana/grafana_9.3.2_amd64.deb
--2022-12-28 21:19:15-- https://mirrors.tuna.tsinghua.edu.cn/grafana/apt/pool/main/g/grafana/grafana_9.3.2_amd64.deb
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.15.130, 2402:f000:1:400::2
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.15.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89172340 (85M) [application/octet-stream]
Saving to: ‘grafana_9.3.2_amd64.deb’
grafana_9.3.2_amd64.deb 100%[===================================================================================================================================>] 85.04M 5.38MB/s in 20s
2022-12-28 21:19:36 (4.30 MB/s) - ‘grafana_9.3.2_amd64.deb’ saved [89172340/89172340]
[root@ubuntu2204 ~]#sudo dpkg -i grafana_9.3.2_amd64.deb
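Selecting previously unselected package grafana.
(Reading database ... 106640 files and directories currently installed.)
Preparing to unpack grafana_9.3.2_amd64.deb ...
Unpacking grafana (9.3.2) ...
Setting up grafana (9.3.2) ...
Adding system user "grafana" (UID 112) ...
Adding new user "grafana" (UID 112) with group "grafana" ...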
...
Start the Grafana service
[root@ubuntu2204 ~]#systemctl enable --now grafana-server.service
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /lib/systemd/system/grafana-server.service.
[root@ubuntu2204 ~]#ss -ntlp|grep 3000
LISTEN 0 4096 *:3000 *:* users:(("grafana-server",pid=2127,fd=12))
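Log in to the Grafana web interface for the first time
http://grafana-server:3000/
* Log in with the default username and password, both of which are admin
- Configure a data source before importing a dashboard template
Add a data source: click "Add your first data source"
Configure Prometheus on the Settings page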
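As an alternative to clicking through the UI, Grafana can load data sources from provisioning files at startup. The sketch below writes such a file; it assumes the deb package's default provisioning directory (/etc/grafana/provisioning/datasources) and the Prometheus address used in this article. The function name and the file name prometheus.yaml are hypothetical.

```shell
#!/bin/sh
# write_grafana_datasource DIR PROMETHEUS_URL
# Create a provisioning file that registers Prometheus as the default data source.
write_grafana_datasource() {
    dir="$1"
    url="$2"
    mkdir -p "$dir"
    cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Usage (run as root, then restart Grafana so it reads the file):
#   write_grafana_datasource /etc/grafana/provisioning/datasources http://192.168.11.200:9090
#   systemctl restart grafana-server
```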
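- Import a Grafana dashboard template
# Import a specific template (dashboard ID 193 or 14282)
https://grafana.com/grafana/dashboards/193 or 14282
Select the data source
After the import, the dashboard shows the detailed container metrics.
I'm moore. Let's all keep at it together!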