Prometheus Deployment and Service Discovery
Posted by 还行少年
Preface: This article was compiled by the editors of 小常识网 (cha138.com) and mainly covers Prometheus deployment and service discovery. We hope you find it useful.
I. Prometheus Deployment
Prometheus 192.168.30.7
server1 192.168.30.8
1. Basic environment setup (do this on both hosts)
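[root@prometheus ~]# systemctl stop firewalld # stop the firewall
[root@prometheus ~]# systemctl disable firewalld # keep the firewall from starting at boot
[root@prometheus ~]# getenforce # check that SELinux is disabled (if it is not, e.g. run setenforce 0 to switch to permissive mode)
Disabled
[root@prometheus ~]# ntpdate ntp1.aliyun.com # synchronize the clock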
23 Sep 13:58:08 ntpdate[20882]: step time server 120.25.115.20 offset 1.121376 sec
[root@prometheus ~]#
2. Install and start Prometheus (the release tarball has already been extracted to /usr/local/)
[root@prometheus ~]# cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus # start the service directly in the foreground
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"
3. Check the service status
[root@prometheus ~]# netstat -antp | grep 9090 # in another terminal, check that port 9090 is listening
tcp6 0 0 :::9090 :::* LISTEN 21348/./prometheus
tcp6 0 0 ::1:46432 ::1:9090 ESTABLISHED 21348/./prometheus
tcp6 0 0 ::1:9090 ::1:46432 ESTABLISHED 21348/./prometheus
[root@prometheus ~]#
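The netstat check above can also be scripted, for example to run it from another machine. Below is a minimal sketch in Python using only the standard library. `port_is_open` is a hypothetical helper name. A successful connection only shows that something accepts TCP connections on the port; it does not prove that the listener is Prometheus.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, host unreachable, or name resolution failed
        return False
```

Usage: `port_is_open("192.168.30.7", 9090)` should return True while Prometheus is running, assuming no firewall sits in between (firewalld was stopped in step 1).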
4. View the default configuration file
[root@prometheus prometheus-2.27.1.linux-amd64]# cat prometheus.yml
# my global config
global: # global settings
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus' # becomes the job label on every scraped series and can be used in PromQL queries
    # metrics_path defaults to '/metrics' (the HTTP path metrics are collected from)
    # scheme defaults to 'http'.
    static_configs: # statically listed scrape targets; Prometheus itself listens on port 9090 by default
      - targets: ['localhost:9090']
5. Access the web UI
Expression browser: http://192.168.30.7:9090
Collected metrics: http://192.168.30.7:9090/metrics
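The /metrics page is plain text in the Prometheus exposition format. `# HELP` and `# TYPE` comment lines are followed by samples of the form `name{labels} value [timestamp]`. Below is a deliberately simplified parser, sketched for illustration only. It keeps label sets as raw strings and does not implement every escaping rule of the format. For real use, the official `prometheus_client` Python package ships a full parser (`prometheus_client.parser.text_string_to_metric_families`). The helper names here are hypothetical.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample (metric name plus raw label set) to its float value."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines and the # HELP / # TYPE comment lines
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # everything up to the last '}' is name + labels; numbers follow it
            end = line.rindex("}")
            key, rest = line[: end + 1], line[end + 1 :].split()
        else:
            key, *rest = line.split()
        # rest[0] is the value; an optional rest[1] would be a timestamp (ignored)
        samples[key] = float(rest[0])  # float() also accepts "+Inf", "-Inf", "NaN"
    return samples


def fetch_metrics(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Download a /metrics page over HTTP and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Usage: `fetch_metrics("http://192.168.30.7:9090/metrics")` returns Prometheus's own metrics. The same call against `http://192.168.30.8:9100/metrics` reads node_exporter's.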
6. Deploy the monitored node
[root@server1 ~]# systemctl stop firewalld # stop the firewall
[root@server1 ~]# systemctl disable firewalld # keep the firewall from starting at boot
[root@server1 ~]# getenforce # check that SELinux is disabled
Disabled
[root@server1 ~]# ntpdate ntp1.aliyun.com # synchronize the clock
23 Sep 13:58:19 ntpdate[19569]: adjust time server 120.25.115.20 offset 0.001137 sec
[root@server1 ~]# tar xf node_exporter-1.1.2.linux-amd64.tar.gz -C /usr/local/ # extract the package (downloadable from the official site)
[root@server1 ~]# cd /usr/local/node_exporter-1.1.2.linux-amd64/
[root@server1 node_exporter-1.1.2.linux-amd64]# cp node_exporter /usr/local/bin/ # put the binary on the PATH
[root@server1 node_exporter-1.1.2.linux-amd64]#
6.1 Start the service
[root@server1 node_exporter-1.1.2.linux-amd64]# ./node_exporter # it can be run in several ways; here it is simply started in the foreground
level=info ts=2021-09-23T06:19:38.800Z caller=node_exporter.go:178 msg="Starting node_exporter" version="(version=1.1.2, branch=HEAD, revision=b597c1244d7bef49e6f3359c87a56dd7707f6719)"
level=info ts=2021-09-23T06:19:38.800Z caller=node_exporter.go:179 msg="Build context" build_context="(go=go1.15.8, user=root@f07de8ca602a, date=20210305-09:29:10)"
level=warn ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:181 msg="Node Exporter is running as root user. This exporter is designed to run as unpriviledged user, root is not required."
level=info ts=2021-09-23T06:19:38.801Z caller=filesystem_common.go:74 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-mount-points" flag=^/(dev|proc|sys|var/lib/docker/.+)($|/)
level=info ts=2021-09-23T06:19:38.801Z caller=filesystem_common.go:76 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-fs-types" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
level=info ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:106 msg="Enabled collectors"
level=info ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:113
6.2 Edit the Prometheus configuration file
[root@prometheus prometheus-2.27.1.linux-amd64]# tail prometheus.yml
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nodes' # add a static target so that server1 is scraped
    static_configs:
      - targets: ['192.168.30.8:9100']
[root@prometheus prometheus-2.27.1.linux-amd64]#
6.3 Restart Prometheus and check the targets (they are listed under Status > Targets in the web UI)
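After the restart, one way to confirm that every target is being scraped is to query the built-in `up` metric through Prometheus's HTTP API at `/api/v1/query`. The metric is 1 when the last scrape of a target succeeded and 0 when it failed. Below is a minimal sketch, assuming the server address used in this article. The helper names are hypothetical. Results are keyed by (job, instance) because the same instance may appear in more than one job.

```python
import json
import urllib.parse
import urllib.request


def up_query_url(base: str) -> str:
    """Build an instant-query URL for the `up` metric."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": "up"})


def parse_up(body: str) -> dict[tuple[str, str], bool]:
    """Map (job, instance) to whether that target's last scrape succeeded."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        # value is [<unix timestamp>, "<sample value as a string>"]
        result[(labels["job"], labels["instance"])] = series["value"][1] == "1"
    return result


def check_targets(base: str = "http://192.168.30.7:9090") -> dict[tuple[str, str], bool]:
    """Run the query against a live server and return the parsed result."""
    with urllib.request.urlopen(up_query_url(base), timeout=5) as resp:
        return parse_up(resp.read().decode("utf-8"))
```

Usage: `check_targets()` should report `("nodes", "192.168.30.8:9100")` as True once server1 is being scraped.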
II. Service Discovery
1. Static configuration
[root@prometheus prometheus-2.27.1.linux-amd64]# tail prometheus.yml
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nodes'
    static_configs:
      - targets:
          - 192.168.30.8:9100
          - 192.168.30.9:9100 # newly added 192.168.30.9
2. Dynamic discovery
2.1 File-based service discovery
2.1.1 Write the prometheus.yml file
[root@prometheus file_sd]# tail -n 15 prometheus.yml # only the parts that differ from the static configuration are shown
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    file_sd_configs:
      - files:
          - targets/prometheus_*.yml
        refresh_interval: 2m
  # All nodes
  - job_name: 'nodes'
    file_sd_configs:
      - files:
          - targets/nodes_*.yml
        refresh_interval: 2m
[root@prometheus file_sd]#
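Each entry under `files` is a glob pattern. A target file whose name does not match is silently ignored, so a misspelled name such as `ndoes_centos.yml` would never be loaded. The matching can be checked quickly with the sketch below. It uses Python's `fnmatch` as a stand-in for the Go `filepath.Glob` matching Prometheus itself uses. The two agree for a simple `*` wildcard applied to bare file names, which is all this sketch handles. `matching_files` is a hypothetical helper.

```python
import fnmatch


def matching_files(pattern: str, names: list[str]) -> list[str]:
    """Return the bare file names (no directories) that a file_sd pattern would pick up."""
    # fnmatchcase is case-sensitive, like globbing on Linux
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
```

Usage: `matching_files("nodes_*.yml", os.listdir("targets"))` lists the node target files Prometheus would read from the targets/ directory.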
2.1.2 Write the target files referenced by prometheus.yml
[root@prometheus file_sd]# cd targets/
[root@prometheus targets]# vim prometheus_server.yml
[root@prometheus targets]# vim nodes_centos.yml
[root@prometheus targets]# cat prometheus_server.yml
- targets:
    - 192.168.30.7:9090
  labels:
    app: prometheus
    job: prometheus
[root@prometheus targets]# cat nodes_centos.yml
- targets:
    - 192.168.30.8:9100
    - 192.168.30.9:9100
  labels:
    app: node-exporter
    job: node
[root@prometheus targets]#
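Because Prometheus picks up changes to these files without a restart (and re-reads them every `refresh_interval`), another tool can maintain them. Below is a minimal sketch of such a generator. It renders one target group in the YAML layout shown above and writes it by creating a temporary file and renaming it over the target. On POSIX the rename is atomic within one filesystem, so Prometheus never reads a half-written file. The temporary name ends in `.tmp`, so it does not match the `nodes_*.yml` pattern either. Both function names are hypothetical.

```python
import os
import tempfile


def render_target_group(targets: list[str], labels: dict[str, str]) -> str:
    """Render one file_sd target group in the YAML layout used above."""
    lines = ["- targets:"]
    lines += [f"    - {t}" for t in targets]
    if labels:
        lines.append("  labels:")
        lines += [f"    {k}: {v}" for k, v in sorted(labels.items())]
    return "\n".join(lines) + "\n"


def write_atomically(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # do not leave the temp file behind on failure
        raise
```

Note that `mkstemp` creates the file with mode 0600. That is fine here because Prometheus runs as root, but a `chmod` would be needed if Prometheus ran as a different user.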
2.1.3 Start the service
[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus --config.file=./file_sd/prometheus.yml
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"
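2.2 DNS-based service discovery (for reference only)
DNS-based service discovery periodically queries a set of DNS names to discover the targets to monitor. The DNS servers used for these queries are the ones specified in the client's /etc/resolv.conf file.
This mechanism relies on A, AAAA and SRV resource records and supports only these record types.
It does not yet support the advanced DNS-based service discovery described in RFC 6763.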
PS:
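- SRV: an SRV record states which host provides a given service under a domain name.
- Example:
  _http._tcp.example.com. SRV 10 5 80 www.example.com.
  The fields after SRV mean:
  10 - priority (similar to MX records)
  5 - weight
  80 - port
  www.example.com - the hostname of the host that actually provides the service. An SRV record therefore also states which service runs on which port.
In DNS-based discovery, Prometheus uses SRV records to learn which port on a given target is served by an exporter or an instrumented application.
It then pulls that target's metrics over HTTP/HTTPS (the pull model).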
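In prometheus.yml this mechanism is configured with `dns_sd_configs` (a list of `names` plus `type: SRV`). Each SRV answer yields a scrape address made of the record's target host and port. Below is a minimal sketch of that mapping for a zone-file style SRV line like the example above. It is an illustration of the record layout, not Prometheus's own code. It drops the trailing dot of the fully qualified target name. `SrvRecord`, `parse_srv` and `scrape_address` are hypothetical names.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse '<name> [ttl] [IN] SRV <priority> <weight> <port> <target>'."""
    fields = line.split()
    i = fields.index("SRV")  # TTL and class are optional, so locate the type field
    priority, weight, port = (int(x) for x in fields[i + 1 : i + 4])
    return SrvRecord(fields[0], priority, weight, port, fields[i + 4])


def scrape_address(rec: SrvRecord) -> str:
    """Combine target host and port into a host:port scrape address."""
    return f"{rec.target.rstrip('.')}:{rec.port}"
```

For the example record, the resulting scrape address is `www.example.com:80`.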
2.3 Consul-based service discovery
2.3.1 Install and run Consul
[root@prometheus ~]# unzip consul_1.9.0_linux_amd64.zip -d /usr/local/bin/
Archive: consul_1.9.0_linux_amd64.zip
inflating: /usr/local/bin/consul
[root@prometheus ~]# mkdir -pv /consul/data/
mkdir: created directory '/consul'
mkdir: created directory '/consul/data/'
[root@prometheus ~]# mkdir /etc/consul
[root@prometheus ~]# cd /etc/consul/
[root@prometheus consul]# consul agent -dev -ui -data-dir=/consul/data/ \
> -config-dir=/etc/consul/ -client=0.0.0.0
==> Starting Consul agent...
Version: '1.9.0'
Node ID: '87f6c071-7f67-d561-bec6-00a31c863911'
Node name: 'prometheus'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
2.3.2 Create the prometheus-servers.json configuration file in /etc/consul
[root@prometheus ~]# cat /etc/consul/prometheus-servers.json
{
  "services": [
    {
      "id": "prometheus-server-node01",
      "name": "prom-server-node01",
      "address": "192.168.30.7",
      "port": 9090,
      "tags": ["prometheus"],
      "checks": [{
        "http": "http://192.168.30.7:9090/metrics",
        "interval": "5s"
      }]
    }
  ]
}
[root@prometheus ~]# consul reload
Configuration reload triggered
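An alternative to dropping a JSON file into /etc/consul and running `consul reload` is to register the service through the local agent's HTTP API (`PUT /v1/agent/service/register`). Below is a minimal sketch that only builds the request, so it can be inspected before sending. The field values mirror the file above. `register_request` is a hypothetical helper. The Consul address assumes the agent started earlier, which listens on HTTP port 8500.

```python
import json
import urllib.request


def register_request(consul: str, service_id: str, name: str, address: str,
                     port: int, tags: list[str]) -> urllib.request.Request:
    """Build a PUT /v1/agent/service/register request with an HTTP health check."""
    body = {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Tags": tags,
        # derive the check URL from address/port so the two cannot drift apart
        "Check": {"HTTP": f"http://{address}:{port}/metrics", "Interval": "5s"},
    }
    return urllib.request.Request(
        f"{consul.rstrip('/')}/v1/agent/service/register",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Usage: `urllib.request.urlopen(register_request("http://192.168.30.7:8500", "prometheus-server-node01", "prom-server-node01", "192.168.30.7", 9090, ["prometheus"]))` sends the registration.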
2.3.3 Start Prometheus
[root@prometheus ~]# cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"
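2.3.4 Access from a browser (the Consul web UI listens on http://192.168.30.7:8500)
2.3.5 Create a prometheus.yml for Consul-based service discovery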
[root@prometheus prometheus-2.27.1.linux-amd64]# cd consul-sd/
[root@prometheus consul-sd]# cat ../prometheus.yml > prometheus.yml
[root@prometheus consul-sd]# vim prometheus.yml
[root@prometheus consul-sd]# tail -n 17 prometheus.yml
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    consul_sd_configs:
      - server: "192.168.30.7:8500"
        tags:
          - "prometheus"
        refresh_interval: 2m
  - job_name: 'nodes'
    consul_sd_configs:
      - server: "192.168.30.7:8500"
        tags:
          - "nodes"
        refresh_interval: 2m
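The `tags` list acts as a filter: a service must carry all of the listed tags to be selected. To preview which registered services a job would match, query Consul's catalog endpoint `GET /v1/catalog/services`, which returns a map of service name to tags. Below is a minimal sketch of that preview. It works at the service level, so it only approximates Prometheus's per-instance filtering. The helper names are hypothetical.

```python
import json
import urllib.request


def services_with_tags(catalog: dict[str, list[str]], required: list[str]) -> list[str]:
    """Names of services whose tags include every tag in `required`."""
    return sorted(name for name, tags in catalog.items() if set(required) <= set(tags))


def fetch_catalog(consul: str = "http://192.168.30.7:8500") -> dict[str, list[str]]:
    """Fetch the service-name -> tags map from the Consul catalog."""
    with urllib.request.urlopen(consul.rstrip("/") + "/v1/catalog/services", timeout=5) as resp:
        return json.load(resp)
```

Usage: `services_with_tags(fetch_catalog(), ["nodes"])` lists the services the 'nodes' job would pick up.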
2.3.6 Register the other nodes (to keep the lab simple, they are defined directly in a file)
[root@prometheus consul]# cat nodes.json
{
  "services": [
    {
      "id": "node_exporter-node01",
      "name": "node01",
      "address": "192.168.30.8",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.30.8:9100/metrics",
        "interval": "5s"
      }]
    },
    {
      "id": "node_exporter-node02",
      "name": "node02",
      "address": "192.168.30.9",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.30.9:9100/metrics",
        "interval": "5s"
      }]
    }
  ]
}
[root@prometheus consul]# consul reload
Configuration reload triggered
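When each node block is written by hand, it is easy for the health-check URL to drift from the node's own address through a copy-paste slip. Generating the file from a host list avoids that. Below is a minimal sketch. The id/name numbering mirrors the file above and is otherwise arbitrary. `node_services` is a hypothetical helper.

```python
import json


def node_services(hosts: list[str], port: int = 9100, tag: str = "nodes") -> dict:
    """Build a Consul `services` file body with one node_exporter entry per host."""
    services = []
    for i, host in enumerate(hosts, start=1):
        services.append({
            "id": f"node_exporter-node{i:02d}",
            "name": f"node{i:02d}",
            "address": host,
            "port": port,
            "tags": [tag],
            # the check URL is built from the same host/port, so they always agree
            "checks": [{"http": f"http://{host}:{port}/metrics", "interval": "5s"}],
        })
    return {"services": services}
```

Usage: write `json.dumps(node_services(["192.168.30.8", "192.168.30.9"]), indent=2)` to /etc/consul/nodes.json, then run `consul reload`.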
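2.3.7 Start all services (start Prometheus with --config.file pointing at the Consul prometheus.yml from 2.3.5, as in 2.1.3)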
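2.4 Kubernetes API-based service discovery
Prometheus's Kubernetes API-based discovery treats the individual objects of resource types in the API server, such as node, service, endpoint, pod and ingress, as targets. It continuously watches those resources for changes, so targets are discovered and added automatically through the Kubernetes API server.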
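Specifically:
① Node, service, endpoint, pod and ingress resources are each handled by their own discovery mechanism. Taking node as an example, Prometheus can monitor nodes either through an exporter deployed on each node, or by using the kubelet as one of the entry points for node monitoring.
The kubelet has cAdvisor (a container monitoring tool) built in.
cAdvisor: a tool that analyzes the resource usage and performance characteristics of running containers, and provides a basic query UI and an HTTP interface.
② In Prometheus, the component that discovers each type of resource object is called a "role" (each resource type has its own discovery mechanism).
③ Prometheus can also discover each node after node-exporter has been deployed on the cluster with a DaemonSet controller.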
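In prometheus.yml, a role is selected with `role:` under `kubernetes_sd_configs`. Note that the endpoint role is spelled `endpoints` there. Below is a minimal sketch that renders one such job per role and rejects unknown role names. The role set covers only the resource types named above; recent Prometheus versions accept further roles such as `endpointslice`. The sketch assumes Prometheus runs inside the cluster and can use its service account. Running it outside the cluster would also need `api_server` and credentials, which are not shown. `kubernetes_job` is a hypothetical helper.

```python
K8S_ROLES = {"node", "service", "pod", "endpoints", "ingress"}


def kubernetes_job(job_name: str, role: str) -> str:
    """Render a minimal scrape job (indented for scrape_configs) for one discovery role."""
    if role not in K8S_ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(K8S_ROLES)}")
    return (
        f"  - job_name: '{job_name}'\n"
        f"    kubernetes_sd_configs:\n"
        f"      - role: {role}\n"
    )
```

Usage: `print(kubernetes_job("k8s-nodes", "node"))` prints a job that discovers every cluster node through the API server.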