Prometheus Deployment and Service Discovery

Posted by 还行少年


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers Prometheus deployment and service discovery, and we hope you find it useful.

I. Prometheus Deployment

Prometheus    192.168.30.7
server1       192.168.30.8

1. Basic environment setup (required on both hosts)

[root@prometheus ~]# systemctl stop firewalld       # stop the firewall
[root@prometheus ~]# systemctl disable firewalld    # keep the firewall from starting at boot
[root@prometheus ~]# getenforce    # check SELinux status; it should report Disabled
Disabled
[root@prometheus ~]# ntpdate ntp1.aliyun.com    # synchronize the clock
23 Sep 13:58:08 ntpdate[20882]: step time server 120.25.115.20 offset 1.121376 sec
[root@prometheus ~]# 

2. Install and start Prometheus (run from the directory where the release tarball was extracted)


[root@prometheus ~]# cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus    # start the server directly in the foreground
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T06:01:26.613Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"

3. Check the service status

[root@prometheus ~]# netstat -antp | grep 9090  # in a second terminal, check that port 9090 is listening
tcp6       0      0 :::9090                 :::*                    LISTEN      21348/./prometheus  
tcp6       0      0 ::1:46432               ::1:9090                ESTABLISHED 21348/./prometheus  
tcp6       0      0 ::1:9090                ::1:46432               ESTABLISHED 21348/./prometheus  
[root@prometheus ~]# 
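The same check can be scripted, for example from another host that has no netstat. The following is a minimal Python sketch, not part of Prometheus; the helper name `is_listening` is hypothetical, and a successful TCP connect only proves that something is accepting connections on the port, not that it is Prometheus:

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves host and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out or unreachable: nothing is accepting connections
        return False
```

For example, `is_listening("192.168.30.7", 9090)` run on server1 should return True once Prometheus is up and the firewall is stopped.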

4. View the default configuration file

[root@prometheus prometheus-2.27.1.linux-amd64]# cat prometheus.yml 
# my global config
global:         # global settings
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
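  - job_name: 'prometheus'             # becomes the job=<job_name> label on every scraped series, usable in PromQL queries

    # metrics_path defaults to '/metrics' (the path metrics are scraped from)
    # scheme defaults to 'http'.

    static_configs:       # static list of targets to scrape; Prometheus itself listens on port 9090 by default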
    - targets: ['localhost:9090']
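To make the comments above concrete: for each target, Prometheus joins the scheme, the target address and `metrics_path` into the scrape URL, and attaches `job` (from `job_name`) and `instance` (by default the target's host:port) to every series it scrapes. The Python sketch below approximates that for one static target; it ignores relabeling, URL parameters and authentication, and the function names are hypothetical:

```python
def target_labels(job_name: str, address: str,
                  scheme: str = "http", metrics_path: str = "/metrics") -> dict:
    """Approximate the labels Prometheus sets on one static target (no relabeling)."""
    return {
        "job": job_name,                   # taken from job_name
        "instance": address,               # defaults to the target's host:port
        "__address__": address,            # what the scraper connects to
        "__scheme__": scheme,              # scheme defaults to 'http'
        "__metrics_path__": metrics_path,  # metrics_path defaults to '/metrics'
    }


def scrape_url(labels: dict) -> str:
    """Join scheme, address and metrics path into the URL that gets scraped."""
    return f"{labels['__scheme__']}://{labels['__address__']}{labels['__metrics_path__']}"
```

Labels starting with `__` are internal and are dropped after relabeling, so of these only `job` and `instance` remain on stored series.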

5. Access the web UI

Expression browser: http://192.168.30.7:9090

Collected metrics: http://192.168.30.7:9090/metrics
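The /metrics page is plain text in the Prometheus exposition format: `# HELP` and `# TYPE` comment lines, then one `name{label="value",...} value` sample per line, optionally followed by a timestamp. A simplified Python parser, assuming label values contain no quotes, commas or braces (the real format allows escaped quotes), might look like this; `parse_metrics` is a hypothetical name:

```python
import re

# metric_name{label="value",...} value [timestamp]
# Simplified: label values must not contain '"', ',' or '}'.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str):
    """Yield (name, labels, value) tuples from text-format metrics output."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue  # syntax this sketch does not handle
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        # float() also understands NaN, +Inf and -Inf
        yield m.group("name"), labels, float(m.group("value"))
```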

6. Deploy a monitored node

[root@server1 ~]# systemctl stop firewalld     # stop the firewall
[root@server1 ~]# systemctl disable firewalld  # keep the firewall from starting at boot
[root@server1 ~]# getenforce    # check SELinux status; it should report Disabled
Disabled
[root@server1 ~]# ntpdate ntp1.aliyun.com   # synchronize the clock
23 Sep 13:58:19 ntpdate[19569]: adjust time server 120.25.115.20 offset 0.001137 sec

[root@server1 ~]# tar xf node_exporter-1.1.2.linux-amd64.tar.gz -C /usr/local/    # extract the package, downloadable from the official site
[root@server1 ~]# cd /usr/local/node_exporter-1.1.2.linux-amd64/
[root@server1 node_exporter-1.1.2.linux-amd64]# cp node_exporter  /usr/local/bin/    # put the binary on the PATH
[root@server1 node_exporter-1.1.2.linux-amd64]# 

6.1 Start the service

[root@server1 node_exporter-1.1.2.linux-amd64]# ./node_exporter  # it can be started in several ways; here it is simply run in the foreground
level=info ts=2021-09-23T06:19:38.800Z caller=node_exporter.go:178 msg="Starting node_exporter" version="(version=1.1.2, branch=HEAD, revision=b597c1244d7bef49e6f3359c87a56dd7707f6719)"
level=info ts=2021-09-23T06:19:38.800Z caller=node_exporter.go:179 msg="Build context" build_context="(go=go1.15.8, user=root@f07de8ca602a, date=20210305-09:29:10)"
level=warn ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:181 msg="Node Exporter is running as root user. This exporter is designed to run as unpriviledged user, root is not required."
level=info ts=2021-09-23T06:19:38.801Z caller=filesystem_common.go:74 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-mount-points" flag=^/(dev|proc|sys|var/lib/docker/.+)($|/)
level=info ts=2021-09-23T06:19:38.801Z caller=filesystem_common.go:76 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-fs-types" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
level=info ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:106 msg="Enabled collectors"
level=info ts=2021-09-23T06:19:38.801Z caller=node_exporter.go:113 
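At its core, node_exporter is an HTTP server that answers GET /metrics on port 9100 with text in the exposition format, and Prometheus pulls from it. Purely to illustrate that pull model, here is a toy exporter in Python serving one hypothetical gauge `demo_up`; it is not a replacement for node_exporter, and `start_exporter` and `fetch` are hypothetical helper names:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical demo metric; a real exporter would compute values per request.
BODY = (
    "# HELP demo_up Hypothetical demo metric.\n"
    "# TYPE demo_up gauge\n"
    "demo_up 1\n"
).encode()


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with text-format output; everything else is 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_exporter(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Serve /metrics in a daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url: str, timeout: float = 5.0) -> str:
    """Pull a metrics page the way a scraper does: a plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode()
```

To let the Prometheus host scrape it, one could call `start_exporter("0.0.0.0", 9101)` (9101 is an arbitrary free port chosen for this example), keep the main thread alive because the server thread is a daemon, and add 192.168.30.8:9101 as a static target.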

6.2 Edit the Prometheus configuration file

[root@prometheus prometheus-2.27.1.linux-amd64]# tail prometheus.yml 
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'nodes'        # add a static target so that server1 is scraped
    static_configs:
    - targets: ['192.168.30.8:9100']
[root@prometheus prometheus-2.27.1.linux-amd64]# 


6.3 Restart Prometheus and check the targets
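After the restart, the Status > Targets page of the web UI should list both the prometheus and nodes jobs. The same information is available from the HTTP API at /api/v1/targets, which returns JSON whose `data.activeTargets` entries carry `labels` and `health`. A small Python sketch that summarizes that response; the function name is hypothetical:

```python
import json


def target_health(api_response: str) -> dict:
    """Map (job, instance) of each active target to 'up', 'down' or 'unknown'.

    api_response is the JSON text returned by GET /api/v1/targets.
    """
    payload = json.loads(api_response)
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus API error: {payload.get('error')}")
    health = {}
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        key = (labels.get("job"), labels.get("instance"))
        health[key] = target.get("health", "unknown")
    return health
```

The body can be fetched with `urllib.request.urlopen("http://192.168.30.7:9090/api/v1/targets")` and passed in decoded; a target reported as `down` usually means the exporter is not running or its port is blocked.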

II. Service Discovery

1. Static configuration

[root@prometheus prometheus-2.27.1.linux-amd64]# tail prometheus.yml 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'nodes'
    static_configs:
    - targets: 
       - 192.168.30.8:9100
       - 192.168.30.9:9100     # newly added node 192.168.30.9

2. Dynamic discovery

2.1 File-based service discovery

2.1.1 Write the prometheus.yml file

[root@prometheus file_sd]# tail -n 15 prometheus.yml  # only the parts that differ from the static configuration are shown
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    file_sd_configs:
    - files:
      - targets/prometheus_*.yml
      refresh_interval: 2m
  # All nodes
  - job_name: 'nodes'
    file_sd_configs:
    - files:
      - targets/nodes_*.yml
      refresh_interval: 2m
[root@prometheus file_sd]# 
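The `files` entries are patterns: the last path segment may contain a `*` wildcard, and relative paths are resolved against the directory of the configuration file. A target file whose name does not match the pattern is simply never read. The Python sketch below approximates that matching with `fnmatch.fnmatchcase`, which is close to, but not identical with, the Go `filepath.Match` that Prometheus uses; the helper name is hypothetical:

```python
import posixpath
from fnmatch import fnmatchcase


def matching_files(pattern: str, paths: list) -> list:
    """Return the paths a file_sd pattern like 'targets/nodes_*.yml' would select.

    The wildcard is applied to the last path element only; the directory
    part must match exactly (an approximation of Go's filepath.Match).
    """
    pattern_dir, pattern_base = posixpath.split(pattern)
    return [
        p for p in paths
        if posixpath.dirname(p) == pattern_dir
        and fnmatchcase(posixpath.basename(p), pattern_base)
    ]
```

A misspelled file name such as `ndoes_centos.yml` would therefore not be picked up by the `nodes` job.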

2.1.2 Write the target files referenced by prometheus.yml

[root@prometheus file_sd]# cd targets/
[root@prometheus targets]# vim prometheus_server.yml
[root@prometheus targets]# vim nodes_centos.yml
[root@prometheus targets]# cat prometheus_server.yml 
- targets:
  - 192.168.30.7:9090
  labels:
    app: prometheus
    job: prometheus
[root@prometheus targets]# cat nodes_centos.yml 
- targets:
  - 192.168.30.8:9100
  - 192.168.30.9:9100
  labels:
    app: node-exporter
    job: node
[root@prometheus targets]# 
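file_sd also accepts JSON target files (extension .json) with the same structure: a list of groups, each with `targets` and optional `labels`. When such files are generated by a script, writing to a temporary file and renaming it into place keeps Prometheus from reading a half-written file. A minimal Python sketch; `write_targets` is a hypothetical helper, and for the `nodes` job above a pattern such as `targets/nodes_*.json` would have to be added to `files` before a JSON file is read:

```python
import json
import os
import tempfile


def write_targets(path: str, targets: list, labels: dict) -> None:
    """Write one file_sd target group to `path` as JSON, replacing it atomically."""
    groups = [{"targets": targets, "labels": labels}]
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the half-written file out of *.json / *.yml patterns.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        os.chmod(tmp_path, 0o644)   # mkstemp creates 0600; let a non-root Prometheus read it
        os.replace(tmp_path, path)  # atomic rename on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```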

2.1.3 Start the service

[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus --config.file=./file_sd/prometheus.yml
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T07:08:11.028Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"

2.2 DNS-based discovery (for reference only)

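DNS-based service discovery periodically queries a set of DNS names to find the targets to monitor; the DNS servers used for these queries are the ones specified in the client's /etc/resolv.conf.
This mechanism relies on A, AAAA and SRV resource records and supports only those lookups;
it does not yet support the advanced DNS-based service discovery described in RFC 6763.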
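Notes:

  • SRV: an SRV record specifies which services are provided under a domain name.
  • Example:
    _http._tcp.example.com. SRV 10 5 80 www.example.com.
    The fields after SRV mean:
    10 - priority, similar to MX records
    5 - weight
    80 - port
    www.example.com - hostname of the host that actually provides the service. An SRV record thus also states which service runs on that port.
    With SRV records in DNS-based discovery, Prometheus learns which port on a target belongs to an exporter or an instrumented application,
    and then pulls the metrics from that endpoint over HTTP/HTTPS.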
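To illustrate how SRV data maps to scrape targets: in dns_sd_configs each returned SRV record becomes its own target built from the record's host and port, and priority and weight are not used to choose between records. A small Python sketch that parses record data like the example above; `SrvRecord`, `parse_srv` and `srv_to_targets` are hypothetical names, and the sketch performs no real DNS lookups:

```python
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> SrvRecord:
    """Parse SRV record data such as '10 5 80 www.example.com.'."""
    priority, weight, port, target = rdata.split()
    # Drop the trailing root dot so the host looks like an ordinary hostname
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def srv_to_targets(records: list) -> list:
    """Turn every SRV record into a host:port scrape target."""
    return [f"{r.target}:{r.port}" for r in records]
```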

2.3 Consul-based discovery

2.3.1 Install and run Consul

[root@prometheus ~]# unzip consul_1.9.0_linux_amd64.zip -d /usr/local/bin/    
Archive:  consul_1.9.0_linux_amd64.zip
  inflating: /usr/local/bin/consul   
[root@prometheus ~]# mkdir -pv /consul/data/
mkdir: created directory '/consul'
mkdir: created directory '/consul/data/'
[root@prometheus ~]# mkdir /etc/consul
[root@prometheus ~]# cd /etc/consul/   
[root@prometheus consul]# consul agent -dev -ui -data-dir=/consul/data/ \
> -config-dir=/etc/consul/ -client=0.0.0.0
==> Starting Consul agent...
           Version: '1.9.0'
           Node ID: '87f6c071-7f67-d561-bec6-00a31c863911'
         Node name: 'prometheus'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

2.3.2 Edit the prometheus-servers.json configuration file in /etc/consul

[root@prometheus ~]# cat /etc/consul/prometheus-servers.json 
{
  "services": [
    {
      "id": "prometheus-server-node01",
      "name": "prom-server-node01",
      "address": "192.168.30.7",
      "port": 9090,
      "tags": ["prometheus"],
      "checks": [{
        "http": "http://192.168.30.7:9090/metrics",
        "interval": "5s"
      }]
    }
  ]
}

[root@prometheus ~]# consul reload
Configuration reload triggered
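Hand-written service files are easy to break, for example with a missing brace or a check URL pointing at the wrong host. One option is to generate them. The Python sketch below builds the same `{"services": [...]}` layout and derives each HTTP check URL from the service's own address and port; the function names are hypothetical:

```python
import json


def service_definition(svc_id, name, address, port, tags, interval="5s", scheme="http"):
    """Build one Consul service entry with an HTTP check on its /metrics URL."""
    return {
        "id": svc_id,
        "name": name,
        "address": address,
        "port": port,
        "tags": tags,
        "checks": [{
            # Derived from address/port so the check always hits the registered endpoint
            "http": f"{scheme}://{address}:{port}/metrics",
            "interval": interval,
        }],
    }


def services_file(*services) -> str:
    """Serialize services into the layout Consul loads from -config-dir."""
    return json.dumps({"services": list(services)}, indent=2)
```

The output of `services_file(...)` can be written to a .json file under /etc/consul/ and picked up with `consul reload`, as above.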

2.3.3 Start Prometheus

[root@prometheus ~]# cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus prometheus-2.27.1.linux-amd64]# ./prometheus 
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:388 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:426 msg="Starting Prometheus" version="(version=2.27.1, branch=HEAD, revision=db7f0bcec27bd8aeebad6b08ac849516efa9ae02)"
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:431 build_context="(go=go1.16.4, user=root@fd804fbd4f25, date=20210518-14:17:54)"
level=info ts=2021-09-23T12:51:55.014Z caller=main.go:432 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus (none))"

2.3.4 Open the Consul web UI in a browser (http://192.168.30.7:8500)

2.3.5 Create a prometheus.yml for Consul-based discovery

[root@prometheus prometheus-2.27.1.linux-amd64]# cd cousul-sd/
[root@prometheus cousul-sd]# cat ../prometheus.yml > prometheus.yml
[root@prometheus cousul-sd]# vim prometheus.yml 
[root@prometheus cousul-sd]# tail -n 17 prometheus.yml 
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    consul_sd_configs:
    - server: "192.168.30.7:8500"
      tags:
      - "prometheus"
      refresh_interval: 2m
  - job_name: 'nodes'
    consul_sd_configs:
    - server: "192.168.30.7:8500" 
      tags:
      - "nodes"
      refresh_interval: 2m
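With `tags` set, consul_sd_configs keeps only services that carry all of the listed tags; this is how the prometheus job picks up the service tagged "prometheus" and the nodes job picks up those tagged "nodes". The Python sketch below mimics that filter on a mapping shaped like Consul's GET /v1/catalog/services response (service name to list of tags). It is a simplification, since Prometheus filters per service instance, and the function name is hypothetical:

```python
def select_services(catalog: dict, required_tags: list) -> list:
    """Return service names whose tags include every required tag, sorted.

    An empty required_tags list selects every service.
    """
    need = set(required_tags)
    return sorted(name for name, tags in catalog.items() if need <= set(tags))
```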

2.3.6 Register the other nodes (to keep the lab simple, they are defined directly in a file)

[root@prometheus consul]# cat nodes.json 
{
  "services": [
    {
      "id": "node_exporter-node01",
      "name": "node01",
      "address": "192.168.30.8",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.30.8:9100/metrics",
        "interval": "5s"
      }]
    },
    {
      "id": "node_exporter-node02",
      "name": "node02",
      "address": "192.168.30.9",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.30.9:9100/metrics",
        "interval": "5s"
      }]
    }
  ]
}

[root@prometheus consul]# consul reload
Configuration reload triggered
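Instead of files, services can also be registered through the local agent's HTTP API with PUT /v1/agent/service/register. The Python sketch below only builds the request; it could be sent with `urllib.request.urlopen(req)`. The helper name is hypothetical, and note that a `-dev` agent keeps its state in memory, so such registrations are lost when the agent restarts:

```python
import json
import urllib.request


def register_request(consul_addr, svc_id, name, address, port, tags, interval="5s"):
    """Build (but do not send) a PUT /v1/agent/service/register request."""
    body = {
        "ID": svc_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Tags": tags,
        # HTTP health check against the exporter's own /metrics endpoint
        "Check": {"HTTP": f"http://{address}:{port}/metrics", "Interval": interval},
    }
    return urllib.request.Request(
        f"http://{consul_addr}/v1/agent/service/register",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```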

2.3.7 Start all services

2.4 Kubernetes API-based service discovery

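Prometheus's Kubernetes API-based discovery can treat the resource objects of types such as node, service, endpoints, pod and ingress in the API server as targets, and it continuously watches those resources for changes, so targets are discovered and added automatically through the Kubernetes API server.
In particular:

  • ① The node, service, endpoints, pod and ingress resources are each handled by their own discovery mechanism. Taking node as an example, Prometheus can monitor nodes either by deploying an exporter on each node, or by using the kubelet as one of the entry points for node monitoring.
    The kubelet has cAdvisor (a container monitoring tool) built in.
    cAdvisor: a tool that analyzes the resource usage and performance characteristics of running containers, and provides a basic query UI and an HTTP interface.

  • ② In Prometheus, the component that discovers each type of resource object is called a "role" (each role is the discovery mechanism specific to one resource type).

  • ③ Nodes can also be discovered after node-exporter is deployed on the cluster with a DaemonSet controller.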
