12. Docker Swarm Monitoring
Posted by 哭泣的馒头
I. cAdvisor + InfluxDB + Grafana
Component overview
cAdvisor: the data-collection component. It must be deployed on every node in the cluster that accepts tasks (the swarm here consists of three nodes, so all three VMs need it).
InfluxDB: the data-storage component.
Grafana: the data-presentation component.
To make things run smoothly, pull the required images manually beforehand. On the Docker manager node: tutum/influxdb (do not pull the recent official influxdb image — it has changed beyond recognition) and grafana/grafana; on all three VMs: google/cadvisor.
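The pre-pull step described above can be sketched as the following commands (the :latest tags match the stack file below; tutum/influxdb and google/cadvisor are no longer maintained upstream, so pinning them is deliberate):

```shell
# On the manager node (hostname "docker"):
docker pull tutum/influxdb:latest
docker pull grafana/grafana:latest

# On all three VMs (every node that accepts tasks):
docker pull google/cadvisor:latest
```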
1. Configure the Docker Compose stack file
[root@docker dockerfile]# cat docker-stack.yml
version: "3.9"
services:
  influx:
    image: tutum/influxdb:latest
    volumes:
      - influx:/var/lib/influxdb
    environment:
      - PRE_CREATE_DB=cadvisor
    ports:
      - "8083:8083"
      - "8086:8086"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == docker  # pin to the manager node whose hostname is "docker"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "0.0.0.0:80:3000"
    volumes:
      - grafana:/var/lib/grafana
    depends_on:
      - influx
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == docker  # pin to the manager node whose hostname is "docker"
  cadvisor:
    image: google/cadvisor:latest
    hostname: "{{.Node.Hostname}}"
    command: -logtostderr -docker_only -storage_driver=influxdb -storage_driver_db=cadvisor -storage_driver_host=influx:8086
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:
      - influx
    deploy:
      mode: global  # mode defaults to replicated; global runs exactly one task on every node
volumes:
  influx:
    driver: local
  grafana:
    driver: local
Deploy the stack and check the resulting services:
docker stack deploy -c docker-stack.yml monitor
docker stack services monitor
docker service ps monitor_cadvisor
docker service ps monitor_grafana
docker service ps monitor_influx
2. Configure the InfluxDB account and password
With the latest influxdb image, port 8083 can no longer be opened, and the database-creation setting in the compose file does not take effect either — the reason is unclear.
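If the cadvisor database is missing (for example because PRE_CREATE_DB was ignored), it can be created by hand through InfluxDB's 1.x HTTP /query API on port 8086. The host below is the manager's IP from this walkthrough, and the grafana/grafana credentials are a hypothetical example:

```shell
# Create the "cadvisor" database via the InfluxDB 1.x HTTP API
curl -s -XPOST 'http://192.168.10.128:8086/query' \
  --data-urlencode 'q=CREATE DATABASE cadvisor'

# Optionally create a user for Grafana to connect with (hypothetical credentials)
curl -s -XPOST 'http://192.168.10.128:8086/query' \
  --data-urlencode "q=CREATE USER grafana WITH PASSWORD 'grafana'"
```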
3. Configure Grafana
Grafana was constrained to the node with node.hostname == docker.
The port mapping is 80 -> 3000.
Log in to Grafana at http://192.168.10.128/login
The default credentials are admin/admin; you will be asked to change the password on first login.
Download the Docker Swarm Dashboard for Grafana from
https://grafana.com/grafana/dashboards/1367/revisions
There is also a dashboard on a GitHub page, but it does not work after downloading:
https://github.com/botleg/swarm-monitoring/blob/master/dashboard.json
Import the dashboard downloaded from grafana.com into Grafana.
4. Clean up the experiment
docker stack remove monitor
docker volume prune
II. Prometheus (supplement from the official docs)
Enable the Docker daemon's built-in metrics endpoint in /etc/docker/daemon.json:
{
  "metrics-addr" : "127.0.0.1:9323",
  "experimental" : true
}
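After editing daemon.json, the daemon must be restarted for the metrics endpoint to come up. A quick verification (assuming a systemd host) looks like this:

```shell
# Restart the Docker daemon so daemon.json changes take effect
sudo systemctl restart docker

# The endpoint should now serve Prometheus-format metrics
curl -s http://127.0.0.1:9323/metrics | head -n 5
```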
Create the Prometheus configuration at /tmp/prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: codelab-monitor

# Load rules once and periodically evaluate them according to the global evaluation_interval.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: prometheus
    # metrics_path defaults to /metrics
    # scheme defaults to http.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: docker
    # metrics_path defaults to /metrics
    # scheme defaults to http.
    static_configs:
      - targets: ['localhost:9323']
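Before deploying, the configuration can be validated with promtool, which ships inside the prom/prometheus image (running it through the image avoids installing promtool on the host):

```shell
# Syntax-check the config using the promtool binary bundled in the image
docker run --rm \
  -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
  --entrypoint promtool \
  prom/prometheus check config /etc/prometheus/prometheus.yml
```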
docker service create --replicas 1 --name my-prometheus \
  --mount type=bind,source=/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish published=9090,target=9090,protocol=tcp \
  prom/prometheus
Verify that the Docker target is listed at http://localhost:9090/targets/
docker service create \
  --replicas 10 \
  --name ping_service \
  alpine ping docker.com
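With the ping service generating traffic, the effect shows up in the daemon's own metrics; one way to check from the command line is to hit Prometheus's standard query API (the metric name comes from Docker's /metrics output):

```shell
# Ask Prometheus for the per-second rate of Docker network actions
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(engine_daemon_network_actions_seconds_count[1m])'
```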
Final cleanup:
docker service remove ping_service