K8S Practice Ⅸ (Cluster Monitoring)
Preface: this article was compiled by the editors of 小常识网 (cha138.com) and covers K8S Practice Ⅸ (Cluster Monitoring).
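I. Introduction to Prometheus Operator
Prometheus Operator is a controller, open-sourced by CoreOS, for managing Prometheus on Kubernetes clusters. It simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.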
II. Deployment
1. Download the deployment files from the official repository
# git clone https://github.com/coreos/kube-prometheus.git
2. Replace the image registry addresses with mirrors
# mkdir prometheus
# cp kube-prometheus/manifests/* prometheus/
# sed -i 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#g' prometheus/*
# sed -i 's#quay.io#quay.azk8s.cn#g' prometheus/*
# cat prometheus/* | grep image
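The two sed edits above can be wrapped in a function so the mirror hosts become arguments instead of being hard-coded. This is a minimal sketch: rewrite_registries and list_images are hypothetical helper names, it assumes GNU sed (as the original commands do; BSD sed's -i needs a suffix argument), and the azk8s.cn hosts are only the ones used above, so pass whichever mirrors your nodes can actually reach.

```shell
#!/bin/sh
# Hypothetical helper: rewrite registry prefixes in every manifest of a directory.
# Usage: rewrite_registries DIR GCR_MIRROR QUAY_MIRROR   (assumes GNU sed -i)
rewrite_registries() {
  local dir="$1" gcr_mirror="$2" quay_mirror="$3"
  # '#' is the sed delimiter so slashes in the mirror paths need no escaping;
  # the dots in the source hosts are escaped so they match literally.
  sed -i \
    -e "s#k8s\.gcr\.io#${gcr_mirror}#g" \
    -e "s#quay\.io#${quay_mirror}#g" \
    "$dir"/*
}

# Hypothetical helper: print the distinct image references found in DIR.
list_images() {
  grep -ho 'image: *[^ ]*' "$1"/* | sed 's/^image: *//' | sort -u
}
```

Usage, equivalent to the commands above:
# rewrite_registries prometheus gcr.azk8s.cn/google_containers quay.azk8s.cn
# list_images prometheus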
3. Deploy all resources
# kubectl apply -f prometheus/
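On a fresh cluster, the first kubectl apply -f prometheus/ may report "no matches for kind" errors for objects such as ServiceMonitor, because the CRDs created in the same run are not yet registered with the API server; running the command again usually succeeds. A minimal sketch of a bounded retry, where retry is a hypothetical helper (not part of kube-prometheus or kubectl):

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds between attempts.
# Usage: retry MAX DELAY command [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Usage:
# retry 5 10 kubectl apply -f prometheus/
kubectl apply is declarative, so re-applying objects that were already created on an earlier attempt is harmless.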
4. Check the created namespace and CRDs
# kubectl get ns |grep monitoring
monitoring Active 3m30s
# kubectl get crd
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2019-09-10T09:13:00Z
podmonitors.monitoring.coreos.com 2019-09-10T09:13:00Z
prometheuses.monitoring.coreos.com 2019-09-10T09:13:01Z
prometheusrules.monitoring.coreos.com 2019-09-10T09:13:02Z
servicemonitors.monitoring.coreos.com 2019-09-10T09:13:03Z
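The CRD check can also be scripted instead of read by eye. A sketch under these assumptions: missing_crds is a hypothetical helper that reads kubectl get crd output on standard input, and its expected list is simply the five CRDs shown above, which may differ in other kube-prometheus releases.

```shell
#!/bin/sh
# Hypothetical helper: print each expected CRD missing from `kubectl get crd` output (stdin).
# Returns non-zero if anything is missing.
missing_crds() {
  local listed crd missing=0
  # First column of every line except the "NAME CREATED AT" header.
  listed=$(awk 'NR > 1 { print $1 }')
  for crd in \
      alertmanagers.monitoring.coreos.com \
      podmonitors.monitoring.coreos.com \
      prometheuses.monitoring.coreos.com \
      prometheusrules.monitoring.coreos.com \
      servicemonitors.monitoring.coreos.com; do
    # -x: whole-line match, -F: fixed string (dots are not regex wildcards).
    if ! printf '%s\n' "$listed" | grep -qxF "$crd"; then
      echo "$crd"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage:
# kubectl get crd | missing_crds && echo "all CRDs present"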
5. List all pods and Services in the monitoring namespace
# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 23h
alertmanager-main-1 2/2 Running 0 23h
alertmanager-main-2 2/2 Running 0 23h
grafana-57bfdd47f8-bhkvv 1/1 Running 0 23h
kube-state-metrics-8cf4797dc-7dg4w 4/4 Running 0 23h
node-exporter-446xd 2/2 Running 0 23h
node-exporter-8sbsf 2/2 Running 0 23h
node-exporter-dk7qk 2/2 Running 0 23h
node-exporter-vdsqg 2/2 Running 0 23h
node-exporter-w7czt 2/2 Running 0 23h
node-exporter-wx7vj 2/2 Running 0 23h
prometheus-adapter-6b9989ccbd-bcl2h 1/1 Running 0 23h
prometheus-k8s-0 3/3 Running 1 23h
prometheus-k8s-1 3/3 Running 1 23h
prometheus-operator-7894d75578-rg2gl 1/1 Running 0 23h
# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.97.155.71 <none> 9093:30093/TCP 23h
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 23h
grafana NodePort 10.110.28.251 <none> 3000:30030/TCP 23h
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 23h
node-exporter ClusterIP None <none> 9100/TCP 23h
prometheus-adapter ClusterIP 10.111.75.114 <none> 443/TCP 23h
prometheus-k8s NodePort 10.109.3.70 <none> 9090:30090/TCP 23h
prometheus-operated ClusterIP None <none> 9090/TCP 23h
prometheus-operator ClusterIP None <none> 8080/TCP 23h
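A quick way to spot pods that are not ready is to compare the two halves of the READY column and check STATUS. A sketch: unready_pods is a hypothetical helper that parses kubectl get pod output from standard input; pods of finished Jobs (status Completed) would also be reported, which is fine for this namespace but not in general.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pod` output on stdin and print every pod that is
# not Running or whose READY column is not N/N. Returns non-zero if any were found.
unready_pods() {
  awk 'NR > 1 {
         split($2, r, "/")              # "2/2" -> r[1]=2, r[2]=2
         if ($3 != "Running" || r[1] != r[2]) { print $1; bad = 1 }
       }
       END { exit bad }'
}
```

Usage:
# kubectl get pod -n monitoring | unready_pods
kubectl can also block until the pods are ready, for example kubectl wait --for=condition=Ready pod --all -n monitoring --timeout=300s.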
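6. Change the Service type to NodePort to expose the ports (in each kubectl edit session, change spec.type to NodePort and add a nodePort under the port entry)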
# kubectl edit svc prometheus-k8s -n monitoring
service/prometheus-k8s edited
# kubectl edit svc grafana -n monitoring
service/grafana edited
# kubectl edit svc alertmanager-main -n monitoring
service/alertmanager-main edited
# kubectl get svc -n monitoring | grep NodePort
alertmanager-main NodePort 10.97.155.71 <none> 9093:30093/TCP 21h
grafana NodePort 10.110.28.251 <none> 3000:30030/TCP 21h
prometheus-k8s NodePort 10.109.3.70 <none> 9090:30090/TCP 21h
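kubectl edit is interactive; the same change can be made non-interactively with kubectl patch, which suits scripts. A sketch, assuming the Service port entries are identified by their port numbers (9090, 3000 and 9093 as shown above): nodeport_patch is a hypothetical helper that builds the strategic merge patch body, and it rejects values outside the default NodePort range 30000-32767 (a cluster may be configured with a different range).

```shell
#!/bin/sh
# Hypothetical helper: emit a strategic merge patch that switches a Service to NodePort
# and pins the nodePort of one existing port.
# Usage: nodeport_patch PORT NODE_PORT
nodeport_patch() {
  local port="$1" node_port="$2"
  # kube-apiserver's default --service-node-port-range is 30000-32767.
  if [ "$node_port" -lt 30000 ] || [ "$node_port" -gt 32767 ]; then
    echo "nodeport_patch: $node_port is outside the default range 30000-32767" >&2
    return 1
  fi
  # Service ports are merged by their "port" value, so only the matching entry gains a nodePort.
  printf '{"spec":{"type":"NodePort","ports":[{"port":%d,"nodePort":%d}]}}\n' "$port" "$node_port"
}
```

Usage, reproducing the port mappings above:
# kubectl patch svc prometheus-k8s -n monitoring -p "$(nodeport_patch 9090 30090)"
# kubectl patch svc grafana -n monitoring -p "$(nodeport_patch 3000 30030)"
# kubectl patch svc alertmanager-main -n monitoring -p "$(nodeport_patch 9093 30093)"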
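7. Test access: open http://<node-IP>:30090 (Prometheus), http://<node-IP>:30030 (Grafana) and http://<node-IP>:30093 (Alertmanager) in a browser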
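Besides opening the pages in a browser, each component exposes a health endpoint that can be probed from the command line: /-/healthy for Prometheus and Alertmanager, /api/health for Grafana. A sketch assuming the NodePorts shown above (30090, 30030, 30093) and that curl is installed; monitoring_urls and probe_monitoring are hypothetical helpers, and the node IP argument stands for the address of any cluster node.

```shell
#!/bin/sh
# Hypothetical helper: print "name url" for the health endpoint of each exposed UI.
# Usage: monitoring_urls NODE_IP
monitoring_urls() {
  local node="$1"
  printf 'prometheus   http://%s:30090/-/healthy\n' "$node"
  printf 'grafana      http://%s:30030/api/health\n' "$node"
  printf 'alertmanager http://%s:30093/-/healthy\n' "$node"
}

# Hypothetical helper: probe every URL; returns non-zero if any probe fails.
probe_monitoring() {
  monitoring_urls "$1" | (
    rc=0
    while read -r name url; do
      # -f: HTTP errors count as failures; --max-time: do not hang on an unreachable port.
      if curl -fsS -o /dev/null --max-time 5 "$url" </dev/null; then
        echo "$name OK"
      else
        echo "$name FAILED ($url)"
        rc=1
      fi
    done
    exit "$rc"
  )
}
```

Usage:
# probe_monitoring <node-IP>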
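III. Configuration
1. Check the Targets page of Prometheus
The two system components kube-controller-manager and kube-scheduler are not being monitored. The cause lies in how their ServiceMonitor objects are defined: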
# cat prometheus/prometheus-serviceMonitorKubeScheduler.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
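selector.matchLabels selects Services in the kube-system namespace (the namespace is fixed by namespaceSelector) that carry the label k8s-app=kube-scheduler, but no such Service exists in the cluster. The same applies to kube-controller-manager.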
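matchLabels is an AND of exact key=value matches: a Service is selected only if it carries every listed pair, and extra labels on the Service do not matter. A sketch of that rule, using a hypothetical helper labels_match that takes comma-separated key=value lists; in the cluster the real selection is performed by Prometheus Operator and Prometheus, not by this function. The equivalent lookup against the API server is kubectl get svc -n kube-system -l k8s-app=kube-scheduler, which at this point finds no Service.

```shell
#!/bin/sh
# Hypothetical helper illustrating matchLabels semantics.
# Usage: labels_match "k=v,k2=v2" "selector_k=v,..."
# Succeeds only if every selector pair appears verbatim among the labels.
labels_match() {
  local labels=",$1," rest="$2" pair
  while [ -n "$rest" ]; do
    pair=${rest%%,*}                  # first item of the remaining selector list
    case $rest in
      *,*) rest=${rest#*,} ;;         # drop that item and its comma
      *)   rest= ;;                   # it was the last item
    esac
    [ -n "$pair" ] || continue        # tolerate empty items such as "a=1,,b=2"
    case $labels in
      *",$pair,"*) ;;                 # pair present: keep checking
      *) return 1 ;;                  # any missing pair means no match
    esac
  done
  return 0                            # an empty selector matches everything
}
```

For example, the kube-scheduler static pod carries component=kube-scheduler, so labels_match "component=kube-scheduler,tier=control-plane" "k8s-app=kube-scheduler" fails, just as the ServiceMonitor finds nothing.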
2. Create Services for kube-controller-manager and kube-scheduler
# cat cms-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
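The Service selectors (component: kube-controller-manager and component: kube-scheduler) match the labels the control-plane pods already carry, for example:
# kubectl describe pod kube-controller-manager-k8s-master01 -n kube-system
Labels:       component=kube-controller-manager
              tier=control-plane
Then create the two Services:
# kubectl apply -f cms-svc.yaml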
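Once cms-svc.yaml has been applied, each new Service should have endpoints pointing at the control-plane pods; if a selector matched nothing, the ENDPOINTS column shows <none> and Prometheus has nothing to scrape. A sketch of that check, where empty_endpoints is a hypothetical helper that reads kubectl get endpoints output from standard input:

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get endpoints` output (NAME ENDPOINTS AGE) on stdin and
# print the objects whose ENDPOINTS column is <none>. Returns non-zero if any were found.
empty_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1; bad = 1 }
       END { exit bad }'
}
```

Usage:
# kubectl get endpoints -n kube-system kube-controller-manager kube-scheduler | empty_endpoints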
3. Check that kube-controller-manager and kube-scheduler are now monitored: both should appear on the Prometheus Targets page with state UP
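The same check can be made through the Prometheus HTTP API by querying the up metric for the two jobs; because both ServiceMonitors set jobLabel: k8s-app, the job label should equal the Service's k8s-app value. A hedged sketch: up_query and query_up are hypothetical helpers, and the URL assumes the NodePort 30090 set up earlier. If the targets stay down, one commonly reported cause on kubeadm-built clusters is that the two components bind their metrics ports to 127.0.0.1 (see the --address and --bind-address flags in their static pod manifests under /etc/kubernetes/manifests); this depends on the Kubernetes version and cluster setup, so verify it before changing anything.

```shell
#!/bin/sh
# Hypothetical helper: build a PromQL selector for the `up` series of the given jobs.
# Usage: up_query JOB [JOB...]
up_query() {
  local IFS='|'
  # "$*" joins the arguments with the first character of IFS, giving a regex alternation.
  printf 'up{job=~"%s"}\n' "$*"
}

# Hypothetical helper: run the query against the Prometheus HTTP API.
# -G plus --data-urlencode sends the URL-encoded query in the query string of a GET request.
# Usage: query_up PROMETHEUS_URL JOB [JOB...]
query_up() {
  local prom_url="$1"
  shift
  curl -fsS -G "$prom_url/api/v1/query" --data-urlencode "query=$(up_query "$@")"
}
```

Usage:
# query_up http://<node-IP>:30090 kube-scheduler kube-controller-manager
Each entry under data.result carries a value of the form [timestamp, "1"] when its last scrape succeeded and "0" when it failed.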
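4. Access Grafana at http://<node-IP>:30030 (Grafana's default login is admin/admin unless it has been changed)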