k8s-prometheus

Posted by 芒果牛奶


Prometheus

Based on Kubernetes

Collecting data

node-exporter

vi node-exporter-ds.yml

apiVersion: extensions/v1beta1

kind: DaemonSet

metadata:

  name: node-exporter

  labels:

    app: node-exporter

spec:

  template:

    metadata:

      labels:

        app: node-exporter

    spec:

      hostNetwork: true

      containers:

      - image: prom/node-exporter

        name: node-exporter

        ports:

        - containerPort: 9100

        volumeMounts:

        - mountPath: "/etc/localtime"

          name: timezone

      volumes:

      - name: timezone

        hostPath:

          path: /etc/localtime
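
A quick sanity check after applying the DaemonSet (a sketch; assumes kubectl access to the cluster, and that a node is reachable on port 9100, since the pod uses hostNetwork):

kubectl apply -f node-exporter-ds.yml

kubectl get ds node-exporter -o wide

# replace <node-ip> with one of your node addresses

curl -s http://<node-ip>:9100/metrics | head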

Storage: create a 10Gi PersistentVolume backed by NFS

vi prometheus-pv.yaml

apiVersion: v1

kind: PersistentVolume

metadata:

  name: gwj-pv-prometheus

  labels:

    app: gwj-pv

spec:

  capacity:

    storage: 10Gi

  volumeMode: Filesystem

  accessModes:

  - ReadWriteMany

  persistentVolumeReclaimPolicy: Recycle

  storageClassName: slow

  mountOptions:

  - hard

  - nfsvers=4.1

  nfs:

    path: /storage/gwj-prometheus

    server: 10.1.99.1
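
To create and verify the PV (a sketch; assumes the NFS export /storage/gwj-prometheus already exists on 10.1.99.1):

kubectl apply -f prometheus-pv.yaml

kubectl get pv gwj-pv-prometheus

# STATUS should show Available until a claim binds it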

PersistentVolumeClaim: claim 5Gi out of the PV just created

vi prometheus-pvc.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: gwj-prometheus-pvc

  namespace: gwj

spec:

  accessModes:

  - ReadWriteMany

  volumeMode: Filesystem

  resources:

    requests:

      storage: 5Gi

  selector:

    matchLabels:

      app: gwj-pv

  storageClassName: slow
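
Apply the claim and confirm it binds to the PV (both the selector and storageClassName must match):

kubectl apply -f prometheus-pvc.yaml

kubectl get pvc gwj-prometheus-pvc -n gwj

# STATUS should be Bound, VOLUME should be gwj-pv-prometheus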

Set up Prometheus RBAC permissions. Applying the file below creates:

  clusterrole.rbac.authorization.k8s.io/gwj-prometheus-clusterrole created

  serviceaccount/gwj-prometheus created

  clusterrolebinding.rbac.authorization.k8s.io/gwj-prometheus-rolebinding created

vi prometheus-rbac.yml

apiVersion: rbac.authorization.k8s.io/v1beta1

kind: ClusterRole

metadata:

  name: gwj-prometheus-clusterrole

rules:

- apiGroups: [""]

  resources:

  - nodes

  - nodes/proxy

  - services

  - endpoints

  - pods

  verbs: ["get", "list", "watch"]

- apiGroups:

  - extensions

  resources:

  - ingresses

  verbs: ["get", "list", "watch"]

- nonResourceURLs: ["/metrics"]

  verbs: ["get"]


---

apiVersion: v1

kind: ServiceAccount

metadata:

  namespace: gwj

  name: gwj-prometheus


---

apiVersion: rbac.authorization.k8s.io/v1beta1

kind: ClusterRoleBinding

metadata:

  name: gwj-prometheus-rolebinding

roleRef:

  apiGroup: rbac.authorization.k8s.io

  kind: ClusterRole

  name: gwj-prometheus-clusterrole

subjects:

- kind: ServiceAccount

  name: gwj-prometheus

  namespace: gwj
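
Apply the file, then confirm the binding took effect with kubectl auth can-i, impersonating the new service account (a sketch):

kubectl apply -f prometheus-rbac.yml

kubectl auth can-i list pods --as=system:serviceaccount:gwj:gwj-prometheus

# expect: yes

kubectl auth can-i delete pods --as=system:serviceaccount:gwj:gwj-prometheus

# expect: no (only get/list/watch were granted)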

Create the Prometheus configuration file as a ConfigMap

vi prometheus-cm.yml

apiVersion: v1

kind: ConfigMap

metadata:

  name: gwj-prometheus-cm

  namespace: gwj

data:

  prometheus.yml: |

    rule_files:

    - /etc/prometheus/rules.yml

    alerting:

      alertmanagers:

      - static_configs:

        - targets: ["gwj-alertmanger-svc:80"]

    global:

      scrape_interval: 10s

      scrape_timeout: 10s

      evaluation_interval: 10s

    scrape_configs:

    - job_name: 'kubernetes-nodes'

      scheme: https

      tls_config:

        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:

      - role: node

      relabel_configs:

      - action: labelmap

        regex: __meta_kubernetes_node_label_(.+)

      - source_labels: [__meta_kubernetes_node_name]

        regex: (.+)

        target_label: __metrics_path__

        replacement: /api/v1/nodes/${1}/proxy/metrics

      - target_label: __address__

        replacement: kubernetes.default.svc:443

    - job_name: 'kubernetes-node-exporter'

      tls_config:

        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:

      - role: node

      relabel_configs:

      - action: labelmap

        regex: __meta_kubernetes_node_label_(.+)

      - source_labels: [__meta_kubernetes_role]

        action: replace

        target_label: kubernetes_role

      - source_labels: [__address__]

        regex: '(.*):10250'

        replacement: '${1}:9100'

        target_label: __address__

    - job_name: 'kubernetes-pods'

      kubernetes_sd_configs:

      - role: pod

      relabel_configs:

      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

        action: keep

        regex: true

      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

        action: replace

        target_label: __address__

        regex: ([^:]+)(?::\d+)?;(\d+)

        replacement: $1:$2

      - action: labelmap

        regex: __meta_kubernetes_pod_label_(.+)

      - source_labels: [__meta_kubernetes_namespace]

        action: replace

        target_label: kubernetes_namespace

      - source_labels: [__meta_kubernetes_pod_name]

        action: replace

        target_label: kubernetes_pod_name

    - job_name: 'kubernetes-cadvisor'

      scheme: https

      tls_config:

        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:

      - role: node

      relabel_configs:

      - action: labelmap

        regex: __meta_kubernetes_node_label_(.+)

      - target_label: __address__

        replacement: kubernetes.default.svc:443

      - source_labels: [__meta_kubernetes_node_name]

        regex: (.+)

        target_label: __metrics_path__

        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'

      kubernetes_sd_configs:

      - role: endpoints

      relabel_configs:

      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

        action: keep

        regex: true

      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

        action: replace

        target_label: __scheme__

        regex: (https?)

      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

        action: replace

        target_label: __metrics_path__

        regex: (.+)

      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

        action: replace

        target_label: __address__

        regex: ([^:]+)(?::\d+)?;(\d+)

        replacement: $1:$2

      - action: labelmap

        regex: __meta_kubernetes_service_label_(.+)

      - source_labels: [__meta_kubernetes_namespace]

        action: replace

        target_label: kubernetes_namespace

      - source_labels: [__meta_kubernetes_service_name]

        action: replace

        target_label: kubernetes_name

  rules.yml: |

    groups:

    - name: kubernetes_rules

      rules:

      - alert: InstanceDown

        expr: up{job="kubernetes-node-exporter"} == 0

        for: 5m

        labels:

          severity: page

        annotations:

          summary: "Instance {{ $labels.instance }} down"

          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

      - alert: APIHighRequestLatency

        expr: api_http_request_latencies_second{quantile="0.5"} > 1

        for: 10m

        annotations:

          summary: "High request latency on {{ $labels.instance }}"

          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

      - alert: StatefulSetReplicasMismatch

        annotations:

          summary: "Replicas miss match"

          description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 3 minutes.

        expr: label_join(kube_statefulset_status_replicas_ready != kube_statefulset_replicas, "instance", "/", "namespace", "statefulset")

        for: 3m

        labels:

          severity: critical

      - alert: PodFrequentlyRestarting

        expr: increase(kube_pod_container_status_restarts_total[1h]) > 5

        for: 5m

        labels:

          severity: warning

        annotations:

          description: Pod {{ $labels.namespace }}/{{ $labels.pod }} was restarted {{ $value }} times within the last hour

          summary: Pod is restarting frequently

      - alert: DeploymentReplicasNotUpdated

        expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)

          or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))

          unless (kube_deployment_spec_paused == 1)

        for: 5m

        labels:

          severity: critical

        annotations:

          description: Replicas are not updated and available for deployment {{ $labels.namespace }}/{{ $labels.deployment }}

          summary: Deployment replicas are outdated

      - alert: DaemonSetRolloutStuck

        expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100

        for: 5m

        labels:

          severity: critical

        annotations:

          description: Only {{ $value }}% of desired pods scheduled and ready for daemonset {{ $labels.namespace }}/{{ $labels.daemonset }}

          summary: DaemonSet is missing pods

      - alert: DaemonSetsNotScheduled

        expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0

        for: 10m

        labels:

          severity: warning

        annotations:

          description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.'

          summary: Daemonsets are not scheduled correctly

      - alert: DaemonSetsMissScheduled

        expr: kube_daemonset_status_number_misscheduled > 0

        for: 10m

        labels:

          severity: warning

        annotations:

          description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.'

          summary: Daemonsets are not scheduled correctly

      - alert: Node_Boot_Time

        expr: (node_time_seconds - node_boot_time_seconds) <= 150

        for: 15s

        annotations:

          summary: "机器{{ $labels.instacnce }} 刚刚重启,时间少于 150s"

      - alert: Available_Percent

        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes <= 0.2

        for: 15s

        annotations:

          summary: "机器{{ $labels.instacnce }} available less than 20%"

      - alert: FD_Used_Percent

        expr: (node_filefd_allocated / node_filefd_maximum) >= 0.8

        for: 15s

        annotations:

          summary: "机器{{ $labels.instacnce }} FD used more than 80%"

Create the Alertmanager that the ConfigMap above points to (alerting target gwj-alertmanger-svc:80)

vi alertmanger.yml


kind: Service

apiVersion: v1

metadata:

  name: gwj-alertmanger-svc

  namespace: gwj

spec:

  selector:

    app: gwj-alert-pod

  ports:

    - protocol: TCP

      port: 80

      targetPort: 9093


---

apiVersion: apps/v1

kind: StatefulSet

metadata:

  name: gwj-alert-sts

  namespace: gwj

  labels:

    app: gwj-alert-sts

spec:

  replicas: 1

  serviceName: gwj-alertmanger-svc

  selector:

    matchLabels:

      app: gwj-alert-pod

  template:

    metadata:

      labels:

        app: gwj-alert-pod

    spec:

      containers:

      - image: prom/alertmanager:v0.14.0

        name: gwj-alert-pod

        ports:

        - containerPort: 9093

          protocol: TCP

        volumeMounts:

        - mountPath: "/etc/localtime"

          name: timezone

      volumes:

      - name: timezone

        hostPath:

          path: /etc/localtime

kubectl apply -f alertmanger.yml

  service/gwj-alertmanger-svc created

  statefulset.apps/gwj-alert-sts created
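
To check that Alertmanager is up before wiring alerts through it (a sketch):

kubectl get pods -n gwj -l app=gwj-alert-pod

kubectl port-forward -n gwj svc/gwj-alertmanger-svc 9093:80

# then open http://localhost:9093 in a browser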

Create Prometheus itself as a StatefulSet, with two mounts:

  /prometheus -> pvc: gwj-prometheus-pvc

  /etc/prometheus/ -> configMap: gwj-prometheus-cm

vi prometheus-sts.yml


kind: Service

apiVersion: v1

metadata:

  name: gwj-prometheus-svc

  namespace: gwj

  labels:

    app: gwj-prometheus-svc

spec:

  ports:

  - port: 80

    targetPort: 9090

  selector:

    app: gwj-prometheus-pod


---

apiVersion: apps/v1

kind: StatefulSet

metadata:

  name: gwj-prometheus-sts

  namespace: gwj

  labels:

    app: gwj-prometheus-sts

spec:

  replicas: 1

  serviceName: gwj-prometheus-svc

  selector:

    matchLabels:

      app: gwj-prometheus-pod

  template:

    metadata:

      labels:

        app: gwj-prometheus-pod

    spec:

      containers:

      - image: prom/prometheus:v2.9.2

        name: gwj-prometheus-pod

        ports:

        - containerPort: 9090

          protocol: TCP

        volumeMounts:

        - mountPath: "/prometheus"

          name: data

        - mountPath: "/etc/prometheus/"

          name: config-volume

        - mountPath: "/etc/localtime"

          name: timezone

        resources:

          requests:

            cpu: 100m

            memory: 100Mi

          limits:

            cpu: 500m

            memory: 2000Mi

      serviceAccountName: gwj-prometheus

      volumes:

      - name: data

        persistentVolumeClaim:

          claimName: gwj-prometheus-pvc

      - name: config-volume

        configMap:

          name: gwj-prometheus-cm

      - name: timezone

        hostPath:

          path: /etc/localtime

kubectl apply -f prometheus-sts.yml

  service/gwj-prometheus-svc created

  statefulset.apps/gwj-prometheus-sts created
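
Before the Ingress exists, the UI can be reached with a port-forward to verify targets are being scraped (a sketch):

kubectl get pods -n gwj -l app=gwj-prometheus-pod

kubectl port-forward -n gwj svc/gwj-prometheus-svc 9090:80

curl -s http://localhost:9090/api/v1/targets | head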

Create an Ingress that routes each hostname to its Service

vi prometheus-ingress.yml


apiVersion: extensions/v1beta1

kind: Ingress

metadata:

  namespace: gwj

  name: gwj-ingress-prometheus

spec:

  rules:

  - host: gwj.syncbug.com

    http:

      paths:

        - path: /

          backend:

            serviceName: gwj-prometheus-svc

            servicePort: 80

  - host: gwj-alert.syncbug.com

    http:

      paths:

        - path: /

          backend:

            serviceName: gwj-alertmanger-svc

            servicePort: 80

kubectl apply -f prometheus-ingress.yml

  ingress.extensions/gwj-ingress-prometheus created

Browse to the corresponding domains

gwj.syncbug.com

Check that the scrape targets are correct:

http://gwj.syncbug.com/targets

Check that the loaded configuration is correct:

http://gwj.syncbug.com/config

gwj-alert.syncbug.com

Grafana

vi grafana-pv.yaml

apiVersion: v1

kind: PersistentVolume

metadata:

  name: gwj-pv-grafana

  labels:

    app: gwj-pv-gra

spec:

  capacity:

    storage: 2Gi

  volumeMode: Filesystem

  accessModes:

  - ReadWriteMany

  persistentVolumeReclaimPolicy: Recycle

  storageClassName: slow

  mountOptions:

  - hard

  - nfsvers=4.1

  nfs:

    path: /storage/gwj-grafana

    server: 10.1.99.1

vi grafana-pvc.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: gwj-grafana-pvc

  namespace: gwj

spec:

  accessModes:

  - ReadWriteMany

  volumeMode: Filesystem

  resources:

    requests:

      storage: 1Gi

  selector:

    matchLabels:

      app: gwj-pv-gra

  storageClassName: slow
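
Apply both and confirm the claim binds, as with the Prometheus volumes (assumes the NFS export /storage/gwj-grafana exists on 10.1.99.1):

kubectl apply -f grafana-pv.yaml

kubectl apply -f grafana-pvc.yaml

kubectl get pvc gwj-grafana-pvc -n gwj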

vi grafana-deployment.yaml

apiVersion: extensions/v1beta1

kind: Deployment

metadata:

  labels:

    name: grafana

  name: grafana

  namespace: gwj

spec:

  replicas: 1

  revisionHistoryLimit: 10

  selector:

    matchLabels:

      app: grafana

  template:

    metadata:

      labels:

        app: grafana

      name: grafana

    spec:

      containers:

      - env:

        - name: GF_PATHS_DATA

          value: /var/lib/grafana/

        - name: GF_PATHS_PLUGINS

          value: /var/lib/grafana/plugins

        image: grafana/grafana:6.2.4

        imagePullPolicy: IfNotPresent

        name: grafana

        ports:

        - containerPort: 3000

          name: grafana

          protocol: TCP

        volumeMounts:

        - mountPath: /var/lib/grafana/

          name: data

        - mountPath: /etc/localtime

          name: localtime

      dnsPolicy: ClusterFirst

      restartPolicy: Always

      volumes:

      - name: data

        persistentVolumeClaim:

          claimName: gwj-grafana-pvc

      - name: localtime

        hostPath:

          path: /etc/localtime
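
Roll it out and wait for the pod (a sketch):

kubectl apply -f grafana-deployment.yaml

kubectl rollout status deployment/grafana -n gwj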

vi grafana-ingress.yaml


apiVersion: extensions/v1beta1

kind: Ingress

metadata:

  namespace: gwj

  name: gwj-ingress-grafana

spec:

  rules:

  - host: gwj-grafana.syncbug.com

    http:

      paths:

        - path: /

          backend:

            serviceName: gwj-grafana-svc

            servicePort: 80


---

kind: Service

apiVersion: v1

metadata:

  name: gwj-grafana-svc

  namespace: gwj

spec:

  selector:

    app: grafana

  ports:

    - protocol: TCP

      port: 80

      targetPort: 3000
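
Apply the Ingress and Service together (output should resemble the earlier applies):

kubectl apply -f grafana-ingress.yaml

  ingress.extensions/gwj-ingress-grafana created

  service/gwj-grafana-svc created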

Open Grafana at gwj-grafana.syncbug.com

Default login: admin / admin

Add a Prometheus data source: http://gwj-prometheus-svc:80

Import a dashboard template
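
To confirm data is flowing end to end, a couple of queries to try in Grafana's Explore view or the Prometheus UI (the metric and job names are the ones used in the rules above):

up{job="kubernetes-node-exporter"}

node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes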

