Zonal network endpoint group unhealthy even though the container application is working properly


Posted: 2021-11-15 12:42:43

【Question】:

I created a Kubernetes cluster on Google Cloud, and even though the application is working properly (I have checked the requests being served inside the cluster), the NEG health check never seems to pass. Any ideas as to why?

I have tried switching the Service from NodePort to LoadBalancer, and different ways of adding annotations to the Service. I am also thinking it might be related to the HTTPS requirement on the Django side.
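For context, a Django app that enforces HTTPS behind a proxy typically relies on settings along these lines, which is why the probes below send an X-Forwarded-Proto header (a minimal sketch; the project's actual settings are not shown in this question):

# settings.py (hypothetical excerpt, not taken from the question)
# Trust the scheme reported by the load balancer / health-check probes
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
# With this enabled, any request arriving without X-Forwarded-Proto: https
# is answered with a 301 redirect, which would make a plain HTTP health check fail
SECURE_SSL_REDIRECT = True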

# [START kubernetes_deployment]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: moner-app
  labels:
    app: moner-app
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: moner-app
  template:
    metadata:
      labels:
        app: moner-app
    spec:
      containers:
      - name: moner-core-container
        image: my-template
        imagePullPolicy: Always
        resources:
          requests:
            memory: "128Mi"
          limits:
            memory: "512Mi"
        startupProbe:
          httpGet:
            path: /ht/
            port: 5000
            httpHeaders:
              - name: "X-Forwarded-Proto"
                value: "https"
          failureThreshold: 30
          timeoutSeconds: 10
          periodSeconds: 10
          initialDelaySeconds: 90
        readinessProbe:
          initialDelaySeconds: 120
          httpGet:
            path: "/ht/"
            port: 5000
            httpHeaders:
              - name: "X-Forwarded-Proto"
                value: "https"
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 10
        livenessProbe:
          initialDelaySeconds: 30
          failureThreshold: 3
          periodSeconds: 30
          timeoutSeconds: 10
          httpGet:
            path: "/ht/"
            port: 5000
            httpHeaders:
              - name: "X-Forwarded-Proto"
                value: "https"
        volumeMounts:
          - name: cloudstorage-credentials
            mountPath: /secrets/cloudstorage
            readOnly: true
        env:
            # [START_secrets]
            - name: THIS_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: GRACEFUL_TIMEOUT
              value: '120'
            - name: GUNICORN_HARD_TIMEOUT
              value: '90'
            - name: DJANGO_ALLOWED_HOSTS
              value: '*,$(THIS_POD_IP),0.0.0.0'
        ports:
        - containerPort: 5000
        args: ["/start"]
        
      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=moner-dev:us-east1:core-db=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        resources:
          requests:
            memory: "64Mi"
          limits:
            memory: "128Mi"
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container]
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir: {}
        - name: cloudstorage-credentials
          secret:
            secretName: cloudstorage-credentials
      # [END volumes]
# [END kubernetes_deployment]

---

# [START service]
apiVersion: v1
kind: Service
metadata:
  name: moner-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"5000": {}}}' # Creates an NEG after an Ingress is created
    cloud.google.com/backend-config: '{"default": "moner-backendconfig"}'
  labels:
    app: moner-svc
spec:
  type: NodePort
  ports:
  - name: moner-core-http
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: moner-app
# [END service]

---

# [START certificates_setup]
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: managed-cert
spec:
  domains:
    - domain.com
    - app.domain.com
# [END certificates_setup]

---

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: moner-backendconfig
spec:
  customRequestHeaders:
    headers:
    - "X-Forwarded-Proto:https"
  healthCheck:
    checkIntervalSec: 15
    port: 5000
    type: HTTP
    requestPath: /ht/


---

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: moner-ssl
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: moner-svc
      port: 
        name: moner-core-http

【Comments】:

In your BackendConfig, should HTTP be HTTPS?

For your Service you can use NodePort, but ClusterIP is recommended.

@gari Singh, the internal service runs without HTTPS; SSL termination is supposed to happen at the ingress load balancer. I tried ClusterIP, but it did not help much.

【Answer 1】:

I am still not sure why, but I got this working when I moved the Service to port 80 while keeping the health check on port 5000.

Service configuration:

# [START service]
apiVersion: v1
kind: Service
metadata:
  name: moner-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"5000": {}}}' # Creates an NEG after an Ingress is created
    cloud.google.com/backend-config: '{"default": "moner-backendconfig"}'
  labels:
    app: moner-svc
spec:
  type: NodePort
  ports:
  - name: moner-core-http
    port: 80
    protocol: TCP
    targetPort: 5000
  selector:
    app: moner-app
# [END service]

Backend configuration:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: moner-backendconfig
spec:
  customRequestHeaders:
    headers:
    - "X-Forwarded-Proto:https"
  healthCheck:
    checkIntervalSec: 15
    port: 5000
    type: HTTP
    requestPath: /ht/
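With the configs above applied, one way to confirm that the NEG actually turns healthy is to inspect the load-balancer backends from gcloud (a sketch only; the backend service and NEG names are auto-generated by GKE, so the placeholder below will differ):

# List the NEGs and backend services GKE created for the Ingress
gcloud compute network-endpoint-groups list
gcloud compute backend-services list
# Show per-endpoint health for the generated backend service
gcloud compute backend-services get-health <generated-backend-service-name> --global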

【Comments】:

【Answer 2】:

Apparently you do not have a GCP firewall rule that allows traffic on port 5000 to reach your GKE nodes. Creating an ingress firewall rule with source IP range 0.0.0.0/0 and TCP port 5000, targeting your GKE nodes, lets your setup work even with port 5000.
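For reference, such a rule could be created roughly like this (a sketch only; the rule name, network name, and node target tag are placeholders, not values from the question):

gcloud compute firewall-rules create allow-lb-to-port-5000 \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:5000 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=<gke-node-tag>

Instead of 0.0.0.0/0, the source range can also be narrowed to Google's load-balancer health-check ranges (130.211.0.0/22 and 35.191.0.0/16).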

【Comments】:
