CertManager Letsencrypt CertificateRequest "failed to perform self check GET request"
Posted: 2020-04-10 22:12:54
Question:
I am getting "Waiting for http-01 challenge propagation: failed to perform self check GET request". It is similar to this bug: https://github.com/jetstack/cert-manager/issues/656
but none of the solutions from the GitHub ticket comments helped.
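I am trying to set up CertManager on DigitalOcean as described in this tutorial: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes
I do not get any errors, but the request from CertManager has been stuck in a pending state for more than 40 hours.
I successfully configured the Ingress with Nginx, then created a namespace and the CertManager CRDs: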
$ kubectl create namespace cert-manager
$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.12.0/cert-manager.yaml
I can see all the CertManager pods as expected:
$ kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c47f46f57-gxhwv              1/1     Running   0          42h
cert-manager-cainjector-6659d6844d-xp75s   1/1     Running   0          42h
cert-manager-webhook-547567b88f-k4dv2      1/1     Running   0          42h
Then I created the staging issuer:
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: some@email.here
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: nginx
And updated the Ingress configuration:
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/cluster-issuer: "letsencrypt-staging"
spec:
  tls:
  - hosts:
    - echo.some.domain
    secretName: ingress-tls
  rules:
  - host: echo.some.domain
    http:
      paths:
      - backend:
          serviceName: echo1
          servicePort: 80
But after that, CertManager did not update the certificate and stays in the InProgress state:
$ date
Wed 18 Dec 2019 01:58:08 PM MSK
$ kubectl describe cert
...
Status:
  Conditions:
    Last Transition Time:  2019-12-16T17:23:56Z
    Message:               Waiting for CertificateRequest "ingress-tls-1089568541" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
Events:                    <none>
Instead of a certificate with Fake LE Intermediate X1 as the CN, the server returns CN=Kubernetes Ingress Controller Fake Certificate,O=Acme Co
$ kubectl describe CertificateRequest
Status:
  Conditions:
    Last Transition Time:  2019-12-16T17:50:05Z
    Message:               Waiting on certificate issuance from order default/ingress-tls-1089568541-1576201144: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:                    <none>
What could be wrong with CertManager, and how can I fix it?
Update:
The ingress logs contain the following errors:
$ kubectl -n ingress-nginx logs nginx-ingress-controller-7754db565c-g557h
I1218 17:24:30.331127 6 status.go:295] updating Ingress default/cm-acme-http-solver-4dkdn status from [] to [xxx.xxx.xxx.xxx ]
I1218 17:24:30.333250 6 status.go:295] updating Ingress default/cm-acme-http-solver-9dpqc status from [] to [xxx.xxx.xxx.xxx ]
I1218 17:24:30.341292 6 event.go:209] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"cm-acme-http-solver-4dkdn", UID:"2e523b74-8bbb-41c7-be8a-44d8db8abd6e", APIVersion:"extensions/v1beta1", ResourceVersion:"722472", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/cm-acme-http-solver-4dkdn
I1218 17:24:30.344340 6 event.go:209] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"cm-acme-http-solver-9dpqc", UID:"b574a3b6-6c5b-4266-a4e2-6ff2de2d78e0", APIVersion:"extensions/v1beta1", ResourceVersion:"722473", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/cm-acme-http-solver-9dpqc
W1218 17:24:30.442276 6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:30.442950 6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:33.775476 6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:33.775956 6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
Update 2:
The ingress-tls secret is available as expected:
$ kubectl get secret ingress-tls -o yaml
apiVersion: v1
data:
  ca.crt: ""
  tls.crt: ""
  tls.key: <secret-key-data-base64-encoded>
kind: Secret
metadata:
  annotations:
    cert-manager.io/certificate-name: ingress-tls
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: letsencrypt-staging
  creationTimestamp: "2019-12-16T17:23:56Z"
  name: ingress-tls
  namespace: default
  resourceVersion: "328801"
  selfLink: /api/v1/namespaces/default/secrets/ingress-tls
  uid: 5d640b66-1572-44a1-94e4-6d85a73bf21c
type: kubernetes.io/tls
Update 3:
I found that the cert-manager pod is failing with this log:
E1219 11:06:08.294011 1 sync.go:184] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://<some.domain>/.well-known/acme-challenge/<some-path>': Get http://<some.domain>/.well-known/acme-challenge/<some-path>: dial tcp xxx.xxx.xxx.xxx:80: connect: connection timed out" "dnsName"="<some.domain>" "resource_kind"="Challenge" "resource_name"="ingress-tls-1089568541-1576201144-1086699008" "resource_namespace"="default" "type"="http-01"
Challenge status:
$ kubectl describe challenge ingress-tls-1089568541-1576201144-471532423
Name:         ingress-tls-1089568541-1576201144-471532423
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1alpha2
Kind:         Challenge
Metadata:
  Creation Timestamp:  2019-12-19T11:32:19Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  ingress-tls-1089568541-1576201144
    UID:                   7d19d86f-0b56-4756-aa20-bb85caf80b9e
  Resource Version:        872062
  Self Link:               /apis/acme.cert-manager.io/v1alpha2/namespaces/default/challenges/ingress-tls-1089568541-1576201144-471532423
  UID:                     503a8b4e-dc60-4080-91d9-2847815af1cc
Spec:
  Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/123456
  Dns Name:   <domain>
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-staging
  Key:      <key>
  Solver:
    http01:
      Ingress:
        Class:  nginx
  Token:        <token>
  Type:         http-01
  URL:          https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/12345/abc
  Wildcard:     false
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://<domain>/.well-known/acme-challenge/<token>': Get http://<domain>/.well-known/acme-challenge/<token>: dial tcp xxx.xxx.xxx.xxx:80: connect: connection timed out
  State:       pending
Events:
  Type    Reason     Age    From          Message
  ----    ------     ----   ----          -------
  Normal  Started    4m28s  cert-manager  Challenge scheduled for processing
  Normal  Presented  4m28s  cert-manager  Presented challenge using http-01 challenge mechanism
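I tried deleting the challenge to re-trigger it, but it failed with the same error after a minute or two. I checked that I can reach the challenge URL from the cluster nodes (using kubectl run -it ... and then wget http://<domain>/.well-known/acme-challenge/<token> from inside the new pod).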
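The self check in the log above is essentially a plain HTTP GET against the challenge URL, issued from inside the cluster. The following is a minimal Python sketch of that idea, not cert-manager's actual implementation; the function names challenge_url and self_check are hypothetical, and the 10-second timeout is an arbitrary assumption. Running it from a pod scheduled on the same node as cert-manager may reproduce the timeout more faithfully than running it from an arbitrary pod.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 challenge URL in the form shown in the logs."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def self_check(url: str, timeout: float = 10.0) -> str:
    """Perform one GET and report the outcome as a short string.

    Hypothetical helper: it only mimics the idea of the propagation
    check (can this URL be fetched from where I am running?).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404).
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, or a timeout as in the question.
        return f"unreachable: {exc}"


# Example use (placeholders must be replaced with real values):
#   print(self_check(challenge_url("<domain>", "<token>")))
```

An "unreachable" result from inside the cluster combined with a successful fetch from outside suggests the problem lies in how in-cluster traffic reaches the load balancer or in DNS, not in the solver itself.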
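Comments:
In general, "Kubernetes Ingress Controller Fake Certificate" indicates a problem with the certificate itself or with your setup.
@mWatney Thanks. I found these logs in the ingress-controller (see the update). Where can I find more detailed logs to track down the root cause of this error?
Check again that this secret (ingress-tls in the default namespace) exists and is correct (post the output of $ kubectl get secret ingress-tls -o yaml here). Check your certificate; sometimes it cannot be loaded because it contains an error. You can look at this other case here.
@mWatney It looks like the same problem as in the bug. I do not know why it fails to perform the self check. I fixed it by fully uninstalling cert-manager and installing it again.
As far as I can tell, your Q&A would be more on-topic on DevOps SE. SO and SF are about software and systems engineering, respectively.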
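Answer 1:
I did not find the cause of this problem, so I will post how I solved it as an answer.
It looks like the same problem as in the bug. I fixed it by fully uninstalling cert-manager and installing it again without changing any configuration or settings.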
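Answer 2:
This might be worth a look. I ran into a similar Connection Timeout problem.
Change the LoadBalancer in the ingress-nginx service: add or change externalTrafficPolicy: Cluster.
The reason was that the pod with the certificate issuer ended up on a different node than the load balancer, so it could not talk to itself through the ingress.
Below is the complete block from https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.26.1/deploy/static/provider/cloud-generic.yaml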
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  # CHANGE/ADD THIS
  externalTrafficPolicy: Cluster
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
---
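If re-applying the whole manifest is inconvenient, the same single-field change can be expressed as a merge patch. The sketch below only builds the patch body and the kubectl argument list; it does not run anything. The helper names are hypothetical, and the service name and namespace are taken from the manifest above, so they are assumptions for any other install.

```python
import json


def traffic_policy_patch(policy: str = "Cluster") -> str:
    """Return a JSON merge patch that sets spec.externalTrafficPolicy."""
    if policy not in ("Cluster", "Local"):
        raise ValueError("externalTrafficPolicy must be 'Cluster' or 'Local'")
    return json.dumps({"spec": {"externalTrafficPolicy": policy}})


def kubectl_patch_args(service: str, namespace: str, policy: str = "Cluster") -> list:
    """Build the argument list for a `kubectl patch` call (not executed here)."""
    return [
        "kubectl", "patch", "service", service,
        "-n", namespace,
        "--type", "merge",
        "-p", traffic_policy_patch(policy),
    ]


# Example: print the command for the service defined in the manifest above.
print(" ".join(kubectl_patch_args("ingress-nginx", "ingress-nginx")))
```

Note that with Cluster the ingress controller no longer sees the original client source IP, which is the usual trade-off against Local.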
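Comments:
@Kirill Accept this as the answer if you think it is the right way to solve the problem.
Answer 3:
In my case, cert-manager tried to request the challenge via an internal IP address.
failed to perform self check GET request 'http://<domain>/.well-known/acme-challenge/<token>': Get http://<domain>/.well-known/acme-challenge/<token>: dial tcp 10.67.0.8:80: connect: connection timed out
In other words, DNS resolution was broken. I solved this by changing the cert-manager Deployment so that it uses only an external DNS server, like this: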
spec:
  template:
    spec:
      dnsConfig:
        nameservers:
          - 8.8.8.8
      dnsPolicy: None
This is how you do it. I also created an Issue so that this can be changed through the helm installation.
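To tell whether you are hitting this variant of the problem, it can help to check which address the domain resolves to from inside the cluster. The following is a small Python sketch under that assumption; the helper names is_internal and resolve_ipv4 are hypothetical, and "private address" is used here as a rough stand-in for "internal IP".

```python
import ipaddress
import socket


def is_internal(addr: str) -> bool:
    """True if addr is in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(addr).is_private


def resolve_ipv4(hostname: str) -> list:
    """Resolve hostname with the resolver configured for this pod or host."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({info[4][0] for info in infos})


# The address from the error message above is an internal one:
print(is_internal("10.67.0.8"))
# Example use from inside a pod (placeholder must be replaced):
#   for ip in resolve_ipv4("<domain>"):
#       print(ip, "internal" if is_internal(ip) else "public")
```

If the domain resolves to a private address inside the cluster but to the load balancer's public address elsewhere, the in-cluster resolver is the likely culprit, which is what the dnsConfig override works around.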
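Answer 4:
I had exactly the same problem, and it seems to be related to a bug in how DigitalOcean load balancers work. The thread lets-encrypt-certificate-issuance suggests adding the annotation service.beta.kubernetes.io/do-loadbalancer-hostname: "kube.mydomain.com" to the load balancer. In my case I had no YAML config file for the load balancer, so I copied the load balancer declaration from the nginx-ingress install script and applied the new configuration to the Kubernetes cluster. Below is the final configuration of the load balancer.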
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: 'true'
    # See https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
    service.beta.kubernetes.io/do-loadbalancer-hostname: "kube.mydomain.com"
  labels:
    helm.sh/chart: ingress-nginx-3.19.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.43.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
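Comments:
This solved my problem; I only added the service.beta.kubernetes.io/do-loadbalancer-hostname annotation. I edited the resource with kubectl edit service ingress-nginx-controller -n ingress-nginx. Keep in mind that your service may have a different name and namespace.
Answer 5: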
One of my CertManager pods was frozen, so I deleted all of them and they restarted. The certificate was renewed immediately.
kubectl get pods -n cert-manager
(or whatever namespace your pods are in)
Then delete them all:
kubectl delete pod -n cert-manager cert-manager-xxxx cert-manager-cainjector-xxxx cert-manager-webhook-xxxx