airflow 2.1.3 using pgbouncer for postgresql issue

Posted: 2021-12-12 20:20:27

Question:

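Background: we recently upgraded Airflow from 1.10.14 to 2.1.3. Until then, pgbouncer was a custom build based on the Azure image from Microsoft (mcr.microsoft.com/azure-oss-db-tools/pgbouncer-sidecar:latest).

The custom pgbouncer stopped working, so we now connect directly to the main PostgreSQL server.

I am therefore trying to use the pgbouncer deployed by the Airflow 2.1.3 chart (helm chart 8.5.2, see https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.0#how-to-use-an-external-database), but it has problems.

The relevant settings in my values.yaml file are: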

pgbouncer:
  enabled: true
  # listen_port does not seem to take effect into pgbouncer.ini file
#  listen_port: 5432

externalDatabase:
  type: postgres
  host: psql-hostname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: username@psql-hostname
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "postgresql-password-key-name"
  properties: ""
  # properties: "?sslmode=disable"
externalRedis:
  host: redis-hostname.redis.cache.windows.net
  port: 6379
  databaseNumber: 1
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "redis-password-key-name"
  properties: ""

My script creates the following secret in the Kubernetes cluster:

kubectl create secret generic "airflow-postgres-redis-name" \
   -n $_namespace_airflow \
    --from-literal=postgresql-password="$my-airflow2-postgre" \
    --from-literal=redis-password="$my-airflow2-redis"
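
One thing worth checking in the command above, assuming the `$my-airflow2-...` values are real shell variables and not redacted placeholders: shell variable names cannot contain `-`, so `"$my-airflow2-postgre"` expands `$my` and then appends the literal text `-airflow2-postgre`. The sketch below only demonstrates that expansion rule; the variable names `my` and `my_airflow2_postgre` and their values are hypothetical, and nothing here touches the cluster.

```shell
#!/bin/sh
# Hypothetical values, for illustration only.
my="abc"
my_airflow2_postgre="s3cret"

# "$my-airflow2-postgre" is parsed as "${my}" followed by literal text,
# because "-" cannot be part of a variable name.
expanded="$my-airflow2-postgre"

# Underscores are valid in names; braces make the boundary explicit.
intended="${my_airflow2_postgre}"

printf '%s\n' "$expanded"
printf '%s\n' "$intended"
```

If the secret ends up holding an unexpected password, this is one place to look. It is also worth confirming that the key names passed to `--from-literal` match the `passwordSecretKey` values in values.yaml.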

After applying values.yaml with helm upgrade, I noticed that pgbouncer.ini contains the following. Note that listen_port is 6543.

$ kubectl exec -n airflow -ti airflow-pgbouncer-6f88889bf5-xtdvp -- /bin/sh
~ $ ls /home/pgbouncer/
certs          config         pgbouncer.ini  users.txt

~ $ cat /home/pgbouncer/pgbouncer.ini
[databases]
* = host=127.0.0.1 port=5432
[pgbouncer]
pool_mode = session
listen_port = 6543
listen_addr = *

~ $ cat /home/pgbouncer/users.txt
"username@psql-hostname" "HIDE FOR THIS NOTE"

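I suspect the cause is that port 6543 is not working, but I cannot find a way to override it. Please help.

If my suspicion is wrong, the events and logs below may help with troubleshooting.

Output of kubectl describe pod: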

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15m                 default-scheduler  Successfully assigned airflow/airflow-pgbouncer-6f59cf4769-bx5hf to aks-nodepool1-16099970-vmss00000a
  Normal   Pulling    28m                 kubelet            Pulling image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0"
  Normal   Pulled     28m                 kubelet            Successfully pulled image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" in 3.7505019s
  Normal   Created    23m (x4 over 28m)   kubelet            Created container pgbouncer
  Normal   Started    23m (x4 over 28m)   kubelet            Started container pgbouncer
  Normal   Killing    23m (x3 over 26m)   kubelet            Container pgbouncer failed liveness probe, will be restarted
  Normal   Pulled     23m (x3 over 26m)   kubelet            Container image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" already present on machine
  Warning  Unhealthy  18m (x16 over 27m)  kubelet            Liveness probe failed: psql: error: ERROR:  pgbouncer cannot connect to server
ERROR:  pgbouncer cannot connect to server
  Warning  BackOff    13m (x14 over 15m)  kubelet            Back-off restarting failed container

Output of kubectl logs for the pod:

$ go.kube.logs airflow-pgbouncer-6f59cf4769-bx5hf
Successfully generated auth_file: /home/pgbouncer/users.txt
 
2021-10-27 09:09:43.157 UTC [6] LOG kernel file descriptor limit: 1048576 (hard: 1048576); max_client_conn: 100, max expected fd use: 112
2021-10-27 09:09:43.157 UTC [6] LOG listening on 0.0.0.0:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on [::]:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on unix:/tmp/.s.PGSQL.6432
2021-10-27 09:09:43.157 UTC [6] LOG process up: PgBouncer 1.15.0, libevent 2.1.12-stable (epoll), adns: c-ares 1.17.1, tls: OpenSSL 1.1.1k  25 Mar 2021
2021-10-27 09:10:00.602 UTC [6] LOG C-0x7f16390c91b0: (nodb)/(nouser)@10.244.0.1:41595 registered new auto-database: db=airflow
2021-10-27 09:10:00.610 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:15.834 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:31.164 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:43.156 UTC [6] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2021-10-27 09:10:46.165 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:41595 pooler error: client_login_timeout (server down)
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:17395 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.965 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:6755 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.966 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:24068 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.116 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:1107 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.117 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:43273 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:30.617 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:30.620 UTC [6] LOG got SIGINT, shutting down
2021-10-27 09:11:30.823 UTC [6] LOG server connections dropped, exiting

Note: I replaced the real username with "username@psql-hostname".
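
A quick way to test the port suspicion is to summarize the log instead of reading it line by line. The sketch below runs on a few lines copied from the output above: it extracts the port pgbouncer reports listening on and counts the two kinds of warnings. In this excerpt pgbouncer reports port 6432, and the repeated warnings are TLS handshake failures, which suggests checking the upstream connection as well as the port. The variable names are illustrative only.

```shell
#!/bin/sh
# A few lines copied from the log output above.
log=$(cat <<'EOF'
2021-10-27 09:09:43.157 UTC [6] LOG listening on 0.0.0.0:6432
2021-10-27 09:10:00.610 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:15.834 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:17395 pooler error: pgbouncer cannot connect to server
EOF
)

# Port taken from the first "listening on 0.0.0.0:PORT" line.
listen_port=$(printf '%s\n' "$log" \
  | sed -n 's/.* LOG listening on 0\.0\.0\.0:\([0-9][0-9]*\)$/\1/p' \
  | head -n 1)

# Number of lines matching each kind of warning.
tls_errors=$(printf '%s\n' "$log" | grep -c 'TLS handshake error')
connect_errors=$(printf '%s\n' "$log" | grep -c 'pgbouncer cannot connect to server')

echo "listen port: $listen_port"
echo "TLS handshake errors: $tls_errors"
echo "cannot-connect errors: $connect_errors"
```

Against a live pod, the same filters can be fed from `kubectl logs -n airflow <pod-name>` instead of the embedded sample.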

Comments:

Did you find a solution? I am facing the same issue.

Yes, I did. Let me write up a summary.

Answer 1:

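We had two options for solving this (note that our Airflow chart is the community chart, version 8.5.2), and we chose the first. In hindsight, option 2 would have been easier and needs almost no changes, provided the next chart release fixes the issue properly.

Option 1: The pgbouncer built into community Airflow chart 8.5.2 hard-codes the auth type to a fixed value, and with that value pgbouncer fails to connect to Azure PostgreSQL Single Server. You can therefore skip the chart's pgbouncer (set pgbouncer.enabled to false), deploy your own pgbouncer (using helm, kubectl, etc.), and point the externalDatabase host in the Airflow values.yaml file at the pgbouncer service. This is the approach we chose: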
$ helm repo add cradlepoint https://raw.githubusercontent.com/cradlepoint/kubernetes-helm-chart-pgbouncer/master/repos/stable --force-update
$ helm upgrade --install pgbouncer cradlepoint/pgbouncer -n $_namespace_airflow -f $some_path/values.pgbouncer.yaml

$ service_pgbouncer=$(kubectl get services -n airflow | grep pgbouncer | awk '{print $1}')
$ echo "use this name: '$service_pgbouncer' in values.yaml for airflow externalDatabase"

In values.pgbouncer.yaml you can set the auth type that Azure PostgreSQL needs, for example trust (the value we used with the Azure side-car pgbouncer image). For why we could not keep using the Azure side-car pgbouncer, see: https://github.com/airflow-helm/charts/issues/464#issuecomment-973811581

Option 2: Keep using the pgbouncer built into community Airflow chart 8.5.2, but deploy the chart differently: fix the chart's hard-coded pgbouncer auth_type in a local copy, then deploy the chart from that fixed copy. See these two comments:
https://github.com/airflow-helm/charts/issues/412#issuecomment-974909150
https://github.com/airflow-helm/charts/issues/412#issuecomment-974957815

The "974957815" comment above is where I realized this was possible.
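
The wiring step of option 1 can be sketched offline as follows. This is a minimal illustration, not the exact procedure we ran: the `kubectl get services` output is a hypothetical sample, the service port 5432 is an assumption about how the standalone pgbouncer is exposed, and the override file name comes from `mktemp`. Only the `pgbouncer.enabled` and `externalDatabase` keys are taken from the values.yaml shown in the question. The sketch uses a single awk pattern instead of grep piped into awk.

```shell
#!/bin/sh
# Hypothetical sample of `kubectl get services -n airflow` output.
services=$(cat <<'EOF'
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
airflow-web     ClusterIP   10.0.12.34     <none>        8080/TCP   5d
pgbouncer       ClusterIP   10.0.56.78     <none>        5432/TCP   1d
EOF
)

# First column of the row whose service name contains "pgbouncer".
service_pgbouncer=$(printf '%s\n' "$services" | awk '$1 ~ /pgbouncer/ {print $1}')

# Write an override file that disables the chart's pgbouncer and points
# Airflow at the standalone one. Port 5432 is an assumption.
override_file=$(mktemp)
cat > "$override_file" <<EOF
pgbouncer:
  enabled: false
externalDatabase:
  type: postgres
  host: ${service_pgbouncer}
  port: 5432
EOF

cat "$override_file"
```

The remaining externalDatabase fields (database, user, passwordSecret, passwordSecretKey) stay as in the original values.yaml.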

Comments:

@Julian Dm, see the summary above.
