Greenplum on Kubernetes

Posted: 2020-09-23 09:15:18

I deployed a Greenplum cluster on Kubernetes.

Everything appears to be running:

$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
greenplum-operator-588d8fcfd8-nmgjp   1/1     Running   0          40m
svclb-greenplum-krdtd                 1/1     Running   0          39m
svclb-greenplum-k28bv                 1/1     Running   0          39m
svclb-greenplum-25n7b                 1/1     Running   0          39m
segment-a-0                           1/1     Running   0          39m
master-0                              1/1     Running   0          39m

However, something seems to be wrong, since the cluster status is Pending:

$ kubectl describe greenplumclusters.greenplum.pivotal.io my-greenplum
Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2020-09-23T08:31:04Z
  Finalizers:
    stopcluster.greenplumcluster.pivotal.io
  Generation:  2
  Managed Fields:
    API Version:  greenplum.pivotal.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:masterAndStandby:
          .:
          f:antiAffinity:
          f:cpu:
          f:hostBasedAuthentication:
          f:memory:
          f:standby:
          f:storage:
          f:storageClassName:
          f:workerSelector:
        f:segments:
          .:
          f:antiAffinity:
          f:cpu:
          f:memory:
          f:mirrors:
          f:primarySegmentCount:
          f:storage:
          f:storageClassName:
          f:workerSelector:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2020-09-23T08:31:04Z
    API Version:  greenplum.pivotal.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:instanceImage:
        f:operatorVersion:
        f:phase:
    Manager:         greenplum-operator
    Operation:       Update
    Time:            2020-09-23T08:31:11Z
  Resource Version:  590
  Self Link:         /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:               72ed72a8-4dd9-48fb-8a48-de2229d88a24
Spec:
  Master And Standby:
    Anti Affinity:              no
    Cpu:                        0.5
    Host Based Authentication:  # host   all   gpadmin   0.0.0.0/0   trust

    Memory:              800Mi
    Standby:             no
    Storage:             1G
    Storage Class Name:  local-path
    Worker Selector:
  Segments:
    Anti Affinity:          no
    Cpu:                    0.5
    Memory:                 800Mi
    Mirrors:                no
    Primary Segment Count:  1
    Storage:                2G
    Storage Class Name:     local-path
    Worker Selector:
Status:
  Instance Image:    registry.localhost:5000/greenplum-for-kubernetes:v2.2.0
  Operator Version:  registry.localhost:5000/greenplum-operator:v2.2.0
  Phase:             Pending
Events:              <none>

As you can see:

Phase: Pending

I checked the operator logs:

"level":"DEBUG","ts":"2020-09-23T09:12:18.494Z","logger":"PodExec","msg":"master-0 is not active master","namespace":"default","error":"command terminated with exit code 2"
"level":"DEBUG","ts":"2020-09-23T09:12:18.497Z","logger":"PodExec","msg":"master-1 is not active master","namespace":"default","error":"pods \"master-1\" not found"
"level":"DEBUG","ts":"2020-09-23T09:12:18.497Z","logger":"controllers.GreenplumCluster","msg":"current active master","greenplumcluster":"default/my-greenplum","activeMaster":""

I don't really understand what they mean...

I mean, it seems to be looking for two masters: master-0 and master-1. As you can see below, I only deployed one master with one segment.
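A quick sanity check (a sketch, assuming the default pod names from the operator) is to list the master pods directly; with `standby: no`, only master-0 should exist, so the "master-1 not found" probe from the operator is likely just it checking for a standby that was never requested:

```shell
# List only the master pods of the cluster; with standby disabled,
# master-0 should be the single result and "Running".
kubectl get pods | grep '^master'

# Double-check the standby setting actually applied to the cluster resource.
kubectl get greenplumclusters.greenplum.pivotal.io my-greenplum \
  -o jsonpath='{.spec.masterAndStandby.standby}'
```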

The GreenplumCluster manifest is:

apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host   all   gpadmin   0.0.0.0/0   trust
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: local-path
    storage: 1G
    workerSelector: 
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: local-path
    storage: 2G
    workerSelector: 

The master is logging this:

20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1 
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 4/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured.  skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode

In short:

Timeout expired connecting to template1
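The timeout can be reproduced manually to separate a networking problem from a database problem (a sketch, assuming `psql` is on the gpadmin PATH inside the image, as it is in the Greenplum for Kubernetes containers):

```shell
# Try the same connection gpstart makes, from inside the master pod.
# If this hangs or errors, the postmaster is up but not accepting queries.
kubectl exec -it master-0 -- bash -lc "psql -d template1 -c 'SELECT 1'"

# Check that the postmaster is actually listening.
kubectl exec -it master-0 -- bash -lc "pg_ctl status -D /greenplum/data-1"
```

If `SELECT 1` fails while `pg_ctl` reports the server as running, the real error is usually in the server logs rather than the pod logs.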

The complete master-0 log:

*******************************
Initializing Greenplum for Kubernetes Cluster
*******************************
*******************************
Generating gpinitsystem_config
*******************************
"level":"INFO","ts":"2020-09-23T11:28:58.394Z","logger":"startGreenplumContainer","msg":"initializing Greenplum Cluster"
Sub Domain for the cluster is: agent.greenplum-1.svc.cluster.local
*******************************
Running gpinitsystem
*******************************
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale has not been set in , will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale set to en_US.utf8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ARRAY_NAME variable not set, will provide default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-Master hostname master-0.agent.greenplum-1.svc.cluster.local does not match hostname output
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking to see if master-0.agent.greenplum-1.svc.cluster.local can be resolved on this host
Warning: Permanently added the RSA host key for IP address '10.42.2.5' to the list of known hosts.
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Can resolve master-0.agent.greenplum-1.svc.cluster.local to this host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No DATABASE_NAME set, will exit following template1 updates
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-CHECK_POINT_SEGMENTS variable not set, will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ENCODING variable not set, will set to default UTF-8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-MASTER_MAX_CONNECT not set, will set to default value 250
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Detected a single host GPDB array build, reducing value of BATCH_DEFAULT from 60 to 4
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters, Completed
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking Master host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts, please wait...
Warning: Permanently added the RSA host key for IP address '10.42.1.5' to the list of known hosts.
"level":"DEBUG","ts":"2020-09-23T11:28:59.038Z","logger":"DNS resolver","msg":"resolved DNS entry","host":"segment-a-0"
"level":"INFO","ts":"2020-09-23T11:28:59.038Z","logger":"keyscanner","msg":"starting keyscan","host":"segment-a-0"

20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts, Completed
"level":"INFO","ts":"2020-09-23T11:28:59.064Z","logger":"keyscanner","msg":"keyscan successful","host":"segment-a-0"
20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Building the Master instance database, please wait...
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Found more than 1 instance of shared_preload_libraries in /greenplum/data-1/postgresql.conf, will append
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Starting the Master in admin mode
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
.
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
......
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Parallel process exit status
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as completed           = 1
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as killed              = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as failed              = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Deleting distributed backout files
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Removing back out file
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No errors generated from parallel processes
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -m -d /greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Master segment instance directory=/greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Stopping master segment and waiting for user connections to finish ...
server shutting down
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Terminating processes for segment /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance in admin mode
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Setting new master era
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Master Started...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Shutting down master
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Process results...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-   Successful segment starts                                            = 1
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Successfully started 1 of 1 segment instances 
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1 
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 4/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured.  skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode

Any ideas?


Answer 1:
I deployed Greenplum on Kubernetes recently as well.

My problem was the permissions on the cgroup directories. When I looked at the files under /greenplum/data1/pg_log/ inside the pod, I found it was printing errors like "can't access directory '/sys/fs/cgroup/memory/gpdb/'", because the pod uses hostPath.

My suggestion is to check the errors printed in the files under /greenplum/data1/pg_log/.

The pod's logs are not the whole truth.

By the way, I ended up using v0.8.0. I first tried v2.3.0, but the master was killed shortly after it became ready, probably by Docker. The log said something like "received fast shutdown request. ic-proxy-server: received signal 15".
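The advice above can be followed with a couple of commands (a sketch; the log directory is /greenplum/data-1/pg_log in the questioner's logs, and the cgroup path is the one reported in the answer, so adjust both to your deployment):

```shell
# Read the Greenplum server logs inside the pod; `kubectl logs` only shows
# the init/gpstart output, not the postmaster's own log files.
kubectl exec -it master-0 -- bash -lc "tail -n 100 /greenplum/data-1/pg_log/*.csv"

# Verify the cgroup directory exists and is accessible to gpadmin;
# a permission error here matches the symptom described in this answer.
kubectl exec -it master-0 -- ls -ld /sys/fs/cgroup/memory/gpdb/
```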

