Kubernetes 上的 Greenplum
Posted
技术标签:
【中文标题】Kubernetes 上的 Greenplum【英文标题】:Greenplum on kubernetes 【发布时间】:2020-09-23 09:15:18 【问题描述】:我在 kubernetes 上部署了一个略带绿色的集群。
一切似乎都在运行:
$ kubectl get pods:
NAME READY STATUS RESTARTS AGE
greenplum-operator-588d8fcfd8-nmgjp 1/1 Running 0 40m
svclb-greenplum-krdtd 1/1 Running 0 39m
svclb-greenplum-k28bv 1/1 Running 0 39m
svclb-greenplum-25n7b 1/1 Running 0 39m
segment-a-0 1/1 Running 0 39m
master-0 1/1 Running 0 39m
不过,由于集群状态为 Pending
,因此似乎有些问题:
$ kubectl describe greenplumclusters.greenplum.pivotal.io my-greenplum
Name: my-greenplum
Namespace: default
Labels: <none>
Annotations: <none>
API Version: greenplum.pivotal.io/v1
Kind: GreenplumCluster
Metadata:
Creation Timestamp: 2020-09-23T08:31:04Z
Finalizers:
stopcluster.greenplumcluster.pivotal.io
Generation: 2
Managed Fields:
API Version: greenplum.pivotal.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:masterAndStandby:
.:
f:antiAffinity:
f:cpu:
f:hostBasedAuthentication:
f:memory:
f:standby:
f:storage:
f:storageClassName:
f:workerSelector:
f:segments:
.:
f:antiAffinity:
f:cpu:
f:memory:
f:mirrors:
f:primarySegmentCount:
f:storage:
f:storageClassName:
f:workerSelector:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2020-09-23T08:31:04Z
API Version: greenplum.pivotal.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
f:status:
.:
f:instanceImage:
f:operatorVersion:
f:phase:
Manager: greenplum-operator
Operation: Update
Time: 2020-09-23T08:31:11Z
Resource Version: 590
Self Link: /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
UID: 72ed72a8-4dd9-48fb-8a48-de2229d88a24
Spec:
Master And Standby:
Anti Affinity: no
Cpu: 0.5
Host Based Authentication: # host all gpadmin 0.0.0.0/0 trust
Memory: 800Mi
Standby: no
Storage: 1G
Storage Class Name: local-path
Worker Selector:
Segments:
Anti Affinity: no
Cpu: 0.5
Memory: 800Mi
Mirrors: no
Primary Segment Count: 1
Storage: 2G
Storage Class Name: local-path
Worker Selector:
Status:
Instance Image: registry.localhost:5000/greenplum-for-kubernetes:v2.2.0
Operator Version: registry.localhost:5000/greenplum-operator:v2.2.0
Phase: Pending
Events: <none>
如你所见:
阶段:待定
我查看了操作员日志:
"level":"DEBUG","ts":"2020-09-23T09:12:18.494Z","logger":"PodExec","msg":"master-0 is not active master","namespace":"default","error":"command terminated with exit code 2"
"level":"DEBUG","ts":"2020-09-23T09:12:18.497Z","logger":"PodExec","msg":"master-1 is not active master","namespace":"default","error":"pods \"master-1\" not found"
"level":"DEBUG","ts":"2020-09-23T09:12:18.497Z","logger":"controllers.GreenplumCluster","msg":"current active master","greenplumcluster":"default/my-greenplum","activeMaster":""
我不太明白它们的意思......
我的意思是,它似乎在寻找两个大师:master-0
和master-1
。正如你在下面看到的,我只部署了一个带有一个段的主节点。
greenplum 集群清单是:
apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
name: my-greenplum
spec:
masterAndStandby:
hostBasedAuthentication: |
# host all gpadmin 0.0.0.0/0 trust
memory: "800Mi"
cpu: "0.5"
storageClassName: local-path
storage: 1G
workerSelector:
segments:
primarySegmentCount: 1
memory: "800Mi"
cpu: "0.5"
storageClassName: local-path
storage: 2G
workerSelector:
主人正在记录这个:
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 4/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured. skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
简而言之:
连接到模板1超时
完整的 master-0 日志:
*******************************
Initializing Greenplum for Kubernetes Cluster
*******************************
*******************************
Generating gpinitsystem_config
*******************************
"level":"INFO","ts":"2020-09-23T11:28:58.394Z","logger":"startGreenplumContainer","msg":"initializing Greenplum Cluster"
Sub Domain for the cluster is: agent.greenplum-1.svc.cluster.local
*******************************
Running gpinitsystem
*******************************
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale has not been set in , will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale set to en_US.utf8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ARRAY_NAME variable not set, will provide default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-Master hostname master-0.agent.greenplum-1.svc.cluster.local does not match hostname output
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking to see if master-0.agent.greenplum-1.svc.cluster.local can be resolved on this host
Warning: Permanently added the RSA host key for IP address '10.42.2.5' to the list of known hosts.
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Can resolve master-0.agent.greenplum-1.svc.cluster.local to this host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No DATABASE_NAME set, will exit following template1 updates
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-CHECK_POINT_SEGMENTS variable not set, will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ENCODING variable not set, will set to default UTF-8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-MASTER_MAX_CONNECT not set, will set to default value 250
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Detected a single host GPDB array build, reducing value of BATCH_DEFAULT from 60 to 4
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters, Completed
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking Master host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts, please wait...
Warning: Permanently added the RSA host key for IP address '10.42.1.5' to the list of known hosts.
"level":"DEBUG","ts":"2020-09-23T11:28:59.038Z","logger":"DNS resolver","msg":"resolved DNS entry","host":"segment-a-0"
"level":"INFO","ts":"2020-09-23T11:28:59.038Z","logger":"keyscanner","msg":"starting keyscan","host":"segment-a-0"
20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts, Completed
"level":"INFO","ts":"2020-09-23T11:28:59.064Z","logger":"keyscanner","msg":"keyscan successful","host":"segment-a-0"
20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Building the Master instance database, please wait...
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Found more than 1 instance of shared_preload_libraries in /greenplum/data-1/postgresql.conf, will append
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Starting the Master in admin mode
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Spawning parallel processes batch [1], please wait...
.
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
......
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Parallel process exit status
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as completed = 1
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as killed = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as failed = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Deleting distributed backout files
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Removing back out file
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No errors generated from parallel processes
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -m -d /greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Master segment instance directory=/greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Stopping master segment and waiting for user connections to finish ...
server shutting down
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Terminating processes for segment /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance in admin mode
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Setting new master era
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Master Started...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Shutting down master
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Process results...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Successful segment starts = 1
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Failed segment starts = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Successfully started 1 of 1 segment instances
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1, attempt 4/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured. skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
有什么想法吗?
【问题讨论】:
【参考方案1】:这些天我在 kubernetes 上部署了 greenplum。
我的问题是 cgroup 目录的权限。当我查看 Pod 中 /greenplum/data1/pg_log/ 下的文件时,我发现它会打印诸如“无法访问目录”/sys/fs/cgroup/memory/gpdb/ 之类的错误。因为 Pod 使用了hostPath
。
我的建议是在 /greenplum/data1 下的文件中打印错误/pg_log/。
Pod 的日志并不是全部事实。
顺便说一句,我最后使用了 v0.8.0。我首先选择v2.3.0,但是master准备好后很快就被杀死了,可能是Docker。日志就像“收到快速关闭请求。 ic-proxy-server: 收到信号 15'
【讨论】:
以上是关于Kubernetes 上的 Greenplum的主要内容,如果未能解决你的问题,请参考以下文章
Kubernetes 上的 Spark + Zeppelin