Cannot deploy mongodb StatefulSet with volumes for replicas greater than one

Posted: 2019-12-22 14:25:00

Context
I am sharing the /data/db directory, which is mounted as a Network File System volume into all pods controlled by the StatefulSet.
Problem
With replicas: 1, the stateful set deploys mongodb correctly. The problems start when I scale up (i.e., set the number of replicas to more than one, e.g. replicas: 2).

All consecutive pods end up in CrashLoopBackOff status.
Question
I understand the error message (see the Debug section below), but I don't get it. Basically, what I am trying to achieve is a stateful deployment of mongodb, so that even after the pods are deleted they retain their data. Somehow, mongo stops me from doing that, complaining that Another mongod instance is already running on the /data/db directory.

My questions are: What am I doing wrong? How can I deploy mongodb so that it is stateful and persists data, while scaling up the stateful set?
Debug
Cluster state
$ kubectl get svc,sts,po,pv,pvc --output=wide
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE   SELECTOR
service/mongo   ClusterIP   None         <none>        27017/TCP   10h   run=mongo

NAME                     READY   AGE     CONTAINERS   IMAGES
statefulset.apps/mongo   1/2     8m50s   mongo        mongo:4.2.0-bionic

NAME          READY   STATUS             RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
pod/mongo-0   1/1     Running            0          8m50s   10.44.0.2   web01       <none>           <none>
pod/mongo-1   0/1     CrashLoopBackOff   6          8m48s   10.36.0.3   compute01   <none>           <none>

NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE   VOLUMEMODE
persistentvolume/phenex-nfs-mongo   1Gi        RWX            Retain           Bound    phenex-nfs-mongo                           22m   Filesystem

NAME                                     STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
persistentvolumeclaim/phenex-nfs-mongo   Bound    phenex-nfs-mongo   1Gi        RWX                           22m   Filesystem
Logs
$ kubectl logs -f mongo-1
2019-08-14T23:52:30.632+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-1
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] db version v4.2.0
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] git version: a4b751dcf51dd249c5865812b390cfd1c0129c30
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] modules: none
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] build environment:
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] distmod: ubuntu1804
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] distarch: x86_64
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] options: net: bindIp: "0.0.0.0" , replication: replSet: "rs0"
2019-08-14T23:52:30.642+0000 I STORAGE [initandlisten] exception in initAndListen: DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory, terminating
2019-08-14T23:52:30.643+0000 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2019-08-14T23:52:30.643+0000 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-08-14T23:52:30.643+0000 I - [initandlisten] Stopping further Flow Control ticket acquisitions.
2019-08-14T23:52:30.643+0000 I CONTROL [initandlisten] now exiting
2019-08-14T23:52:30.643+0000 I CONTROL [initandlisten] shutting down with code:100
Error
Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable).
Another mongod instance is already running on the /data/db directory, terminating
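
Background on the error: mongod guards its data directory with an advisory file lock on mongod.lock, and a second instance pointed at the same directory fails to acquire the lock and exits. A rough illustration of the same pattern using flock(1) (illustration only, with a hypothetical path; mongod uses fcntl-style locking internally, and lock semantics over NFS additionally depend on the NFS lock service):

# Terminal 1: hold an exclusive lock on a file (hypothetical path)
flock --exclusive /tmp/demo.lock --command 'sleep 60'

# Terminal 2: a non-blocking attempt on the same file fails immediately,
# analogous to "Resource temporarily unavailable" above
flock --nonblock /tmp/demo.lock --command 'echo got lock' || echo 'already locked'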
YAML files
# StatefulSet
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
      volumes:
        - name: phenex-nfs-mongo
          persistentVolumeClaim:
            claimName: phenex-nfs-mongo

# PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo
  claimRef:
    name: phenex-nfs-mongo
  persistentVolumeReclaimPolicy: Retain

# PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
Answer 1:

Problem:
You are deploying more than one pod using the same PVC and PV.
Solution:
Use volumeClaimTemplates (example).

Example:
# StatefulSet
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
  volumeClaimTemplates:
    - metadata:
        name: phenex-nfs-mongo
      spec:
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Mi
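
Note on what this does: with volumeClaimTemplates, the StatefulSet controller creates a separate PVC for each pod, named <template-name>-<statefulset-name>-<ordinal>. As a sketch, after applying the manifest above you would expect something like the following (volume names elided; each generated claim still needs a PV to bind to, so with static NFS provisioning you would create one PV per replica, or use a dynamic provisioner):

$ kubectl get pvc
NAME                       STATUS   VOLUME   CAPACITY   ACCESS MODES   AGE
phenex-nfs-mongo-mongo-0   Bound    ...      100Mi      RWX            1m
phenex-nfs-mongo-mongo-1   Bound    ...      100Mi      RWX            1m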
Comments:
You are deploying more than one pod using the same pvc and pv.
Don't get me wrong, but that is exactly what I am trying to achieve. I want to share the db directory across all pods, so that all pods in the Stateful Set have the same content; that way, my pods would be stateful. Why is that a problem? It is the same use case as sharing index.html across all pods of an nginx deployment. Much obliged for an explanation :)
This cannot be achieved with MongoDB, because if one PV is mounted by more than one pod it has to be read-only, and MongoDB needs write access to its storage.
Correct me if I am wrong, but I did grant my nodes write access by setting the accessModes option to ReadWriteMany on both the PV and the PVC. So, in theory, I should be able to read from and write to the shared volume.
In theory, yes, but MongoDB creates files in the /data/db directory and is thereby able to detect that another instance is already using the same storage.
Could you point me to a resource on how to properly deploy mongo as a stateful set with persistent data? Thanks!
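
For completeness, once every pod has its own volume, the mongod instances still have to be joined into a replica set so the data is replicated between them. A minimal sketch, assuming the headless service mongo from the question and the default namespace (the per-pod host names like mongo-0.mongo are derived from serviceName; adjust them to your cluster):

$ kubectl exec -it mongo-0 -- mongo --eval '
  rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo-0.mongo:27017" },
      { _id: 1, host: "mongo-1.mongo:27017" }
    ]
  })'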