Ceph is really part of the underlying infrastructure: it should be set up before the Kubernetes cluster is deployed, so that it can provide storage services to k8s. It can be used to store Pod data, Docker images, log data and so on.
Ceph overview
Ceph is a distributed storage system that uniquely delivers object storage, block storage and file storage from one unified system, the Ceph storage cluster. The storage cluster is built on RADOS and scales massively, serving thousands of clients with petabytes or even exabytes of data. Ceph nodes run on commodity hardware with intelligent daemons; the storage cluster organizes a large number of nodes that communicate with each other to replicate data and redistribute it dynamically using the CRUSH algorithm.
Ceph comes with a lot of terminology, and knowing it is important for understanding the architecture. The common terms:
Term | Description |
---|---|
RADOSGW | Object gateway daemon |
RBD | Block storage |
CEPHFS | File storage |
LIBRADOS | The base library for interacting with RADOS. Ceph speaks to RADOS over a native protocol and wraps that functionality in librados, so you can build your own clients on top of it |
RADOS | The storage cluster itself |
OSD | Object Storage Device, the RADOS component that stores the actual data |
Monitor | The RADOS component that maintains the global state of the Ceph cluster |
MDS | The Ceph metadata server, which stores metadata for the Ceph file system |
Environment and node planning
A Ceph cluster consists of several components: Ceph Monitor, Ceph OSD and Ceph MDS. If you only use object storage and block storage, the MDS is not required; it only needs to be installed when you want CephFS. We do need CephFS here, so the MDS is installed as well.
Does Ceph RBD support being mounted by several Pods at the same time? The official documentation says no: a PersistentVolume backed by Ceph RBD only supports two access modes, ReadWriteOnce and ReadOnlyMany; ReadWriteMany is not supported.
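For reference, a minimal sketch of what an RBD-backed PersistentVolume looks like on the Kubernetes side; the pool, image and Secret names below are placeholders, not objects created in this article:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-rbd-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce            # ReadWriteMany is not supported for RBD
  rbd:
    monitors:
      - "172.24.10.20:6789"
      - "172.24.10.21:6789"
      - "172.24.10.22:6789"
    pool: rbd                  # placeholder pool name
    image: demo-image          # placeholder RBD image
    user: admin
    secretRef:
      name: ceph-admin-secret  # placeholder Secret holding the admin key
    fsType: ext4
  persistentVolumeReclaimPolicy: Retain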
Ceph's installation model is a bit like that of k8s: a deploy node drives the other nodes remotely to create, prepare and activate the Ceph components on each of them.
Resources are limited, so the Ceph cluster is deployed on the k8s cluster nodes; the hosts below reuse the k8s hostnames, which may make the roles slightly harder to tell apart.
Node name | Hostname | Node IP | Spec | Role |
---|---|---|---|---|
ceph-mon-0 | node-01 | 172.24.10.20 | centos7.4 | Admin node, monitor, mds |
ceph-mon-1 | node-02 | 172.24.10.21 | centos7.4 | Monitor, mds, client |
ceph-mon-2 | node-03 | 172.24.10.22 | centos7.4 | Monitor, mds |
ceph-osd-0 | node-01 | 172.24.10.20 | 20G | Storage node (osd) |
ceph-osd-1 | node-02 | 172.24.10.21 | 20G | Storage node (osd) |
ceph-osd-2 | node-03 | 172.24.10.22 | 20G | Storage node (osd) |
ceph-osd-3 | node-04 | 172.24.10.23 | 20G | Storage node (osd) |
ceph-osd-4 | node-05 | 172.24.10.24 | 20G | Storage node (osd) |
ceph-osd-5 | node-06 | 172.24.10.25 | 20G | Storage node (osd) |
Cluster deployment
Base environment
hosts
~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.24.10.20 node-01
172.24.10.21 node-02
172.24.10.22 node-03
172.24.10.23 node-04
172.24.10.24 node-05
172.24.10.25 node-06
Also set up passwordless ssh-key login between the admin node and all the other nodes.
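A minimal sketch of one way to do this, run from the admin node and assuming the hostnames from the table above (the same steps apply later for the ceph user):
# generate a key pair without a passphrase, then push it to every node
~]# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
~]# for h in node-01 node-02 node-03 node-04 node-05 node-06; do ssh-copy-id $h; done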
Add the Ceph yum repo on all nodes
~]# yum install epel-release -y && yum upgrade -y
~]# rpm -Uvh https://download.ceph.com/rpm-luminous/el7/noarch/ceph-release-1-1.el7.noarch.rpm
~]# ansible ceph -a 'rpm -Uvh https://download.ceph.com/rpm-luminous/el7/noarch/ceph-release-1-1.el7.noarch.rpm' # install on all nodes at once
# the official repo is too slow and package installs keep timing out later when building the cluster, so switch to a mirror:
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
# ansible ceph -m copy -a 'src=/etc/yum.repos.d/ceph.repo dest=/etc/yum.repos.d/ceph.repo'
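The ansible ceph ad-hoc commands used throughout assume an inventory group named ceph; a minimal sketch, assuming the default /etc/ansible/hosts is used:
~]# cat /etc/ansible/hosts
[ceph]
node-01
node-02
node-03
node-04
node-05
node-06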
Create a ceph user on all nodes and grant it sudo privileges
# generate the password hash
python -c "from passlib.hash import sha512_crypt; import getpass; print sha512_crypt.encrypt(getpass.getpass())"
Password: ceph
$6$rounds=656000$PZshbGs2TMKtUgB1$LTdZj9xxHsJH5wRNSLYQL8CH7bAaE4415g/aRZD39RJiRrPx.Bzu19Y5/aOqQuFUunr7griuDN7BAlcTOkuw81
# sudo on the local node
visudo
Defaults:ceph timestamp_timeout=-1
ceph ALL=(root) NOPASSWD:ALL
# ansible playbook that creates the user and sets the password
~]# vim user.yml
- hosts: ceph
  remote_user: root
  tasks:
    - name: add user
      user: name=ceph password='$6$rounds=656000$PZshbGs2TMKtUgB1$LTdZj9xxHsJH5wRNSLYQL8CH7bAaE4415g/aRZD39RJiRrPx.Bzu19Y5/aOqQuFUunr7griuDN7BAlcTOkuw81'
    - name: sudo config
      copy: src=/etc/sudoers dest=/etc/sudoers
    - name: sync ssh key
      authorized_key: user=ceph state=present exclusive=yes key="{{ lookup('file', '/home/ceph/.ssh/id_rsa.pub') }}"
# run the playbook
ansible-playbook user.yml
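A quick, optional sanity check that the user landed on every node:
~]# ansible ceph -a 'id ceph'      # the ceph user should exist on each host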
Deploy the cluster
Install ceph-deploy
On the admin node:
~]$ sudo yum install ceph-deploy
Create the cluster on the admin node
~]$ mkdir ceph-cluster
~]$ cd ceph-cluster
ceph-cluster]$ ceph-deploy new node-01 node-02 node-03
Edit the generated cluster configuration file
$ cat ceph.conf
[global]
fsid = 64960081-9cfe-4b6f-a9ae-eb9b2be216bc
mon_initial_members = node-01, node-02, node-03
mon_host = 172.24.10.20,172.24.10.21,172.24.10.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# default number of object replicas per pool (note: this is the replica count, not the OSD count; set here to 6)
osd pool default size = 6
[mon]
# allow pools to be deleted from the cluster
mon_allow_pool_delete = true
[mgr]
mgr modules = dashboard
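If ceph.conf is edited again after the nodes already have a copy, the change can be pushed out from the same working directory (a sketch; this is also where the --overwrite-conf flag mentioned later comes in):
ceph-cluster]$ ceph-deploy --overwrite-conf config push node-01 node-02 node-03 node-04 node-05 node-06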
Install Ceph on all nodes
~]$ ceph-deploy install --no-adjust-repos node-01 node-02 node-03 node-04 node-05 node-06
# without --no-adjust-repos, ceph-deploy keeps switching back to its own default upstream repo, which is a real trap
Initialize the Ceph monitor nodes
Initialize the monitors and gather all the keys
cd ceph-cluster/
ceph-deploy mon create-initial
Error 1:
ceph-mon-2 Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-2.asok mon_status
ceph-mon-2 admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
ceph_deploy.mon mon.ceph-mon-2 monitor is not yet in quorum, tries left: 1
ceph_deploy.mon waiting 20 seconds before retrying
ceph_deploy.mon Some monitors have still not reached quorum:
ceph_deploy.mon ceph-mon-0
ceph_deploy.mon ceph-mon-1
ceph_deploy.mon ceph-mon-2
# inspect the /var/run/ceph directory
]$ ls /var/run/ceph/
ceph-mon.k8s-master-01.asok // the admin socket is named after the node's actual hostname
# tear down the broken deployment
]$ ceph-deploy mon destroy node-01 node-02 node-03
Still not working: the monitor's admin socket is named after the real hostname, so the names passed to ceph-deploy must match actual, unique hostnames.
Clean up the environment
$ ceph-deploy purge node-01 node-02 node-03 node-04 node-05 node-06 // removes everything related to Ceph, packages included
$ ceph-deploy purgedata node-01 node-02 node-03 node-04 node-05 node-06
$ ceph-deploy forgetkeys
Error 2
[node-03][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 5
[ceph_deploy.mon][WARNIN] waiting 5 seconds before retrying
[node-03][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 4
[ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying
[node-03][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 3
[ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying
[node-03][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 2
[ceph_deploy.mon][WARNIN] waiting 15 seconds before retrying
[node-03][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 1
[ceph_deploy.mon][WARNIN] waiting 20 seconds before retrying
[ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
[ceph_deploy.mon][ERROR ] node-02
[ceph_deploy.mon][ERROR ] node-03
[ceph_deploy.mon][ERROR ] node-01
Fix
The iptables policy was blocking the traffic; either flush the rules, or open the monitor's default listening port, 6789.
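A sketch of the two options mentioned above (the OSDs additionally listen on the 6800-7300/tcp range):
# option 1: flush all rules on every node
$ sudo ansible ceph -a 'iptables -F'
# option 2: only open the Ceph ports on each node
$ sudo iptables -I INPUT -p tcp --dport 6789 -j ACCEPT
$ sudo iptables -I INPUT -p tcp --dport 6800:7300 -j ACCEPT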
Check the running services
$ ps -ef|grep ceph
ceph 4693 1 0 16:45 ? 00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id node-01 --setuser ceph --setgroup ceph
# stopping the daemon by hand is shown below
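On Luminous the daemons are managed by systemd units named after the daemon id, so a monitor can be stopped and started by hand like this (a sketch, using node-01 as the id):
$ sudo systemctl stop ceph-mon@node-01
$ sudo systemctl start ceph-mon@node-01
$ sudo systemctl status ceph-mon.target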
OSD deployment
From the admin node, log in to each OSD node and create the OSD data directory (older ceph-deploy releases)
# osd-0
ssh node-01
sudo mkdir /var/local/osd0
sudo chown -R ceph.ceph /var/local/osd0
# osd-1
ssh node-02
sudo mkdir /var/local/osd1
sudo chown -R ceph.ceph /var/local/osd1
# osd-2
ssh node-03
sudo mkdir /var/local/osd2
sudo chown -R ceph.ceph /var/local/osd2
# osd-3
ssh node-04
sudo mkdir /var/local/osd3
sudo chown -R ceph.ceph /var/local/osd3
# osd-4
ssh node-05
sudo mkdir /var/local/osd4
sudo chown -R ceph.ceph /var/local/osd4
# osd-5
ssh node-06
sudo mkdir /var/local/osd5
sudo chown -R ceph.ceph /var/local/osd5
From the admin node, run the command that prepares each OSD (older releases)
ceph-deploy osd prepare node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5
# add --overwrite-conf if the ceph.conf on the nodes differs from the local copy
Activate each OSD node (older releases)
ceph-deploy osd activate node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5
Add and activate the OSD disks (older releases)
ceph-deploy osd create --bluestore node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5
Newer ceph-deploy releases just use create, which covers the prepare, activate and osd create --bluestore steps above:
ceph-deploy osd create --data /dev/sdb node-01
ceph-deploy osd create --data /dev/sdb node-02
ceph-deploy osd create --data /dev/sdb node-03
ceph-deploy osd create --data /dev/sdb node-04
ceph-deploy osd create --data /dev/sdb node-05
ceph-deploy osd create --data /dev/sdb node-06
From the admin node, copy the configuration file and the admin keyring to the admin node and all Ceph nodes
ceph-deploy admin node-01 node-02 node-03 node-04 node-05 node-06
Make ceph.client.admin.keyring readable on every node
sudo ansible ceph -a 'chmod +r /etc/ceph/ceph.client.admin.keyring'
Deployment is done; check the cluster state
$ ceph -s
cluster:
id: 64960081-9cfe-4b6f-a9ae-eb9b2be216bc
health: HEALTH_WARN
clock skew detected on mon.node-02, mon.node-03
services:
mon: 3 daemons, quorum node-01,node-02,node-03
mgr: node-01(active), standbys: node-02, node-03
osd: 6 osds: 6 up, 6 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 6337 MB used, 113 GB / 119 GB avail
pgs:
Fixing the health warning
health: HEALTH_WARN
clock skew detected on mon.node-02, mon.node-03
This is caused by clock skew between the nodes.
$ sudo ansible ceph -a 'yum install ntpdate -y'
$ sudo ansible ceph -a 'systemctl stop ntpdate'
$ sudo ansible ceph -a 'ntpdate time.windows.com'
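A one-shot ntpdate only fixes the skew once; to keep the clocks in sync, a periodic job can be pushed to all nodes, for example with ansible's cron module (a sketch; the interval and NTP server are arbitrary choices):
$ sudo ansible ceph -m cron -a "name=ntpdate_sync minute=*/10 job='/usr/sbin/ntpdate time.windows.com >/dev/null 2>&1'"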
$ ceph -s
cluster:
id: 64960081-9cfe-4b6f-a9ae-eb9b2be216bc
health: HEALTH_OK
services:
mon: 3 daemons, quorum node-01,node-02,node-03
mgr: node-01(active), standbys: node-03, node-02
mds: cephfs-1/1/1 up {0=node-02=up:active}, 2 up:standby
osd: 6 osds: 6 up, 6 in
data:
pools: 2 pools, 192 pgs
objects: 21 objects, 2246 bytes
usage: 6354 MB used, 113 GB / 119 GB avail
pgs: 192 active+clean
Check the OSD tree
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.11691 root default
-3 0.01949 host node-01
0 hdd 0.01949 osd.0 up 1.00000 1.00000
-5 0.01949 host node-02
1 hdd 0.01949 osd.1 up 1.00000 1.00000
-7 0.01949 host node-03
2 hdd 0.01949 osd.2 up 1.00000 1.00000
-9 0.01949 host node-04
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-11 0.01949 host node-05
4 hdd 0.01949 osd.4 up 1.00000 1.00000
-13 0.01949 host node-06
5 hdd 0.01949 osd.5 up 1.00000 1.00000
Check the mounts
$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 17G 1.5G 16G 9% /
devtmpfs devtmpfs 478M 0 478M 0% /dev
tmpfs tmpfs 488M 0 488M 0% /dev/shm
tmpfs tmpfs 488M 6.6M 482M 2% /run
tmpfs tmpfs 488M 0 488M 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 153M 862M 16% /boot
tmpfs tmpfs 98M 0 98M 0% /run/user/0
tmpfs tmpfs 488M 48K 488M 1% /var/lib/ceph/osd/ceph-0
]$ cat /var/lib/ceph/osd/ceph-0/type
bluestore
Configure ceph-mgr
Since Ceph 12 (Luminous) the manager daemon is mandatory. A mgr should be added for every machine that runs a monitor, otherwise the cluster stays in the WARN state.
$ ceph-deploy mgr create node-01 node-02 node-03
ceph config-key put mgr/dashboard/server_addr 172.24.10.20
ceph config-key put mgr/dashboard/server_port 7000
ceph mgr module enable dashboard
http://172.24.10.20:7000/
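To confirm which address the dashboard actually ended up serving on, the mgr can be asked directly (a sketch; the exact output format varies by version):
$ ceph mgr services      # prints the URLs of enabled mgr modules, including the dashboard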
Deploy CephFS
PG count guidance: http://docs.ceph.com/docs/master/rados/operations/placement-groups/
$ ceph-deploy mds create node-01 node-02 node-03
$ ceph osd pool create cephfs_data 128
$ ceph osd pool create cephfs_metadata 64
$ ceph fs new cephfs cephfs_metadata cephfs_data
$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
$ ceph mds stat
cephfs-1/1/1 up {0=node-02=up:active}, 2 up:standby
Multiple active MDS daemons can run in parallel, but the official documentation recommends keeping a single active MDS and leaving the rest as standbys.
Mount the filesystem
The client role is planned for node-02.
On a physical host, CephFS can be mounted with the plain mount command, with mount.ceph (apt-get install ceph-fs-common) or with ceph-fuse (apt-get install ceph-fuse); we start with the plain mount command.
$ sudo mkdir /data/ceph-storage/ -p
$ sudo chown -R ceph.ceph /data/ceph-storage
$ ceph-authtool -l /etc/ceph/ceph.client.admin.keyring
[client.admin]
key = AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==
$ sudo mount -t ceph 172.24.10.21:6789:/ /data/ceph-storage/ -o name=admin,secret=AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==
$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 17G 1.5G 16G 9% /
devtmpfs devtmpfs 478M 0 478M 0% /dev
tmpfs tmpfs 488M 0 488M 0% /dev/shm
tmpfs tmpfs 488M 6.7M 481M 2% /run
tmpfs tmpfs 488M 0 488M 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 153M 862M 16% /boot
tmpfs tmpfs 98M 0 98M 0% /run/user/0
tmpfs tmpfs 488M 48K 488M 1% /var/lib/ceph/osd/ceph-1
tmpfs tmpfs 98M 0 98M 0% /run/user/1000
172.24.10.21:6789:/ ceph 120G 6.3G 114G 6% /data/ceph-storage
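Putting the key on the mount command line leaves it in the shell history; a secret file plus an fstab entry makes the mount both safer and persistent across reboots (a sketch; the /etc/ceph/admin.secret path is an arbitrary choice):
# store only the base64 key string, not the whole keyring
$ sudo sh -c 'echo "AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==" > /etc/ceph/admin.secret'
$ sudo chmod 600 /etc/ceph/admin.secret
$ sudo mount -t ceph 172.24.10.21:6789:/ /data/ceph-storage/ -o name=admin,secretfile=/etc/ceph/admin.secret
# /etc/fstab entry for mounting at boot
172.24.10.21:6789:/  /data/ceph-storage  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0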