Research notes on Ceph RGW storage_class (Amazon S3 Intelligent-Tiering)


Why research this?
Because I have not found a good way to scale RGW out horizontally. Some people add an access layer above RGW and keep extra records there, e.g. a "virtual big bucket" that maps to many buckets across multiple clusters.
Either way, an extra metadata management system is needed.
After seeing that this Amazon storage class feature is now supported by Ceph RGW (Nautilus), I decided to investigate using it for a few things:
1. Scaling out by adding pools under a bucket.
2. Having a bucket use several pools at once to raise read/write throughput.
3. Using lifecycle rules with an SSD pool in front: once objects reach a certain age, migrate them to a cheap COLD pool, e.g. high-capacity SATA.

Ceph documentation: https://docs.ceph.com/docs/master/radosgw/placement/

Amazon S3 introduced this feature in 2018.

Released: Nov 26, 2018

S3 Intelligent-Tiering is a new Amazon S3 storage class designed for customers who want to optimize storage costs automatically as data access patterns change, without performance impact or operational overhead. It is the first cloud object storage class that delivers automatic cost savings by moving data between two access tiers, a frequent access tier and an infrequent access tier, as access patterns change, making it ideal for data with unknown or changing access patterns.

Ceph added support for it in RGW in the Nautilus release.

First, the difference between placement and storage class:

A placement target is a placement attribute of a bucket; a storage class is a placement attribute of each object inside the bucket.

Under a placement target there is always a default STANDARD storage class, whose data pool defaults to default.rgw.buckets.data (this can be changed to whichever pool you want to use):

"STANDARD": {
    "data_pool": "default.rgw.buckets.data"
}

Every placement target has a STANDARD class, and you can add custom storage classes such as COLD; you are not limited to one, multiple classes are allowed.

The screenshot above (not reproduced here) showed the two storage classes I added to my test placement target, each mapped to a different pool. Pools can be created on different device types as needed, e.g. SSD, SAS or SATA, based on our earlier benchmark results.

bucket ==> placement target + storage class ==> pool
Creating a bucket with a given placement target selects the group of pools it can use;
PUTting an object with a storage_class selects the specific pool within that group.
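As a hedged illustration of the bucket-to-placement step (the LocationConstraint format follows the Ceph placement docs, but the endpoint URL, bucket name and the zonegroup api_name "default" are assumptions in this sketch):

# Create a bucket on a specific placement target; the LocationConstraint
# takes the form "<zonegroup api_name>:<placement-id>"
aws --endpoint-url http://rgw.example.com:8000 s3api create-bucket \
    --bucket bucket2 \
    --create-bucket-configuration LocationConstraint=default:default-placement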

Here are the concrete steps; I tested against the default default-placement target.

From the official docs: "To add a new storage class named COLD to the default-placement target, start by adding it to the zonegroup:"

1) First add the storage class to the zonegroup:

$ radosgw-admin zonegroup placement add \
      --rgw-zonegroup default \
      --placement-id default-placement \
      --storage-class COLD
2) Then add the backing pool for the class in the zone. The official example also enables compression; include that only if you want it:

$ radosgw-admin zone placement add \
      --rgw-zone default \
      --placement-id default-placement \
      --storage-class COLD \
      --data-pool default.rgw.cold.data \
      --compression lz4
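Both changes can be checked with the corresponding get commands (assuming the default zonegroup and zone names):

radosgw-admin zonegroup get --rgw-zonegroup default
radosgw-admin zone get --rgw-zone default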

The resulting zone placement configuration looks like this:

"placement_pools": [
    {
        "key": "default-placement",
        "val": {
            "index_pool": "default.rgw.buckets.index",
            "storage_classes": ...

Testing an upload with s3cmd: PUT a file to a bucket on my custom test placement target (not default-placement), specifying --storage-class=TEMPCOLD:

s3cmd put cirros-0.3.5-x86_64-disk.img s3://bucket2/clodtest1 --storage-class=TEMPCOLD
upload: 'cirros-0.3.5-x86_64-disk.img' -> ' s3://bucket2/clodtest1' [1 of 1]

s3cmd info s3://bucket2/clodtest1
s3://bucket2/clodtest1 (object):
File size: 13267968
Last mod: Sun, 29 Mar 2020 07:03:34 GMT
MIME type: application/octet-stream
Storage: TEMPCOLD
MD5 sum: f8ab98ff5e73ebab884d80c9dc9c7290

Without the storage-class option:

s3cmd put cirros-0.3.5-x86_64-disk.img s3://bucket2/clodtest3
upload: 'cirros-0.3.5-x86_64-disk.img' -> ' s3://bucket2/clodtest3' [1 of 1]
13267968 of 13267968 100% in 0s 27.25 MB/s done

the data lands in STANDARD:

s3://bucket2/clodtest3 (object):
File size: 13267968
Last mod: Sun, 29 Mar 2020 07:06:24 GMT
MIME type: application/octet-stream
Storage: STANDARD

Testing shows that reads do not need to know which storage class an object belongs to.
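To confirm on the Ceph side which pool the object data actually landed in, a rough check (assuming the COLD data pool name from the radosgw-admin example above) is to look at pool usage or list the RADOS objects directly:

# Pool-level usage; the cold pool's usage should grow after the upload
ceph df

# Or list RADOS objects in the cold data pool directly
rados -p default.rgw.cold.data ls | head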


=====
From the official documentation:

All placement targets have a STANDARD storage class which is applied to new objects by default. The user can override this default with default_storage_class.

To create an object in a non-default storage class, the storage class name must be provided in an HTTP header with the request. The S3 protocol uses the X-Amz-Storage-Class header, while the Swift protocol uses the X-Object-Storage-Class header.
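For reference, a hedged sketch for both protocols (bucket and container names are placeholders; s3cmd fills in the S3 header for you, while the Swift client can pass the header explicitly):

# S3: s3cmd translates --storage-class into the X-Amz-Storage-Class header
s3cmd put file.img s3://bucket2/file.img --storage-class=COLD

# Swift: pass the header on upload
swift upload --header "X-Object-Storage-Class: COLD" container1 file.img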

Conclusions:

1. To improve throughput, you can add multiple storage classes under one placement target, each mapped to its own pool, and have clients balance writes by specifying a storage-class (see the S3 API). Alternatively, an nginx access layer could set X-Amz-Storage-Class flexibly (custom round-robin across storage classes, or enabling/disabling individual classes).

2. When a pool is getting close to full, you can add a new storage-class and point client reads and writes at it.

3. For SSD acceleration, use an SSD pool as STANDARD and a cheap SATA pool as COLD, migrating objects over time; the Lifecycle settings still need to be worked out (see the sketch below).

I did not have time to test that part here.
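As an untested sketch of what the lifecycle part might look like (the rule ID, day count and bucket name are assumptions; COLD must already exist as a storage class on the bucket's placement target), a transition rule can be pushed with s3cmd:

# lifecycle.xml: transition all objects to the COLD storage class after 30 days
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>move-to-cold</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>COLD</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
EOF

# Apply the rule to the bucket and read it back
s3cmd setlifecycle lifecycle.xml s3://bucket2
s3cmd getlifecycle s3://bucket2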

The Ceph I tested is 14.2.5, which is not the latest, and the feature still behaves oddly in a few places. If you need it, I recommend running your own large-scale tests.

Ceph: the RGW service will not start

Environment: SUSE SESv5, corresponding to community Ceph Luminous (12.2)
Background: while adding a fourth node to the Ceph cluster, stage 4 failed with the following error:
sesadmin:~ # salt-run state.orch ceph.stage.4
openattic                : valid
[ERROR   ] Run failed on minions: sesnode3.ses5.com
Failures:
    ----------
              ID: wait for rgw processes
        Function: module.run
            Name: cephprocesses.wait
          Result: False
         Comment: Module function cephprocesses.wait executed
         Started: 15:51:13.725345
        Duration: 135585.3 ms
         Changes:   
                  ----------
                  ret:
                      False
    
    Summary for sesnode3.ses5.com
    ------------
    Succeeded: 0 (changed=1)
    Failed:    1
    ------------
    Total states run:     1
    Total run time: 135.585 s
 
  Name: mine.send - Function: salt.function - Result: Changed Started: - 15:49:35.968349 Duration: 1490.601 ms
  Name: igw config - Function: salt.state - Result: Changed Started: - 15:49:37.459622 Duration: 13781.381 ms
  Name: auth - Function: salt.state - Result: Changed Started: - 15:49:51.241587 Duration: 5595.701 ms
  Name: keyring - Function: salt.state - Result: Clean Started: - 15:49:56.837756 Duration: 743.186 ms
  Name: sysconfig - Function: salt.state - Result: Clean Started: - 15:49:57.581125 Duration: 748.137 ms
  Name: iscsi import - Function: salt.state - Result: Changed Started: - 15:49:58.329640 Duration: 3324.991 ms
  Name: iscsi apply - Function: salt.state - Result: Clean Started: - 15:50:01.655630 Duration: 3166.979 ms
  Name: wait until sesnode2.ses5.com with role igw can be restarted - Function: salt.state - Result: Changed Started: - 15:50:04.823089 Duration: 6881.116 ms
  Name: check if igw processes are still running on sesnode2.ses5.com after restarting igws - Function: salt.state - Result: Changed Started: - 15:50:11.704379 Duration: 1696.99 ms
  Name: restarting igw on sesnode2.ses5.com - Function: salt.state - Result: Changed Started: - 15:50:13.401560 Duration: 6705.876 ms
  Name: wait until sesnode3.ses5.com with role igw can be restarted - Function: salt.state - Result: Changed Started: - 15:50:20.107781 Duration: 6779.647 ms
  Name: check if igw processes are still running on sesnode3.ses5.com after restarting igws - Function: salt.state - Result: Changed Started: - 15:50:26.887635 Duration: 773.069 ms
  Name: restarting igw on sesnode3.ses5.com - Function: salt.state - Result: Changed Started: - 15:50:27.660939 Duration: 6693.656 ms
  Name: cephfs pools - Function: salt.state - Result: Changed Started: - 15:50:34.354917 Duration: 6740.383 ms
  Name: mds auth - Function: salt.state - Result: Changed Started: - 15:50:41.095912 Duration: 4639.248 ms
  Name: mds - Function: salt.state - Result: Clean Started: - 15:50:45.735744 Duration: 1607.979 ms
  Name: mds restart noop - Function: test.nop - Result: Clean Started: - 15:50:47.344850 Duration: 0.525 ms
  Name: rgw auth - Function: salt.state - Result: Changed Started: - 15:50:47.345595 Duration: 4674.037 ms
  Name: rgw users - Function: salt.state - Result: Changed Started: - 15:50:52.019892 Duration: 4806.13 ms
  Name: rgw - Function: salt.state - Result: Changed Started: - 15:50:56.826321 Duration: 3165.723 ms
  Name: setup prometheus rgw exporter - Function: salt.state - Result: Changed Started: - 15:50:59.992275 Duration: 2910.979 ms
  Name: rgw demo buckets - Function: salt.state - Result: Clean Started: - 15:51:02.903422 Duration: 3480.446 ms
  Name: wait until sesnode3.ses5.com with role rgw can be restarted - Function: salt.state - Result: Changed Started: - 15:51:06.384012 Duration: 6823.43 ms
----------
          ID: check if rgw processes are still running on sesnode3.ses5.com after restarting rgws
    Function: salt.state
      Result: False
     Comment: Run failed on minions: sesnode3.ses5.com
              Failures:
                  ----------
                            ID: wait for rgw processes
                      Function: module.run
                          Name: cephprocesses.wait
                        Result: False
                       Comment: Module function cephprocesses.wait executed
                       Started: 15:51:13.725345
                      Duration: 135585.3 ms
                       Changes:   
                                ----------
                                ret:
                                    False
                  
                  Summary for sesnode3.ses5.com
                  ------------
                  Succeeded: 0 (changed=1)
                  Failed:    1
                  ------------
                  Total states run:     1
                  Total run time: 135.585 s
     Started: 15:51:13.207613
    Duration: 136141.45 ms
     Changes:   
 
-------------
Succeeded: 23 (changed=17)
Failed:     1
-------------
Total states run:     24
Total run time:  233.372 s
sesadmin:~ #
Troubleshooting:
On the affected node, start the RGW service manually:
sesnode3:~ # systemctl start [email protected]
sesnode3:~ # systemctl status [email protected]
● [email protected] - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2018-08-29 16:26:35 CST; 1h 39min ago
  Process: 103198 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.%i --setuser ceph --setgroup ceph (code=exited, status=5)
Main PID: 103198 (code=exited, status=5)
 
Aug 29 16:26:34 sesnode3 systemd[1]: [email protected]: Mai...ED
Aug 29 16:26:34 sesnode3 systemd[1]: [email protected]: Uni...e.
Aug 29 16:26:34 sesnode3 systemd[1]: [email protected]: Fai...'.
Aug 29 16:26:35 sesnode3 systemd[1]: [email protected]: Ser...t.
Aug 29 16:26:35 sesnode3 systemd[1]: Stopped Ceph rados gateway.
Aug 29 16:26:35 sesnode3 systemd[1]: [email protected]: Sta...y.
Aug 29 16:26:35 sesnode3 systemd[1]: Failed to start Ceph rados gateway.
Aug 29 16:26:35 sesnode3 systemd[1]: [email protected]: Uni...e.
Aug 29 16:26:35 sesnode3 systemd[1]: [email protected]: Fai...'.
Hint: Some lines were ellipsized, use -l to show in full.
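As the hint says, the truncated lines can be shown in full with -l, or read from the journal, for example:

systemctl status -l [email protected]
journalctl -u [email protected] --no-pager -n 50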
 
 
It still would not start, so check the log:
sesnode3:~ # tail -f /var/log/ceph/ceph-client.rgw.sesnode3.log
2018-08-29 16:26:33.330111 7f23d5511e00  0 deferred set uid:gid to 167:167 (ceph:ceph)
2018-08-29 16:26:33.330338 7f23d5511e00  0 ceph version 12.2.5-419-g8cbf63d997 (8cbf63d997fb5cdc783fe7bfcd4f5032ee140c0c) luminous (stable), process (unknown), pid 103128
2018-08-29 16:26:33.929287 7f23d5511e00  0 starting handler: civetweb
2018-08-29 16:26:33.929609 7f23d5511e00  0 civetweb: 0x5636c0c9e2e0: cannot listen to 80: 98 (Address already in use)
2018-08-29 16:26:33.936396 7f23d5511e00 -1 ERROR: failed run
2018-08-29 16:26:34.363426 7f5d467d9e00  0 deferred set uid:gid to 167:167 (ceph:ceph)
2018-08-29 16:26:34.363551 7f5d467d9e00  0 ceph version 12.2.5-419-g8cbf63d997 (8cbf63d997fb5cdc783fe7bfcd4f5032ee140c0c) luminous (stable), process (unknown), pid 103198
2018-08-29 16:26:34.817776 7f5d467d9e00  0 starting handler: civetweb
2018-08-29 16:26:34.818183 7f5d467d9e00  0 civetweb: 0x55c06880c2e0: cannot listen to 80: 98 (Address already in use)
2018-08-29 16:26:34.818351 7f5d467d9e00 -1 ERROR: failed run
 
Check ceph.conf:
sesnode3:/etc/ceph # cat ceph.conf
# DeepSea default configuration. Changes in this file will be overwritten on
# package update. Include custom configuration fragments in
# /srv/salt/ceph/configuration/files/ceph.conf.d/[global,osd,mon,mgr,mds,client].conf
[global]
fsid = 82499237-e7fe-32cf-b47f-4117d2b8e63a
mon_initial_members = sesnode2, sesnode3, sesnode1
mon_host = 192.168.120.82, 192.168.120.83, 192.168.120.81
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 192.168.120.0/24
cluster_network = 192.168.125.0/24
 
# enable old ceph health format in the json output. This fixes the
# ceph_exporter. This option will only stay until the prometheus plugin takes
# over
mon_health_preluminous_compat = true
mon health preluminous compat warning = false
 
rbd default features = 3
 
 
 
 
[client.rgw.sesnode3]
rgw frontends = "civetweb port=80"
rgw dns name = sesnode3.ses5.com
rgw enable usage log = true
 
 
 
 
[osd]
 
 
[mon]
 
 
[mgr]
 
 
[mds]
 
 
[client]
 
 
Note: oddly, grepping ps for 80 turned up nothing, but the error clearly says port 80 is occupied, and the netstat output below does show a listener on 0.0.0.0:80 (see the socket check after the output for how to find its owner). To save time I simply moved RGW to a different port:
sesnode3:/etc/ceph # ps -ef |grep 80
root         380       1  0 Aug28 ?        00:00:01 /usr/lib/systemd/systemd-journald
root        2062       2  0 Aug28 ?        00:00:00 [cfg80211]
root      108046  107895  0 18:05 pts/1    00:00:00 tail -f /var/log/ceph/ceph-client.rgw.sesnode3.log
root      111617  107698  0 18:42 pts/0    00:00:00 grep --color=auto 80
sesnode3:/etc/ceph # netstat -ano |grep 80
tcp        0      0 192.168.125.83:6800     0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 192.168.120.83:6800     0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 192.168.120.83:6801     0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 192.168.125.83:6801     0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 0.0.0.0:8480            0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 192.168.120.83:6801     192.168.120.81:42728    ESTABLISHED off (0.00/0/0)
tcp        0      0 192.168.120.83:35610    192.168.120.80:4506     TIME_WAIT   timewait (56.80/0/0)
tcp        0      0 192.168.120.83:35608    192.168.120.80:4506     TIME_WAIT   timewait (56.06/0/0)
tcp        0      0 192.168.120.83:45428    192.168.120.81:6803     ESTABLISHED off (0.00/0/0)
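
To pin down which process owns port 80, a socket-level check shows the owning PID directly, unlike grepping ps (any of the following, whichever tool is installed):

# Listening sockets on port 80 together with the owning process
ss -lntp 'sport = :80'
# or, with net-tools (run as root to see the PID)
netstat -lntp | grep -w ':80'
# or
lsof -iTCP:80 -sTCP:LISTEN

In this case the quicker fix was simply to move RGW to a free port in ceph.conf: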
 
 
[client.rgw.sesnode3]
rgw frontends = "civetweb port=8480"
rgw dns name = sesnode3.ses5.com
rgw enable usage log = true
 
After changing the port, start the service again; it comes up normally.
