Ceph: too many PGs per OSD

Posted 2020-10-20

1. Symptom:

Checking the Ceph cluster status shows a health warning: too many PGs per OSD (698 > max 300)

# ceph -s
    cluster e2ca994a-00c4-477f-9390-ea3f931c5062
     health HEALTH_WARN
            too many PGs per OSD (698 > max 300)
     monmap e1: 3 mons at {hz-01-ops-tc-ceph-02=172.16.2.231:6789/0,hz-01-ops-tc-ceph-03=172.16.2.172:6789/0,hz-01-ops-tc-ceph-04=172.16.2.181:6789/0}
            election epoch 14, quorum 0,1,2 hz-01-ops-tc-ceph-03,hz-01-ops-tc-ceph-04,hz-01-ops-tc-ceph-02
     osdmap e54: 5 osds: 5 up, 5 in
            flags sortbitwise,require_jewel_osds
      pgmap v1670: 1164 pgs, 3 pools, 14640 kB data, 22 objects
            240 MB used, 224 GB / 224 GB avail
                1164 active+clean
# ceph --show-config  | grep mon_pg_warn_max_per_osd
mon_pg_warn_max_per_osd = 300
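
The 698 in the warning can be reproduced from the status output above. This is a minimal sketch, assuming the monitor counts every PG replica (PG count times pool replica size) and divides by the number of OSDs, and assuming all three pools use a replica size of 3; the pool sizes are not shown in the output, so that value is a guess that happens to match. The function name pgs_per_osd is made up for this illustration.

```shell
# pgs_per_osd TOTAL_PGS REPLICA_SIZE NUM_OSDS
# Integer estimate of PG replicas hosted by each OSD.
# REPLICA_SIZE = 3 is an assumption; it is not visible in `ceph -s`.
pgs_per_osd() {
    echo $(( $1 * $2 / $3 ))
}

# Values taken from the status output: 1164 pgs, 5 osds.
pgs_per_osd 1164 3 5   # prints 698
```

With 698 above the default threshold of 300, the monitor reports HEALTH_WARN.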

2. Adjust the Ceph configuration

# cd /my-cluster
# vim ceph.conf 
Add the following parameter:
mon_pg_warn_max_per_osd = 1024
# ceph-deploy --overwrite-conf config push hz-01-ops-tc-ceph-04 hz-01-ops-tc-ceph-02 hz-01-ops-tc-ceph-03
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.39): /usr/bin/ceph-deploy --overwrite-conf config push hz-01-ops-tc-ceph-04 hz-01-ops-tc-ceph-02 hz-01-ops-tc-ceph-03
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : push
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fb51b241320>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['hz-01-ops-tc-ceph-04', 'hz-01-ops-tc-ceph-02', 'hz-01-ops-tc-ceph-03']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x7fb51bac8320>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Pushing config to hz-01-ops-tc-ceph-04
[hz-01-ops-tc-ceph-04][DEBUG ] connected to host: hz-01-ops-tc-ceph-04 
[hz-01-ops-tc-ceph-04][DEBUG ] detect platform information from remote host
[hz-01-ops-tc-ceph-04][DEBUG ] detect machine type
[hz-01-ops-tc-ceph-04][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to hz-01-ops-tc-ceph-02
[hz-01-ops-tc-ceph-02][DEBUG ] connected to host: hz-01-ops-tc-ceph-02 
[hz-01-ops-tc-ceph-02][DEBUG ] detect platform information from remote host
[hz-01-ops-tc-ceph-02][DEBUG ] detect machine type
[hz-01-ops-tc-ceph-02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to hz-01-ops-tc-ceph-03
[hz-01-ops-tc-ceph-03][DEBUG ] connected to host: hz-01-ops-tc-ceph-03 
[hz-01-ops-tc-ceph-03][DEBUG ] detect platform information from remote host
[hz-01-ops-tc-ceph-03][DEBUG ] detect machine type
[hz-01-ops-tc-ceph-03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
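
The steps above do not say which section of ceph.conf the new line belongs to. The sketch below assumes it goes under [global], which is where a ceph-deploy generated file usually keeps cluster-wide settings. It runs against a scratch file rather than a real configuration; the helper name set_pg_warn_max and the scratch file contents are made up for illustration.

```shell
# Scratch copy so the sketch never touches a real cluster config.
conf="$(mktemp)"
printf '[global]\nfsid = e2ca994a-00c4-477f-9390-ea3f931c5062\n' > "$conf"

# set_pg_warn_max CONF_FILE VALUE
# Drops any existing mon_pg_warn_max_per_osd line, then appends the new one.
# Assumes the file ends inside the [global] section (hypothetical layout).
set_pg_warn_max() {
    tmp="$(mktemp)"
    grep -v '^mon_pg_warn_max_per_osd' "$1" > "$tmp" || true
    printf 'mon_pg_warn_max_per_osd = %s\n' "$2" >> "$tmp"
    cat "$tmp" > "$1"   # overwrite in place to keep the file's permissions
    rm -f "$tmp"
}

set_pg_warn_max "$conf" 1024
set_pg_warn_max "$conf" 1024   # running it twice still leaves one line
cat "$conf"
```

Because the helper removes an old entry before appending, it can be rerun with a different value without leaving duplicate lines in the file.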

Restart the monitor service on each mon node:

# systemctl restart ceph-mon.target

3. Check the cluster status again from the admin node

# ceph -s
    cluster e2ca994a-00c4-477f-9390-ea3f931c5062
     health HEALTH_OK
     monmap e1: 3 mons at {hz-01-ops-tc-ceph-02=172.16.2.231:6789/0,hz-01-ops-tc-ceph-03=172.16.2.172:6789/0,hz-01-ops-tc-ceph-04=172.16.2.181:6789/0}
            election epoch 20, quorum 0,1,2 hz-01-ops-tc-ceph-03,hz-01-ops-tc-ceph-04,hz-01-ops-tc-ceph-02
     osdmap e54: 5 osds: 5 up, 5 in
            flags sortbitwise,require_jewel_osds
      pgmap v1779: 1164 pgs, 3 pools, 14640 kB data, 22 objects
            240 MB used, 224 GB / 224 GB avail
                1164 active+clean
# ceph --show-config  | grep mon_pg_warn_max_per_osd
mon_pg_warn_max_per_osd = 1024
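
Both status outputs report the same 1164 pgs on 5 osds, so the PG load itself has not changed; only the warning threshold moved. A small sketch of that comparison, assuming the monitor warns when the per-OSD figure is strictly greater than mon_pg_warn_max_per_osd (the "698 > max 300" wording suggests this, but it is an assumption). The function name pg_warning is made up for this illustration.

```shell
# pg_warning PGS_PER_OSD THRESHOLD
# Mimics the health check: warn only when the per-OSD count exceeds the limit.
pg_warning() {
    if [ "$1" -gt "$2" ]; then
        echo "too many PGs per OSD ($1 > max $2)"
    else
        echo "ok"
    fi
}

pg_warning 698 300    # old threshold: warning
pg_warning 698 1024   # new threshold: ok
```

Raising the threshold silences the warning but leaves each OSD hosting the same number of PG replicas.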
