记录MySQL因Waiting for query cache lock导致业务宕机的处理
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了记录MySQL因Waiting for query cache lock导致业务宕机的处理相关的知识,希望对你有一定的参考价值。
早上10点,业务反馈服务器502无法访问,查询所有的WEB服务器、缓存、数据库都是可以连接的。但是mysql查询一致卡住,为尽快恢复业务,直接重启数据库操作,此时业务恢复。
开始找原因:
1.查看mysql错误日志,都是这样的。
2020-07-08 10:10:14 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:14 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:14 0 [Note] InnoDB: FTS optimize thread exiting.
2020-07-08 10:10:16 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:16 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:18 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:18 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:20 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
经过分析不是以上引起的
到了12点的时候又出现问题了,这次在重启前查询show processlist并且导出了文件
mysql -uroot -pxxxxx -N -e "show processlist" >/tmp/sp.log
分析导出的文件
2 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000
3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
1 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
7 event_schbseuler localhost NULL Daemon 6470 Waiting for next activation NULL 0.000
94 replication 192.168.0.174:39478 NULL Binlog Dump 6453 Master has sent all binlog to slave; waiting for binlog to be updatbse NULL 0.000
95 replication 192.168.0.149:49262 NULL Binlog Dump 6453 Master has sent all binlog to slave; waiting for binlog to be updatbse NULL 0.000
27798 bseu 192.168.0.173:37820 dbusercollege Query 348 Sending data SELECT `p`.`Id`, `p`.`CId`, `p`.`CName`, `p`.`CodeId`, `p`.`CreateTime`, `p`.`IsDeletbse`, `p`.`IsEdi 0.000
27925 bseu 192.168.0.158:60228 dbusercollege Query 349 Waiting for query cache lock UPDATE `article` SET `ArType` = 1, `AuthorId` = ‘‘, `AuthorName` = ‘小奇‘, `AuthorPhoto` = ‘‘, `Co 0.000
27930 bseu 192.168.0.155:58588 dbusercollege Query 268 Sending data SELECT `a`.`Id`, `a`.`ArType`, `a`.`AuthorId`, `a`.`AuthorName`, `a`.`AuthorPhoto`, `a`.`Code`, `a`. 0.000
28589 bseu 192.168.0.173:40376 dbusercollege Query 348 Sending data SELECT `p`.`Id`, `p`.`CId`, `p`.`CName`, `p`.`CodeId`, `p`.`CreateTime`, `p`.`IsDeletbse`, `p`.`IsEdi 0.000
28779 bse 192.168.0.149:57342 bse Query 398 Waiting for query cache lock SELECT code,pcode FROM a_customer_relation 0.000
28780 bse 192.168.0.149:57344 bse Sleep 398 NULL 0.000
28781 bse 192.168.0.149:57347 bse Sleep 398 NULL 0.000
28782 bse 192.168.0.149:57346 bse Sleep 398 NULL 0.000
28783 read 192.168.0.149:57350 dbuser Query 398 Writing to net SELECT cast((cr.`code`) as INT) as code,(CASE WHEN cr.`code`=7070 THEN 0 WHEN pcr.`code` IS NULL THE 0.000
28786 bsusr 192.168.0.160:34322 dbuser Query 398 Update INSERT INTO customer (level_up_time, create_time, identity_number, features, job, source, state, typ 0.000
28799 bsusr 192.168.0.173:48238 dbuser Query 391 Waiting for query cache lock INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_ 0.000
28810 bsusr 192.168.0.173:48266 dbuser Query 386 Waiting for query cache lock DELETE FROM user_device WHERE channel = ‘jPush‘ AND bindUser_id = 3092570 AND device_id = ‘android_1 0.000
28825 bsusr 192.168.0.173:48294 dbuser Query 378 Waiting for query cache lock INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_ 0.000
28830 bsusr 192.168.0.160:34430 dbuser Query 377 Waiting for query cache lock INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_ 0.000
28831 bsusr 192.168.0.158:35156 dbuser Query 376 Waiting for query cache lock UPDATE shopkeeper SET shopkeeper_auth_time = 1594180592 WHERE id = 10539 0.000
28832 bsusr 192.168.0.173:48328 dbuser Query 375 Waiting for query cache lock UPDATE shopkeeper SET shopkeeper_auth_time = 1594180593 WHERE id = 36873 0.000
28844 bse 192.168.0.158:56890 bse Execute 365 Waiting for query cache lock INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` , 0.000
28855 bsusr 192.168.0.160:34482 dbuser Query 360 Waiting for query cache lock UPDATE shopkeeper SET shopkeeper_auth_time = 1594180608 WHERE id = 51318 0.000
28869 bsusr 192.168.0.173:48476 dbuser Query 357 Waiting for query cache lock UPDATE shopkeeper SET shopkeeper_auth_time = 1594180611 WHERE id = 52191 0.000
28872 bse 192.168.0.173:37620 bse Execute 356 Waiting for query cache lock INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` , 0.000
28874 bse 192.168.0.155:50136 bse Execute 356 Waiting for query cache lock INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` , 0.000
分析可以看到出现很多的Waiting for query cache lock,经过搜索一番发现如下:
1.query_cache的lock是一个全局的锁,当写入和读取并发大的时候,瓶颈就在这个锁。
打开Qcache对读写都会带来额外的消耗:
a.读查询开始之前必须检查是否命中缓存
b.如果读查询可以缓存,则执行完写入缓存
c.写入数据时,必须将相关联的所有表的缓存设置为是失效,如果缓存大,则消耗也大,可将系统僵死一段时间,因为这个操作是靠全局锁来保护
对innodb表,当修改一个表时,设置了缓存失效,但是多版本特性会暂时将这个修改对其他事务屏蔽,在提交这个事务之前所有查询都无法使用缓存,直到事务被提交,所以长时间的事务将大大降低查询缓存的命中率
查询的缓存空间不能设置过大,因为查询缓存靠全局锁保护。如果查询缓存配置的内存较大且存放了大量的查询结果,当查询缓存试下的时候,会长时间的持有这个全局锁。因为查询缓存的命中检测操作以及缓存失效检测也都依赖于这个全局锁,所以可能导致系统僵死的情况
结论:
1.根据业务对于大量写的业务,避免使用查询缓存
2.即便是用了查询缓存也不要设置太大
1.如果查看查询缓存的使用状态
MariaDB [(none)]> show status like ‘%Qcache%‘;
+-------------------------+--------+
| Variable_name | Value |
+-------------------------+--------+
| Qcache_free_blocks | 0 | #查询缓存中可用的块,如果这个值很大,但是Qcache_lowmem_prunes这个又多,说明查询缓存碎片太多了,调整合适的query_cache_min_res_unit数量,暂时处理可以使用flush query cache来整理碎片(可能会导致系统僵死),
| Qcache_free_memory | 0 | #查询缓存可用空间
| Qcache_hits | 420977 | #查询缓存命中次数
| Qcache_inserts | 69901 | #未命中写入次数
| Qcache_lowmem_prunes | 0 | #因空间不足导致被销毁的缓存
| Qcache_not_cached | 4374 | #因query_cache_type的设置导致未被缓存的次数
| Qcache_queries_in_cache | 0 | #当前缓存中缓存的查询数量
| Qcache_total_blocks | 0 | #查询缓存总块数
+-------------------------+--------+
2.如何查看服务器的query_cache配置
MariaDB [(none)]> show variables like ‘%query_cache%‘;
+------------------------------+---------+
| Variable_name | Value |
+------------------------------+---------+
| have_query_cache | YES |
| query_cache_limit | 8388608 | #允许单条query的缓存大小,超过的不会被缓存
| query_cache_min_res_unit | 4096 | #设置查询缓存每次分配内存的最小空间大小
| query_cache_size | 0 | #设置查询缓存所使用的内存大小,默认为0,大小必须是1024的整数倍
| query_cache_strip_comments | OFF |
| query_cache_type | ON | #控制查询缓存控制的开关0(OFF)、1(ON)、2(DEMAND)
| query_cache_wlock_invalidate | OFF | #控制当有写锁定发生在表上的时刻是否先失效该表相关的 Query Cache,如果设置为 1(TRUE),则在写锁定的同时将失效该表相关的所有 Query Cache,如果设置为0(FALSE)则在锁定时刻仍然允许读取该表相关的 Query Cache。
+------------------------------+---------+
最后我的处理方案是:
set global query_cache_size=0;
另外在配置文件中注释query_cache_size这行
以上是关于记录MySQL因Waiting for query cache lock导致业务宕机的处理的主要内容,如果未能解决你的问题,请参考以下文章
架构师技能4:深入MySQL原理-Waiting for table metadata lock引发系统崩溃
架构师技能6:深入MySQL原理-Waiting for table metadata lock引发系统崩溃
架构师技能6:深入MySQL原理-Waiting for table metadata lock引发系统崩溃
MySQL进程处于Waiting for table flush的分析