记录MySQL因Waiting for query cache lock导致业务宕机的处理

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了记录MySQL因Waiting for query cache lock导致业务宕机的处理相关的知识,希望对你有一定的参考价值。

早上10点,业务反馈服务器502无法访问,查询所有的WEB服务器、缓存、数据库都是可以连接的。但是mysql查询一致卡住,为尽快恢复业务,直接重启数据库操作,此时业务恢复。

开始找原因:
1.查看mysql错误日志,都是这样的。

2020-07-08 10:10:14 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:14 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:14 0 [Note] InnoDB: FTS optimize thread exiting.
2020-07-08 10:10:16 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:16 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:18 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2020-07-08 10:10:18 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2020-07-08 10:10:20 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 7

经过分析不是以上引起的

到了12点的时候又出现问题了,这次在重启前查询show processlist并且导出了文件
mysql -uroot -pxxxxx -N -e "show processlist" >/tmp/sp.log
分析导出的文件

2       system user             NULL    Daemon  NULL    InnoDB purge coordinator        NULL    0.000
3       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
1       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
4       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
5       system user             NULL    Daemon  NULL    InnoDB shutdown handler NULL    0.000
7       event_schbseuler        localhost       NULL    Daemon  6470    Waiting for next activation     NULL    0.000
94      replication     192.168.0.174:39478     NULL    Binlog Dump     6453    Master has sent all binlog to slave; waiting for binlog to be updatbse  NULL    0.000
95      replication     192.168.0.149:49262     NULL    Binlog Dump     6453    Master has sent all binlog to slave; waiting for binlog to be updatbse  NULL    0.000
27798   bseu    192.168.0.173:37820     dbusercollege   Query   348     Sending data    SELECT `p`.`Id`, `p`.`CId`, `p`.`CName`, `p`.`CodeId`, `p`.`CreateTime`, `p`.`IsDeletbse`, `p`.`IsEdi   0.000
27925   bseu    192.168.0.158:60228     dbusercollege   Query   349     Waiting for query cache lock    UPDATE `article` SET `ArType` = 1, `AuthorId` = ‘‘, `AuthorName` = ‘小奇‘, `AuthorPhoto` = ‘‘, `Co      0.000
27930   bseu    192.168.0.155:58588     dbusercollege   Query   268     Sending data    SELECT `a`.`Id`, `a`.`ArType`, `a`.`AuthorId`, `a`.`AuthorName`, `a`.`AuthorPhoto`, `a`.`Code`, `a`.    0.000
28589   bseu    192.168.0.173:40376     dbusercollege   Query   348     Sending data    SELECT `p`.`Id`, `p`.`CId`, `p`.`CName`, `p`.`CodeId`, `p`.`CreateTime`, `p`.`IsDeletbse`, `p`.`IsEdi   0.000
28779   bse     192.168.0.149:57342     bse     Query   398     Waiting for query cache lock    SELECT code,pcode FROM a_customer_relation      0.000
28780   bse     192.168.0.149:57344     bse     Sleep   398             NULL    0.000
28781   bse     192.168.0.149:57347     bse     Sleep   398             NULL    0.000
28782   bse     192.168.0.149:57346     bse     Sleep   398             NULL    0.000
28783   read    192.168.0.149:57350     dbuser  Query   398     Writing to net  SELECT cast((cr.`code`) as INT) as code,(CASE WHEN cr.`code`=7070 THEN 0 WHEN pcr.`code` IS NULL THE    0.000
28786   bsusr   192.168.0.160:34322     dbuser  Query   398     Update  INSERT INTO customer (level_up_time, create_time, identity_number, features, job, source, state, typ    0.000
28799   bsusr   192.168.0.173:48238     dbuser  Query   391     Waiting for query cache lock    INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_    0.000
28810   bsusr   192.168.0.173:48266     dbuser  Query   386     Waiting for query cache lock    DELETE FROM user_device WHERE channel = ‘jPush‘ AND bindUser_id = 3092570 AND device_id = ‘android_1    0.000
28825   bsusr   192.168.0.173:48294     dbuser  Query   378     Waiting for query cache lock    INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_    0.000
28830   bsusr   192.168.0.160:34430     dbuser  Query   377     Waiting for query cache lock    INSERT INTO customer_product_browse (views, date_time, create_time, customer_id, org_id, shopkeeper_    0.000
28831   bsusr   192.168.0.158:35156     dbuser  Query   376     Waiting for query cache lock    UPDATE shopkeeper SET shopkeeper_auth_time = 1594180592 WHERE id = 10539        0.000
28832   bsusr   192.168.0.173:48328     dbuser  Query   375     Waiting for query cache lock    UPDATE shopkeeper SET shopkeeper_auth_time = 1594180593 WHERE id = 36873        0.000
28844   bse     192.168.0.158:56890     bse     Execute 365     Waiting for query cache lock    INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` ,    0.000
28855   bsusr   192.168.0.160:34482     dbuser  Query   360     Waiting for query cache lock    UPDATE shopkeeper SET shopkeeper_auth_time = 1594180608 WHERE id = 51318        0.000
28869   bsusr   192.168.0.173:48476     dbuser  Query   357     Waiting for query cache lock    UPDATE shopkeeper SET shopkeeper_auth_time = 1594180611 WHERE id = 52191        0.000
28872   bse     192.168.0.173:37620     bse     Execute 356     Waiting for query cache lock    INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` ,    0.000
28874   bse     192.168.0.155:50136     bse     Execute 356     Waiting for query cache lock    INSERT INTO `t_shoping_cart` (`create_time` , `update_time` , `user_id` , `shop_id` , `product_id` ,    0.000

分析可以看到出现很多的Waiting for query cache lock,经过搜索一番发现如下:

1.query_cache的lock是一个全局的锁,当写入和读取并发大的时候,瓶颈就在这个锁。

打开Qcache对读写都会带来额外的消耗:
a.读查询开始之前必须检查是否命中缓存
b.如果读查询可以缓存,则执行完写入缓存
c.写入数据时,必须将相关联的所有表的缓存设置为是失效,如果缓存大,则消耗也大,可将系统僵死一段时间,因为这个操作是靠全局锁来保护

对innodb表,当修改一个表时,设置了缓存失效,但是多版本特性会暂时将这个修改对其他事务屏蔽,在提交这个事务之前所有查询都无法使用缓存,直到事务被提交,所以长时间的事务将大大降低查询缓存的命中率

查询的缓存空间不能设置过大,因为查询缓存靠全局锁保护。如果查询缓存配置的内存较大且存放了大量的查询结果,当查询缓存试下的时候,会长时间的持有这个全局锁。因为查询缓存的命中检测操作以及缓存失效检测也都依赖于这个全局锁,所以可能导致系统僵死的情况

结论:
1.根据业务对于大量写的业务,避免使用查询缓存
2.即便是用了查询缓存也不要设置太大

1.如果查看查询缓存的使用状态

MariaDB [(none)]> show status like ‘%Qcache%‘;
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Qcache_free_blocks      | 0      |          #查询缓存中可用的块,如果这个值很大,但是Qcache_lowmem_prunes这个又多,说明查询缓存碎片太多了,调整合适的query_cache_min_res_unit数量,暂时处理可以使用flush query cache来整理碎片(可能会导致系统僵死),
| Qcache_free_memory      | 0      |       #查询缓存可用空间
| Qcache_hits             | 420977 |          #查询缓存命中次数
| Qcache_inserts          | 69901  |         #未命中写入次数
| Qcache_lowmem_prunes    | 0      |   #因空间不足导致被销毁的缓存
| Qcache_not_cached       | 4374   |     #因query_cache_type的设置导致未被缓存的次数
| Qcache_queries_in_cache | 0      |    #当前缓存中缓存的查询数量
| Qcache_total_blocks     | 0      |         #查询缓存总块数
+-------------------------+--------+

2.如何查看服务器的query_cache配置

MariaDB [(none)]> show variables like ‘%query_cache%‘;
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| have_query_cache             | YES     |     
| query_cache_limit            | 8388608 |    #允许单条query的缓存大小,超过的不会被缓存
| query_cache_min_res_unit     | 4096    |   #设置查询缓存每次分配内存的最小空间大小
| query_cache_size             | 0       |    #设置查询缓存所使用的内存大小,默认为0,大小必须是1024的整数倍
| query_cache_strip_comments   | OFF     |
| query_cache_type             | ON      |  #控制查询缓存控制的开关0(OFF)、1(ON)、2(DEMAND)
| query_cache_wlock_invalidate | OFF     | #控制当有写锁定发生在表上的时刻是否先失效该表相关的 Query Cache,如果设置为 1(TRUE),则在写锁定的同时将失效该表相关的所有 Query Cache,如果设置为0(FALSE)则在锁定时刻仍然允许读取该表相关的 Query Cache。
+------------------------------+---------+

最后我的处理方案是:

set global query_cache_size=0;

另外在配置文件中注释query_cache_size这行

以上是关于记录MySQL因Waiting for query cache lock导致业务宕机的处理的主要内容,如果未能解决你的问题,请参考以下文章

架构师技能4:深入MySQL原理-Waiting for table metadata lock引发系统崩溃

架构师技能6:深入MySQL原理-Waiting for table metadata lock引发系统崩溃

架构师技能6:深入MySQL原理-Waiting for table metadata lock引发系统崩溃

MySQL进程处于Waiting for table flush的分析

MySQL线程处于Waiting for table flush的分析

MySQL线程处于Waiting for table flush的分析