Why does MySQL InnoDB produce so many deadlocks when Hangfire enqueues multiple jobs in parallel?

【Posted】: 2018-12-19 15:32:37 【Question】:

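In my ASP.NET Core application I use Hangfire with MySQL database storage. I have an endpoint that schedules a Hangfire job in the background whenever it is hit. When I load-test this endpoint with more than 40 concurrent requests, the call BackgroundJob.Schedule<IJobSchedulerCallbacks>(s => s.ScheduleSomeCode(), delay); starts throwing the following exception: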

Hangfire.BackgroundJobClientException: Background job creation failed. See inner exception for details. ---> MySql.Data.MySqlClient.MySqlException: Deadlock found when trying to get lock; try restarting transaction
at MySql.Data.MySqlClient.MySqlStream.ReadPacket()
at MySql.Data.MySqlClient.NativeDriver.GetResult(Int32& affectedRow, Int64& insertedId)
at MySql.Data.MySqlClient.Driver.NextResult(Int32 statementId, Boolean force)
at MySql.Data.MySqlClient.MySqlDataReader.NextResult()
at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior)
at MySql.Data.MySqlClient.MySqlCommand.ExecuteNonQuery()
at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action`2 paramReader)
at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command)
at Dapper.SqlMapper.Execute(IDbConnection cnn, String sql, Object param, IDbTransaction transaction, Nullable`1 commandTimeout, Nullable`1 commandType)
at Hangfire.MySql.MySqlWriteOnlyTransaction.<>c__DisplayClass14_0.<AddToSet>b__0(MySqlConnection x)
at Hangfire.MySql.MySqlWriteOnlyTransaction.<Commit>b__29_0(MySqlConnection connection)
at Hangfire.MySql.MySqlStorage.<>c__DisplayClass18_0.<UseTransaction>b__0(MySqlConnection connection)
at Hangfire.MySql.MySqlStorage.UseConnection[T](Func`2 func)
at Hangfire.MySql.MySqlStorage.UseTransaction[T](Func`2 func, Nullable`1 isolationLevel)
at Hangfire.MySql.MySqlStorage.UseTransaction(Action`1 action)
at Hangfire.MySql.MySqlWriteOnlyTransaction.Commit()
at Hangfire.Client.CoreBackgroundJobFactory.Create(CreateContext context)
at Hangfire.Client.BackgroundJobFactory.<>c__DisplayClass7_0.<CreateWithFilters>b__0()
at Hangfire.Client.BackgroundJobFactory.InvokeClientFilter(IClientFilter filter, CreatingContext preContext, Func`1 continuation)
at Hangfire.Client.BackgroundJobFactory.Create(CreateContext context)
at Hangfire.BackgroundJobClient.Create(Job job, IState state)
--- End of inner exception stack trace ---
at Hangfire.BackgroundJobClient.Create(Job job, IState state)
at Hangfire.BackgroundJobClientExtensions.Schedule[T](IBackgroundJobClient client, Expression`1 methodCall, TimeSpan delay)
at Hangfire.BackgroundJob.Schedule[T](Expression`1 methodCall, TimeSpan delay)

When I inspect the InnoDB log with SHOW ENGINE INNODB STATUS, I get the following output:

=====================================
2018-12-19 14:37:29 0x2ab9c5591700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 53 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2441 srv_active, 0 srv_shutdown, 13392 srv_idle
srv_master_thread log flush and writes: 15830
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 7531
OS WAIT ARRAY INFO: signal count 8029
RW-shared spins 0, rounds 15152, OS waits 6763
RW-excl spins 0, rounds 15133, OS waits 270
RW-sx spins 58, rounds 1734, OS waits 37
Spin rounds per wait: 15152.00 RW-shared, 15133.00 RW-excl, 29.90 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-19 13:41:01 0x2aba11f50700
*** (1) TRANSACTION:
TRANSACTION 88410, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 443, OS thread handle 46979012679424, query id 374494 172.31.25.222 cpdbuser update
INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES (''schedule'', ''475'', 1545313257) ON DUPLICATE KEY UPDATE `Score` = 1545313257
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 147 page no 4 n bits 176 index IX_Set_Key_Value of table `cp-hangfire`.`Set` trx id 88410 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 103 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 7363686564756c65; asc schedule;;
 1: len 3; hex 343736; asc 476;;
 2: len 4; hex 80000088; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 88408, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 457, OS thread handle 46978653554432, query id 374490 172.31.25.222 cpdbuser update
INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES (''schedule'', ''474'', 1545313257) ON DUPLICATE KEY UPDATE `Score` = 1545313257
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 147 page no 4 n bits 176 index IX_Set_Key_Value of table `cp-hangfire`.`Set` trx id 88408 lock_mode X locks gap before rec
Record lock, heap no 103 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 7363686564756c65; asc schedule;;
 1: len 3; hex 343736; asc 476;;
 2: len 4; hex 80000088; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 147 page no 4 n bits 176 index IX_Set_Key_Value of table `cp-hangfire`.`Set` trx id 88408 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 103 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 7363686564756c65; asc schedule;;
 1: len 3; hex 343736; asc 476;;
 2: len 4; hex 80000088; asc     ;;

*** WE ROLL BACK TRANSACTION (1)
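
SHOW ENGINE INNODB STATUS only keeps the most recent deadlock, so most of the deadlocks that occur during a load test never show up in it. One option, sketched here on the assumption that the account is allowed to set global variables, is to have InnoDB write every detected deadlock to the MySQL error log:

```sql
-- Log every detected deadlock to the error log, not just the latest one.
SET GLOBAL innodb_print_all_deadlocks = ON;

-- Confirm that the setting took effect.
SHOW VARIABLES LIKE 'innodb_print_all_deadlocks';
```

The setting does not survive a server restart unless it is also added to the server configuration file.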

Note that the deadlock reported above was caused by two very simple transactions, each consisting of a single INSERT statement:

INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES (''schedule'', ''475'', 1545313257) ON DUPLICATE KEY UPDATE `Score` = 1545313257
INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES (''schedule'', ''474'', 1545313257) ON DUPLICATE KEY UPDATE `Score` = 1545313257
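
In the monitor output, both transactions are waiting for an insert intention lock on the same gap of IX_Set_Key_Value, the one before ('schedule', '476'), and transaction (2) already holds a gap lock there. Gap locks held by different transactions do not block each other, but an insert intention lock has to wait for any other transaction's gap lock on that gap, so two inserts into the same gap can each end up waiting on the other. The sketch below reproduces that lock pattern by hand in two sessions. It is an illustration under assumptions, not what Hangfire executes: it takes the gap locks explicitly with SELECT ... FOR UPDATE (the single INSERT ... ON DUPLICATE KEY UPDATE acquires its locks implicitly), it assumes the default REPEATABLE READ isolation level, and it assumes that no rows with Value '474' or '475' exist yet.

```sql
-- Session A
START TRANSACTION;
SELECT * FROM `Set` WHERE `Key` = 'schedule' AND `Value` = '475' FOR UPDATE;
-- Returns no rows; A now holds a gap lock on IX_Set_Key_Value.

-- Session B
START TRANSACTION;
SELECT * FROM `Set` WHERE `Key` = 'schedule' AND `Value` = '474' FOR UPDATE;
-- Also returns no rows; B holds a gap lock on the same gap.
-- The two gap locks are compatible, so neither session blocks yet.

-- Session A
INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES ('schedule', '475', 1545313257);
-- Blocks: A's insert intention lock waits for B's gap lock.

-- Session B
INSERT INTO `Set` (`Key`, `Value`, `Score`) VALUES ('schedule', '474', 1545313257);
-- B's insert intention lock would wait for A's gap lock. InnoDB detects the
-- cycle and rolls back one of the two transactions with error 1213:
-- "Deadlock found when trying to get lock; try restarting transaction".

-- In the session that survived:
ROLLBACK;
```

Because the two inserts target adjacent, not-yet-existing values, the chance of two concurrent requests landing in the same gap grows with the number of jobs scheduled at once, which is consistent with the errors appearing only under load.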

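The Set table has a unique index, IX_Set_Key_Value, on the Key and Value columns; the full schema is in the CREATE TABLE statement further down.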

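I found an answer saying that MySQL InnoDB can deadlock even in perfectly normal situations, which strikes me as odd. As a workaround, I tried implementing an exponential-backoff retry policy with Polly, which is a great library. But that only postponed the error: the job-scheduling code is now retried, and after the third retry the client connection is simply dropped because of the 30-second nginx response timeout.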

First question: why does MySQL start deadlocking when this simple job-scheduling command is executed concurrently?

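Second question: if InnoDB really does produce deadlocks even under normal conditions, how can MySQL be used for any production database that expects more concurrent users? Am I missing something?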

(From the comments)

CREATE TABLE `Set` (
    `Id` int(11) NOT NULL AUTO_INCREMENT, 
    `Key` varchar(100) NOT NULL, 
    `Value` varchar(256) NOT NULL, 
    `Score` double DEFAULT NULL, 
    `ExpireAt` datetime DEFAULT NULL, 
    PRIMARY KEY (`Id`), 
    UNIQUE KEY `IX_Set_Key_Value` (`Key`,`Value`)
) ENGINE=InnoDB AUTO_INCREMENT=143 DEFAULT CHARSET=latin1

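【Comments】:

Please provide SHOW CREATE TABLE. Please provide the SQL for the entire transaction (BEGIN through COMMIT). The "key-value" schema pattern often causes a lot of trouble; I think you have found another reason to avoid it.

Here is the result of SHOW CREATE TABLE: 'Set', 'CREATE TABLE `Set` (\n `Id` int(11) NOT NULL AUTO_INCREMENT,\n `Key` varchar(100) NOT NULL,\n `Value` varchar(256) NOT NULL,\n `Score` double DEFAULT NULL,\n `ExpireAt` datetime DEFAULT NULL,\n PRIMARY KEY (`Id`),\n UNIQUE KEY `IX_Set_Key_Value` (`Key`,`Value`)\n) ENGINE=InnoDB AUTO_INCREMENT=143 DEFAULT CHARSET=latin1'. I don't know how to find the statements of the whole transaction.

But Hangfire is a well-known .NET background job library, and I don't think it should deadlock on its most basic operation, which is scheduling a job. I should be able to schedule 100 jobs concurrently without deadlock failures.

【Answer 1】: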

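First question: I don't know Hangfire, but it is unlikely that CoreBackgroundJobFactory.Create runs only a single insert query. It probably runs at least a select on another table that takes locks of its own, and the combination of the two operations can deadlock.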

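Second question: InnoDB's locking strategy depends on the transaction isolation level. If you are running a high-concurrency environment, you can lower the isolation level, which reduces the probability of deadlocks. This can have some side effects on ACID guarantees, although in my personal experience I have never run into problems, even with READ UNCOMMITTED. You could try adding this to the Hangfire data source configuration and see what happens.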
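
To see which isolation level a connection runs under and to experiment with a lower one, statements along these lines may help. This is a sketch that assumes MySQL 5.7.20 or later, where the variable is named transaction_isolation (older servers call it tx_isolation); whether the Hangfire storage provider sets its own isolation level per transaction is not something this thread confirms.

```sql
-- Show the isolation level of the current session
-- (InnoDB's default is REPEATABLE-READ).
SELECT @@transaction_isolation;

-- Switch only the current session to READ COMMITTED.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or change the default for connections opened from now on
-- (requires the privilege to set global variables).
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Under READ COMMITTED, InnoDB still uses gap locking for duplicate-key and foreign-key checks, so this change may reduce the deadlocks on the unique index rather than eliminate them.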

【Comments】:

I will try READ COMMITTED and see what happens. I also think I should post this on Hangfire's GitHub issues page and see what they say.
