这次开发支付对帐时,持久化对帐数据时线上突然出现下面这个死锁.
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve com.imxiaomai.pay.service.reconcile.PayReconcileDao.insert-Inline
### The error occurred while setting parameters
### SQL: insert into pay_reconcile ( trade_time, shop_no, out_trade_no, out_order_no, total_fee, cost, system_id, payment_category, create_time, direction, refund_original_order_no ) select ?, ?, ?, ?, ?, ?, ?, ?, now(), ?, ? FROM dual WHERE NOT EXISTS ( SELECT 1 FROM pay_reconcile where out_trade_no=? AND `out_order_no`=? )
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; SQL []; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction[com.imxiaomai.pay.service.reconcile.GenericReconcileHandler:run]
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
经过初步分析,这个SQL是单条数据插入,mysql是行级锁,不应该发生死锁.然后找DBA查看数据库死锁,找到如下死锁语句.
018-01-18 11:03:15 2adb0dc40700TOO DEEP OR LONG SEARCH IN THE LOCK TABLE WAITS-FOR GRAPH, WE WILL ROLL BACK FOLLOWING TRANSACTION
*** TRANSACTION:
TRANSACTION 121106127, ACTIVE 0.041 sec setting auto-inc lock
mysql tables in use 2, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 1443086444, OS thread handle 0x2adb0dc40700, query id 195863600 10.171.134.40 bldhq executing
insert into pay_reconcile
( trade_time,
shop_no,
out_trade_no,
out_order_no,
total_fee,
cost,
system_id,
payment_category,
create_time,
direction,
refund_original_order_no )
select ‘2018-01-17 16:20:13.0‘,
‘XM0003‘,
‘4200000069201801175791162100‘,
‘118301180117162014‘,
2120,
13,
1,
1,
now(),
1,
null
FROM dual
WHERE NOT EXISTS (
SELECT 1 FROM pay_reconcile where out_trade_no=‘4200000069201801175791162100‘
AND `out_order_no`=‘118301180117162014‘
)
*** WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bldshop`.`pay_reconcile` trx id 121106127 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)
产生原因:采用线程池(10个)插入数据时,自增列排队过长导致事务排队过长.而MYSQL自增列默认锁表.
【1】innodb_autoinc_lock_mode 的说明
innodb_auto_lockmode有三个取值:
1、0 这个表示tradition 传统
2、1 这个表示consecutive 连续
3、2 这个表示interleaved 交错
【1.1】tradition(innodb_autoinc_lock_mode=0) 模式:
1、它提供了一个向后兼容的能力
2、在这一模式下,所有的insert语句("insert like") 都要在语句开始的时候得到一个
表级的auto_inc锁,在语句结束的时候才释放这把锁,注意呀,这里说的是语句级而不是事务级的,
一个事务可能包涵有一个或多个语句。
3、它能保证值分配的可预见性,与连续性,可重复性,这个也就保证了insert语句在复制到slave
的时候还能生成和master那边一样的值(它保证了基于语句复制的安全)。
4、由于在这种模式下auto_inc锁一直要保持到语句的结束,所以这个就影响到了并发的插入。
【1.2】consecutive(innodb_autoinc_lock_mode=1) 模式:
1、这一模式下去simple insert 做了优化,由于simple insert一次性插入值的个数可以立马得到
确定,所以mysql可以一次生成几个连续的值,用于这个insert语句;总的来说这个对复制也是安全的
(它保证了基于语句复制的安全)
2、这一模式也是mysql的默认模式,这个模式的好处是auto_inc锁不要一直保持到语句的结束,只要
语句得到了相应的值后就可以提前释放锁
【1.3】interleaved(innodb_autoinc_lock_mode=2) 模式
1、由于这个模式下已经没有了auto_inc锁,所以这个模式下的性能是最好的;但是它也有一个问题,就是
对于同一个语句来说它所得到的auto_incremant值可能不是连续的。
【2】如果你的二进制文件格式是mixed | row 那么这三个值中的任何一个对于你来说都是复制安全的。
由于现在mysql已经推荐把二进制的格式设置成row,所以在binlog_format不是statement的情况下最
好是innodb_autoinc_lock_mode=2 这样可能知道更好的性能。
解决方法:
1.服务程序生成唯一ID,不使用数据库自增ID.(采用twitter-的snowflake算法.源于UUID变种)
2.调整线程池大小,由10个变成2个.减少数据库压力..
3.待线上验证....(线上验证一周,未发现死锁.)
4.部署时需要配置启动参数... -Dinstance.group 与 -Dinstance.id参数,且这两个值的取值范围为0-31.
1 public class Snowflake implements SerialNumber { 2 // ==============================Fields=========================================== 3 /** 开始时间截 (2015-01-01) */ 4 private final long twepoch = 1420041600000L; 5 6 /** 机器id所占的位数 */ 7 private final long workerIdBits = 5L; 8 9 /** 数据标识id所占的位数 */ 10 private final long datacenterIdBits = 5L; 11 12 /** 支持的最大机器id,结果是31 (这个移位算法可以很快的计算出几位二进制数所能表示的最大十进制数) */ 13 private final long maxWorkerId = -1L ^ (-1L << workerIdBits); 14 15 /** 支持的最大数据标识id,结果是31 */ 16 private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); 17 18 /** 序列在id中占的位数 */ 19 private final long sequenceBits = 12L; 20 21 /** 机器ID向左移12位 */ 22 private final long workerIdShift = sequenceBits; 23 24 /** 数据标识id向左移17位(12+5) */ 25 private final long datacenterIdShift = sequenceBits + workerIdBits; 26 27 /** 时间截向左移22位(5+5+12) */ 28 private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; 29 30 /** 生成序列的掩码,这里为4095 (0b111111111111=0xfff=4095) */ 31 private final long sequenceMask = -1L ^ (-1L << sequenceBits); 32 33 /** 工作机器ID(0~31) */ 34 private long workerId; 35 36 /** 数据中心ID(0~31) */ 37 private long datacenterId; 38 39 /** 毫秒内序列(0~4095) */ 40 private long sequence = 0L; 41 42 /** 上次生成ID的时间截 */ 43 private long lastTimestamp = -1L; 44 45 private static Snowflake singleInstance; 46 private static final String INSTANCE_GROUP="instance.group"; 47 48 private static final String INSTANCE_ID="instance.id"; 49 public static Snowflake create(){ 50 if(null==singleInstance){ 51 synchronized (Snowflake.class){ 52 if(null==singleInstance){ 53 String group=System.getProperty(INSTANCE_GROUP); 54 String id=System.getProperty(INSTANCE_ID); 55 if(StringUtils.isBlank(group)){ 56 throw new RuntimeException("instance.group must be exist and great -1 and less 32"); 57 } 58 if(StringUtils.isBlank(id)){ 59 throw new RuntimeException("instance.id must be exist and great -1 and less 32"); 60 } 61 if(!NumberUtils.isDigits(group)){ 62 throw new RuntimeException("instance.group must be exist and great -1 and less 32"); 63 } 64 if(!NumberUtils.isDigits(id)){ 65 throw new RuntimeException("instance.id must be exist and great -1 and less 32"); 66 } 67 int g=Integer.valueOf(group); 68 int i=Integer.valueOf(id); 69 if(g<0||g>=32){ 70 throw new RuntimeException("instance.group must be exist and great -1 and less 32"); 71 } 72 if(i<0||i>=32){ 73 throw new RuntimeException("instance.id must be exist and great -1 and less 32"); 74 } 75 singleInstance=new Snowflake(i,g); 76 } 77 } 78 } 79 return singleInstance; 80 } 81 //==============================Constructors===================================== 82 /** 83 * 构造函数 84 * @param workerId 工作ID (0~31) 85 * @param datacenterId 数据中心ID (0~31) 86 */ 87 private Snowflake(long workerId, long datacenterId) { 88 89 90 if (workerId > maxWorkerId || workerId < 0) { 91 throw new IllegalArgumentException(String.format("worker Id can‘t be greater than %d or less than 0", maxWorkerId)); 92 } 93 if (datacenterId > maxDatacenterId || datacenterId < 0) { 94 throw new IllegalArgumentException(String.format("datacenter Id can‘t be greater than %d or less than 0", maxDatacenterId)); 95 } 96 this.workerId = workerId; 97 this.datacenterId = datacenterId; 98 } 99 100 // ==============================Methods========================================== 101 /** 102 * 获得下一个ID (该方法是线程安全的) 103 * @return SnowflakeId 104 */ 105 @Override 106 public synchronized long nextId() { 107 long timestamp = timeGen(); 108 109 //如果当前时间小于上一次ID生成的时间戳,说明系统时钟回退过这个时候应当抛出异常 110 if (timestamp < lastTimestamp) { 111 throw new RuntimeException( 112 String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp)); 113 } 114 115 //如果是同一时间生成的,则进行毫秒内序列 116 if (lastTimestamp == timestamp) { 117 sequence = (sequence + 1) & sequenceMask; 118 //毫秒内序列溢出 119 if (sequence == 0) { 120 //阻塞到下一个毫秒,获得新的时间戳 121 timestamp = tilNextMillis(lastTimestamp); 122 } 123 } 124 //时间戳改变,毫秒内序列重置 125 else { 126 sequence = 0L; 127 } 128 129 //上次生成ID的时间截 130 lastTimestamp = timestamp; 131 132 //移位并通过或运算拼到一起组成64位的ID 133 return ((timestamp - twepoch) << timestampLeftShift) // 134 | (datacenterId << datacenterIdShift) // 135 | (workerId << workerIdShift) // 136 | sequence; 137 } 138 139 /** 140 * 阻塞到下一个毫秒,直到获得新的时间戳 141 * @param lastTimestamp 上次生成ID的时间截 142 * @return 当前时间戳 143 */ 144 protected long tilNextMillis(long lastTimestamp) { 145 long timestamp = timeGen(); 146 while (timestamp <= lastTimestamp) { 147 timestamp = timeGen(); 148 } 149 return timestamp; 150 } 151 152 /** 153 * 返回以毫秒为单位的当前时间 154 * @return 当前时间(毫秒) 155 */ 156 protected long timeGen() { 157 return System.currentTimeMillis(); 158 } 159 160 public static void main(String[] args) { 161 Snowflake snowflake=new Snowflake(1,1); 162 ExecutorService executorService= Executors.newFixedThreadPool(50); 163 Map<Long,Integer> map=new ConcurrentHashMap<> (1000000); 164 Integer tag=Integer.valueOf(0); 165 for (long i=0;i<1000000;++i) { 166 executorService.execute(new Runnable() { 167 @Override 168 public void run() { 169 Long val=Long.valueOf(snowflake.nextId()); 170 map.put(val,tag); 171 System.out.println(val); 172 } 173 }); 174 } 175 System.out.println(map.size()); 176 try { 177 System.in.read(); 178 while (1000000!=map.size()){ 179 System.out.println(map.size()); 180 } 181 System.out.println(map.size()); 182 executorService.shutdown(); 183 System.in.read(); 184 185 }catch (Throwable throwable){ 186 System.out.println(throwable.getMessage()); 187 } 188 } 189 }