数据库批量插入这么讲究的么？

Posted 2022-01-04 秦怀杂货店

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了数据库批量插入这么讲究的么？相关的知识，希望对你有一定的参考价值。

最近新的项目写了不少各种 `insertBatch` 的代码，一直有人说，批量插入比循环插入效率高很多，那本文就来实验一下,到底是不是真的？

测试环境：

- SpringBoot 2.5

- mysql 8

- JDK 8

- Docker

首先，多条数据的插入，可选的方案：

- `foreach`循环插入

- 拼接`sql`，一次执行

- 使用批处理功能插入

## 搭建测试环境`

`sql`文件：

```sql

drop database IF EXISTS test;

CREATE DATABASE test;

use test;

DROP TABLE IF EXISTS `user`;

CREATE TABLE `user` (

`id` int(11) NOT NULL,

`name` varchar(255) DEFAULT "",

`age` int(11) DEFAULT 0,

PRIMARY KEY (`id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

```

应用的配置文件：

```properties

server:

port: 8081

spring:

#数据库连接配置

datasource:

driver-class-name: com.mysql.cj.jdbc.Driver

url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true

username: root

password: 123456

#mybatis的相关配置

mybatis:

#mapper配置文件

mapper-locations: classpath:mapper/*.xml

type-aliases-package: com.aphysia.spingbootdemo.model

#开启驼峰命名

configuration:

map-underscore-to-camel-case: true

logging:

level:

root: error

```

启动文件，配置了`Mapper`文件扫描的路径：

```java

import org.mybatis.spring.annotation.MapperScan;

import org.springframework.boot.SpringApplication;

import org.springframework.boot.autoconfigure.EnableAutoConfiguration;

import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication

@MapperScan("com.aphysia.springdemo.mapper")

public class SpringdemoApplication

public static void main(String[] args)

SpringApplication.run(SpringdemoApplication.class, args);

```

`Mapper`文件一共准备了几个方法，插入单个对象，删除所有对象，拼接插入多个对象：

```java

import com.aphysia.springdemo.model.User;

import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper

int insertUser(User user);

int deleteAllUsers();

int insertBatch(@Param("users") List<User>users);

```

`Mapper.xml`文件如下：

```xml

<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >

insert into user(id,age) values(#id,#age)

</insert>

delete from user where id>0;

</delete>

insert into user(id,age) VALUES

(#model.id, #model.age)

</foreach>

</insert>

</mapper>

```

测试的时候，每次操作我们都删除掉所有的数据，保证测试的客观，不受之前的数据影响。

## 不同的测试

### 1. foreach 插入

先获取列表，然后每一条数据都执行一次数据库操作，插入数据：

```java

@SpringBootTest

@MapperScan("com.aphysia.springdemo.mapper")

class SpringdemoApplicationTests

@Autowired

SqlSessionFactory sqlSessionFactory;

@Resource

UserMapper userMapper;

static int num = 100000;

static int id = 1;

@Test

void insertForEachTest()

List<User> users = getRandomUsers();

long start = System.currentTimeMillis();

for (int i = 0; i < users.size(); i++)

userMapper.insertUser(users.get(i));

long end = System.currentTimeMillis();

System.out.println("time:" + (end - start));

```

### 2. 拼接sql插入

其实就是用以下的方式插入数据：

```sql

INSERT INTO `user` (`id`, `age`)

VALUES (1, 11),

(2, 12),

(3, 13),

(4, 14),

(5, 15);

```

```java

@Test

void insertSplicingTest()

List<User> users = getRandomUsers();

long start = System.currentTimeMillis();

userMapper.insertBatch(users);

long end = System.currentTimeMillis();

System.out.println("time:" + (end - start));

```

### 3. 使用Batch批量插入

将`MyBatis session` 的 `executor type` 设为 `Batch `，使用`sqlSessionFactory`将执行方式置为批量，自动提交置为`false`,全部插入之后，再一次性提交：

```java

@Test

public void insertBatch()

SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);

UserMapper mapper = sqlSession.getMapper(UserMapper.class);

List<User> users = getRandomUsers();

long start = System.currentTimeMillis();

for(int i=0;i<users.size();i++)

mapper.insertUser(users.get(i));

sqlSession.commit();

sqlSession.close();

long end = System.currentTimeMillis();

System.out.println("time:" + (end - start));

```

### 4. 批量处理+分批提交

在批处理的基础上，每1000条数据，先提交一下，也就是分批提交。

```java

@Test

public void insertBatchForEachTest()

SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);

UserMapper mapper = sqlSession.getMapper(UserMapper.class);

List<User> users = getRandomUsers();

long start = System.currentTimeMillis();

for(int i=0;i<users.size();i++)

mapper.insertUser(users.get(i));

if (i % 1000 == 0 || i == num - 1)

sqlSession.commit();

sqlSession.clearCache();

sqlSession.close();

long end = System.currentTimeMillis();

System.out.println("time:" + (end - start));

```

## 初次结果，明显不对？

运行上面的代码，我们可以得到下面的结果，`for`循环插入的效率确实很差，拼接的`sql`效率相对高一点，看到有些资料说拼接`sql`可能会被`mysql`限制，但是我执行到`1000w`的时候，才看到堆内存溢出。

**下面是不正确的结果！！！**

| 插入方式 | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |

| :---------------: | :--: | :--: | :--: | :---: | :----: | :-------: | ---------------------------- |

| for循环插入 | 387 | 1150 | 7907 | 70026 | 635984 | 太久了... | 太久了... |

| 拼接sql插入 | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: 堆内存溢出 |

| 批处理 | 392 | 917 | 5442 | 51647 | 470666 | 太久了... | 太久了... |

| 批处理 + 分批提交 | 359 | 893 | 5275 | 50270 | 472462 | 太久了... | 太久了... |

### 拼接sql并没有超过内存

我们看一下`mysql`的限制:

```txt

mysql> show VARIABLES like %max_allowed_packet%;

+---------------------------+------------+

| Variable_name | Value |

+---------------------------+------------+

| max_allowed_packet | 67108864 |

| mysqlx_max_allowed_packet | 67108864 |

| slave_max_allowed_packet | 1073741824 |

+---------------------------+------------+

3 rows in set (0.12 sec)

```

这`67108864`足足`600`多M，太大了，怪不得不会报错，那我们去改改一下它吧，改完重新测试：

1. 首先在启动`mysql`的情况下，进入容器内，也可以直接在`Docker`桌面版直接点`Cli`图标进入：

```shell

docker exec -it mysql bash

```

2. 进入` /etc/mysql `目录，去修改`my.cnf`文件：

```shell

cd /etc/mysql

```

3. 先按照`vim`，要不编辑不了文件：

```shell

apt-get update

apt-get install vim

```

4. 修改`my.cnf`

```shell

vim my.cnf

```

5. 在最后一行添加`max_allowed_packet=20M`(按`i`编辑，编辑完按`esc`，输入`:wq`退出)

```shell

[mysqld]

pid-file = /var/run/mysqld/mysqld.pid

socket = /var/run/mysqld/mysqld.sock

datadir = /var/lib/mysql

secure-file-priv= NULL

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

# Custom config should go here

!includedir /etc/mysql/conf.d/

max_allowed_packet=2M

```

6. 退出容器

```shell

# exit

```

7. 查看`mysql`容器`id`

```shell

docker ps -a

```

![image-20211130005909539](https://markdownpicture.oss-cn-qingdao.aliyuncs.com/blog/image-20211130005909539.png)

8. 重启`mysql`

```shell

docker restart c178e8998e68

```

重启成功后查看最大的`max_allowed_pactet`，发现已经修改成功：

```shell

mysql> show VARIABLES like %max_allowed_packet%;

+---------------------------+------------+

| Variable_name | Value |

+---------------------------+------------+

| max_allowed_packet | 2097152 |

| mysqlx_max_allowed_packet | 67108864 |

| slave_max_allowed_packet | 1073741824 |

+---------------------------+------------+

```

我们再次执行拼接`sql`，发现`100w`的时候，`sql`就达到了`3.6M`左右，超过了我们设置的`2M`，成功的演示抛出了错误：

```shell

org.springframework.dao.TransientDataAccessResourceException:

### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the max_allowed_packet variable.

; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the max_allowed_packet variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the max_allowed_packet variable.

```

### 批量处理为什么这么慢？

但是，仔细一看就会发现，上面的方式，怎么批处理的时候，并没有展示出优势了，和`for`循环没有什么区别？这是对的么？

这肯定是不对的，从官方文档中，我们可以看到它会批量更新，不会每次去创建预处理语句，理论是更快的。

![image-20211130011820487](https://markdownpicture.oss-cn-qingdao.aliyuncs.com/blog/image-20211130011820487.png)

然后我发现我的一个最重要的问题：数据库连接 `URL `地址少了**rewriteBatchedStatements=true**

如果我们不写，`MySQL JDBC` 驱动在默认情况下会忽视 `executeBatch()` 语句，我们期望批量执行的一组 `sql` 语句拆散，但是执行的时候是一条一条地发给 `MySQL` 数据库，实际上是单条插入，直接造成较低的性能。我说怎么性能和循环去插入数据差不多。

只有将 `rewriteBatchedStatements` 参数置为 `true`, 数据库驱动才会帮我们批量执行 `SQL`。

正确的数据库连接：

```shell

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true&&&rewriteBatchedStatements=true

```

找到问题之后，我们重新测试批量测试,最终的结果如下：

| 插入方式 | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |

| :---------------: | :--: | :--: | :--: | :---: | :----: | :----------------------------: | ---------------------------- |

| for循环插入 | 387 | 1150 | 7907 | 70026 | 635984 | 太久了... | 太久了... |

| 拼接sql插入 | 308 | 320 | 392 | 838 | 3156 | 24948（很可能超过sql长度限制） | OutOfMemoryError: 堆内存溢出 |

| 批处理(重点) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: 堆内存溢出 |

| 批处理 + 分批提交 | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: 堆内存溢出 |

从上面的结果来看，确实批处理是要快很多的，当数量级太大的时候，其实都会超过内存溢出的，批处理加上分批提交并没有变快，和批处理差不多，反而变慢了，提交太多次了，拼接`sql`的方案在数量比较少的时候其实和批处理相差不大，最差的方案就是`for`循环插入数据，这真的特别的耗时。`100`条的时候就已经需要`1s`了，不能选择这种方案。

一开始发现批处理比较慢的时候，真的挺怀疑自己，后面发现是有一个参数，有一种拨开云雾的感觉，知道得越多，不知道的越多。

**【作者简介】**：

秦怀，公众号【**秦怀杂货店**】作者，技术之路不在一时，山高水长，纵使缓慢，驰而不息。

[剑指Offer全部题解PDF](http://aphysia.cn/archives/jianzhiofferpdf)

[2020年我写了什么？](http://aphysia.cn/archives/2020)

[开源编程笔记](https://damaer.github.io/Coding/#/)最近新的项目写了不少各种 insertBatch 的代码，一直有人说，批量插入比循环插入效率高很多，那本文就来实验一下,到底是不是真的？测试环境：

SpringBoot 2.5
Mysql 8
JDK 8
Docker

首先，多条数据的插入，可选的方案：

foreach循环插入
拼接sql，一次执行
使用批处理功能插入

搭建测试环境`

sql文件：

drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT "",
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

应用的配置文件：

server:
  port: 8081
spring:
  #数据库连接配置
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPub

   
 (c)2006-2024 SYSTEM All Rights Reserved  IT常识