mysql 查询 - 不能很好地扩展,15k 记录很慢
Posted
技术标签:
【中文标题】mysql 查询 - 不能很好地扩展,15k 记录很慢【英文标题】:mysql query - not scaling well, slow with 15k records 【发布时间】:2016-07-07 23:45:40 【问题描述】:我有一个大型 SQL 查询 (mysql),用于填充 CRM 系统中的记录表。
这在大约 4000 条记录中运行良好且速度非常快。现在达到 15,500,它的运行速度非常缓慢。返回数据大约需要 70 秒。
我曾尝试添加一些索引,但收效甚微。有什么建议吗?
查询是:
SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS
s.*,
ca2.address_postcode,
ca2.resident_status
FROM
(
SELECT
a.id,
c.name_first,
c.name_last,
a.created as app_created,
a.edited,
a.comments as custcomment,
c.name_company,
a.loan_amount,
a.product_type,
ap.packager as placed_with,
a.introducer_id,
a.status,
c.contact_number,
c.email,
a.database,
bd.legal_structure,
(
SELECT
ca1.id
FROM
cl_customer_address AS ca1
WHERE
ca1.customer_id = a.customer_id AND
ca1.deleted = "0000-00-00 00:00:00"
ORDER BY
ca1.created DESC
LIMIT
1
) AS address_id
FROM
cl_application AS a
LEFT JOIN
cl_application_business_details as bd on bd.id = a.id
LEFT JOIN
cl_customer AS c ON c.id = a.customer_id AND c.deleted = c.deleted
LEFT JOIN
cl_application_packager AS ap ON ap.application_id = a.id AND ap.status != "Declined" AND ap.deleted = "0000-00-00 00:00:00"
WHERE
(a.deleted = "0000-00-00 00:00:00")
GROUP BY
a.id
) AS s
LEFT JOIN
cl_customer_address AS ca2 ON ca2.id = s.address_id AND ca2.deleted = ca2.deleted
WHERE
(ca2.deleted = ca2.deleted OR ca2.deleted IS NULL)
ORDER BY
app_created DESC
LIMIT
0, 100;
Time: 70.382
Rows: 100
表格说明如下:
CREATE TABLE cl_application (
id int(11) NOT NULL AUTO_INCREMENT,
customer_id int(11) NOT NULL,
product_type tinytext COLLATE utf8_unicode_ci,
introducer_id int(11) NOT NULL,
loan_amount double NOT NULL,
loan_purpose tinytext COLLATE utf8_unicode_ci NOT NULL,
comments text COLLATE utf8_unicode_ci NOT NULL,
security_value double NOT NULL,
property_address_1 tinytext COLLATE utf8_unicode_ci NOT NULL,
property_address_2 tinytext COLLATE utf8_unicode_ci NOT NULL,
property_town_city tinytext COLLATE utf8_unicode_ci NOT NULL,
property_postcode varchar(8) COLLATE utf8_unicode_ci NOT NULL,
property_country tinytext COLLATE utf8_unicode_ci NOT NULL,
application_source tinytext COLLATE utf8_unicode_ci NOT NULL,
`status` enum('WK - Working Lead','APP - Application Taken','ISS - Pack Issued','HOT - Head Of Terms Sent','APU - Application underway','OFI - Offer Issued','DIP - Deal in Progress','DUS - Declined Unsecured/Trying Secured','DUG - Declined Unsecured/Trying Guarantor','COM - Completed Awaiting Payment','PAC - Paid and Completed','TD - Turned Down','DUP - Duplicate application') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'WK - Working Lead',
multi_stage_status text COLLATE utf8_unicode_ci NOT NULL,
owner_id int(11) NOT NULL,
token tinytext COLLATE utf8_unicode_ci NOT NULL,
cache_keyword text COLLATE utf8_unicode_ci NOT NULL,
old_id int(11) NOT NULL,
placed_with tinytext COLLATE utf8_unicode_ci NOT NULL,
submitted datetime NOT NULL,
`database` enum('LV','TD','CD','UL') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'LV',
snoozed datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
created datetime NOT NULL,
edited datetime NOT NULL,
deleted datetime NOT NULL,
PRIMARY KEY (id),
FULLTEXT KEY cache_keyword (cache_keyword)
) ;
CREATE TABLE cl_application_packager (
id int(11) NOT NULL AUTO_INCREMENT,
application_id int(11) NOT NULL,
packager tinytext COLLATE utf8_unicode_ci,
packager_id int(11) DEFAULT NULL,
amount double DEFAULT NULL,
`status` enum('In Progress','Declined','Accepted','Completed') COLLATE utf8_unicode_ci DEFAULT 'In Progress',
commision double DEFAULT NULL,
created datetime NOT NULL,
edited datetime NOT NULL,
deleted datetime NOT NULL,
PRIMARY KEY (id),
KEY application_id (deleted)
) ;
CREATE TABLE cl_application_business_details (
id int(11) NOT NULL AUTO_INCREMENT,
company_number tinytext COLLATE utf8_unicode_ci,
legal_structure enum('Ltd','LLP','Partnership','Sole Trader') COLLATE utf8_unicode_ci DEFAULT NULL,
incorporated date DEFAULT NULL,
created datetime NOT NULL,
edited datetime NOT NULL,
deleted datetime NOT NULL,
PRIMARY KEY (id)
) ;
CREATE TABLE cl_customer (
id int(11) NOT NULL AUTO_INCREMENT,
title enum('Mr','Mrs','Ms','Miss') COLLATE utf8_unicode_ci NOT NULL,
name_first tinytext COLLATE utf8_unicode_ci NOT NULL,
name_last tinytext COLLATE utf8_unicode_ci NOT NULL,
dob date NOT NULL,
marital_status tinytext COLLATE utf8_unicode_ci NOT NULL,
name_company tinytext COLLATE utf8_unicode_ci NOT NULL,
email tinytext COLLATE utf8_unicode_ci NOT NULL,
alt_email tinytext COLLATE utf8_unicode_ci,
contact_number tinytext COLLATE utf8_unicode_ci NOT NULL,
home_phone_number tinytext COLLATE utf8_unicode_ci NOT NULL,
work_phone_number tinytext COLLATE utf8_unicode_ci NOT NULL,
created datetime NOT NULL,
edited datetime NOT NULL,
deleted datetime NOT NULL,
PRIMARY KEY (id)
) ;
CREATE TABLE cl_customer_address (
id int(11) NOT NULL AUTO_INCREMENT,
customer_id int(11) NOT NULL,
application_id int(11) NOT NULL,
house_name tinytext COLLATE utf8_unicode_ci NOT NULL,
house_number tinytext COLLATE utf8_unicode_ci NOT NULL,
address_line_1 tinytext COLLATE utf8_unicode_ci NOT NULL,
address_line_2 tinytext COLLATE utf8_unicode_ci NOT NULL,
address_town tinytext COLLATE utf8_unicode_ci NOT NULL,
address_postcode tinytext COLLATE utf8_unicode_ci NOT NULL,
moved_in date NOT NULL,
vacated date NOT NULL DEFAULT '0000-00-00',
ptcabs tinytext COLLATE utf8_unicode_ci NOT NULL,
resident_status tinytext COLLATE utf8_unicode_ci NOT NULL,
created datetime NOT NULL,
edited datetime NOT NULL,
deleted datetime NOT NULL,
PRIMARY KEY (id),
KEY customer_id (customer_id,deleted)
) ;
mysql> describe cl_application
-> ;
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| customer_id | int(11) | NO | | NULL | |
| product_type | tinytext | YES | | NULL | |
| introducer_id | int(11) | NO | | NULL | |
| loan_amount | double | NO | | NULL | |
| loan_purpose | tinytext | NO | | NULL | |
| comments | text | NO | | NULL | |
| security_value | double | NO | | NULL | |
| property_address_1 | tinytext | NO | | NULL | |
| property_address_2 | tinytext | NO | | NULL | |
| property_town_city | tinytext | NO | | NULL | |
| property_postcode | varchar(8) | NO | | NULL | |
| property_country | tinytext | NO | | NULL | |
| application_source | tinytext | NO | | NULL | |
| status | enum('WK - Working Lead','APP - Application Taken','ISS - Pack Issued','HOT - Head Of Terms Sent','APU - Application underway','OFI - Offer Issued','DIP - Deal in Progress','DUS - Declined Unsecured/Trying Secured','DUG - Declined Unsecured/Trying Guarantor','COM - Completed Awaiting Payment','PAC - Paid and Completed','TD - Turned Down','DUP - Duplicate application') | NO | | WK - Working Lead | |
| multi_stage_status | text | NO | | NULL | |
| owner_id | int(11) | NO | | NULL | |
| token | tinytext | NO | | NULL | |
| cache_keyword | text | NO | MUL | NULL | |
| old_id | int(11) | NO | | NULL | |
| placed_with | tinytext | NO | | NULL | |
| submitted | datetime | NO | | NULL | |
| database | enum('LV','TD','CD','UL') | NO | | LV | |
| snoozed | datetime | NO | | 0000-00-00 00:00:00 | |
| created | datetime | NO | | NULL | |
| edited | datetime | NO | | NULL | |
| deleted | datetime | NO | | NULL | |
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+---------------------+----------------+
27 rows in set (0.00 sec)
--
mysql> describe cl_application_business_details;
+-----------------+-----------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-----------------------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| company_number | tinytext | YES | | NULL | |
| legal_structure | enum('Ltd','LLP','Partnership','Sole Trader') | YES | | NULL | |
| incorporated | date | YES | | NULL | |
| created | datetime | NO | | NULL | |
| edited | datetime | NO | | NULL | |
| deleted | datetime | NO | | NULL | |
+-----------------+-----------------------------------------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
--
mysql> describe cl_customer;
+-------------------+------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | enum('Mr','Mrs','Ms','Miss') | NO | | NULL | |
| name_first | tinytext | NO | | NULL | |
| name_last | tinytext | NO | | NULL | |
| dob | date | NO | | NULL | |
| marital_status | tinytext | NO | | NULL | |
| name_company | tinytext | NO | | NULL | |
| email | tinytext | NO | | NULL | |
| alt_email | tinytext | YES | | NULL | |
| contact_number | tinytext | NO | | NULL | |
| home_phone_number | tinytext | NO | | NULL | |
| work_phone_number | tinytext | NO | | NULL | |
| created | datetime | NO | | NULL | |
| edited | datetime | NO | | NULL | |
| deleted | datetime | NO | | NULL | |
+-------------------+------------------------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)
--
mysql> describe cl_application_packager;
+----------------+-------------------------------------------------------+------+-----+-------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------------------------------------------------+------+-----+-------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| application_id | int(11) | NO | | NULL | |
| packager | tinytext | YES | | NULL | |
| packager_id | int(11) | YES | | NULL | |
| amount | double | YES | | NULL | |
| status | enum('In Progress','Declined','Accepted','Completed') | YES | | In Progress | |
| commision | double | YES | | NULL | |
| created | datetime | NO | | NULL | |
| edited | datetime | NO | | NULL | |
| deleted | datetime | NO | MUL | NULL | |
+----------------+-------------------------------------------------------+------+-----+-------------+----------------+
10 rows in set (0.00 sec)
mysql> describe cl_customer_address;
+------------------+----------+------+-----+------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------+------+-----+------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| customer_id | int(11) | NO | MUL | NULL | |
| application_id | int(11) | NO | | NULL | |
| house_name | tinytext | NO | | NULL | |
| house_number | tinytext | NO | | NULL | |
| address_line_1 | tinytext | NO | | NULL | |
| address_line_2 | tinytext | NO | | NULL | |
| address_town | tinytext | NO | | NULL | |
| address_postcode | tinytext | NO | | NULL | |
| moved_in | date | NO | | NULL | |
| vacated | date | NO | | 0000-00-00 | |
| ptcabs | tinytext | NO | | NULL | |
| resident_status | tinytext | NO | | NULL | |
| created | datetime | NO | | NULL | |
| edited | datetime | NO | | NULL | |
| deleted | datetime | NO | | NULL | |
+------------------+----------+------+-----+------------+----------------+
16 rows in set (0.00 sec)
解释输出:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL 178913936 Using filesort
1 PRIMARY ca2 eq_ref PRIMARY PRIMARY 4 s.address_id 1 Using where
2 DERIVED a ALL PRIMARY, cache_keyword 14494 Using where; Using temporary; Using filesort
2 DERIVED bd eq_ref PRIMARY PRIMARY 4 s-choiceloans.a.id 1
2 DERIVED c eq_ref PRIMARY PRIMARY 4 s-choiceloans.a.customer_id 1 Using where
2 DERIVED ap ALL 12344 Using where; Using join buffer (Block Nested Loop)
3 DEPENDENT SUBQUERY ca1 ref customer_id customer_id 9 func,const 1 Using where; Using filesort
根据答案的跟进更改查询后:
https://jsbin.com/bonarucisi/2/edit?html,output
【问题讨论】:
创建表语句、数据和解释输出最适合未来的问题。或者小提琴+解释 好点现在添加了创建表语句。 永远不要在SELECT
语句中使用*
。
@Hafenkranich:在这种情况下,s.*
在语句本身中引用了一个内联视图。只要内联视图只返回 OP 需要返回的表达式,在这种情况下使用 s.* 就没有任何恶意。
我会避免子选择并考虑将其重构为另一个连接表。子选择的效率不是很高。
【参考方案1】:
最大的嫌疑是内联视图(或派生表,在 MySQL venacular 中)。MySQL 将物化派生表,然后外部查询将针对它运行。
另一个嫌疑人是 SELECT 列表中的相关子查询。这将对外部查询返回的每一行执行。
另外,ORDER BY app_created
将需要 Using filesort
操作(除非您运行可能在派生表上构建索引的更新版本的 MySQL。)
有一些奇怪的谓词,例如c.deleted = c.deleted
对于在deleted
列中具有非 NULL 值的每一行,这都是正确的。这相当于c.deleted IS NOT NULL
。
为此:ca2.deleted = ca2.deleted OR ca2.deleted IS NULL
,这永远是真的。
然而,这些都不会对性能产生太大影响。
运行 EXPLAIN 查看执行计划。
如果没有,这里是一些建议的第一个切入点:
在c1_application_packager
上添加覆盖索引
CREATE INDEX c1_application_packager_IX2 ON c1_application_packager
(application_id, deleted, status, packager)
;
在cl_customer_address
上添加覆盖索引
CREATE INDEX c1_customer_address_IX2 ON c1_customer_address
(customer_id, deleted, created, id, address_postcode, resident_status)
;
并重新编写查询以消除派生表。用您用来从 c1_customer_address 返回 id
的相同相关子查询替换与 ca2 的联接...
SELECT a.id
, c.name_first
, c.name_last
, a.created AS app_created
, a.edited
, a.comments AS custcomment
, c.name_company
, a.loan_amount
, a.product_type
, ap.packager AS placed_with
, a.introducer_id
, a.status
, c.contact_number
, c.email
, a.database
, bd.legal_structure
, ( SELECT ca1.id
FROM cl_customer_address ca1
WHERE ca1.customer_id = a.customer_id
AND ca1.deleted = '0000-00-00 00:00:00'
ORDER BY ca1.customer_id, ca1.deleted, ca1.created DESC
LIMIT 1
) AS address_id
, ( SELECT ca2.address_postcode
FROM cl_customer_address ca2
WHERE ca2.customer_id = a.customer_id
AND ca2.deleted = '0000-00-00 00:00:00'
ORDER BY ca2.customer_id, ca2.deleted, ca2.created DESC
LIMIT 1
) AS address_postcode
, ( SELECT ca3.resident_status
FROM cl_customer_address ca3
WHERE ca3.customer_id = a.customer_id
AND ca3.deleted = '0000-00-00 00:00:00'
ORDER BY ca3.customer_id, ca3.deleted, ca3.created DESC
LIMIT 1
) AS resident_status
FROM cl_application a
LEFT
JOIN cl_application_business_details bd
ON bd.id = a.id
LEFT
JOIN cl_customer c
ON c.id = a.customer_id
AND c.deleted IS NOT NULL
LEFT
JOIN cl_application_packager ap
ON ap.application_id = a.id
AND ap.status != 'Declined'
AND ap.deleted = '0000-00-00 00:00:00'
WHERE a.deleted = '0000-00-00 00:00:00'
GROUP BY a.created, a.id
ORDER BY a.created, a.id
SELECT 列表中的那些相关子查询将针对查询返回的每一行执行,所以这会很昂贵。
现在我们想看看是否可以在c1_application
表上获得一个索引,这将帮助我们避免Using filesort
操作(以满足 ORDER BY 和 GROUP BY 子句。)
CREATE INDEX c1_application_IX2 ON c1_application
(deleted, created, id)
;
查询依赖于 GROUP BY 行为的 MySQL 特定扩展,不会由于 SELECT 列表中未出现在 GROUP BY 中的非聚合而引发错误。如果c1_customer
或c1_application_packager
中有多个“匹配”行,则从 GROUP BY 操作返回的行是不确定的。
不保证这些更改会对性能产生积极影响。 (性能可能会差很多。)
再次,运行 EXPLAIN 以查看执行计划,并从那里进行调整。
下一个大问题是那些相关的子查询。为了获得良好的性能,必须有一个合适的索引可用。 (上面已经指定了一个合适的覆盖指数的建议。)
下一个步骤是从 SELECT 列表中消除相关子查询,如果返回 c1_customer_address
中的 id
列作为获取 address_postcode
和 resident_status
的方法,则第一个相关子查询可能是被淘汰了。
跟进
删除相关子查询...添加一个内联视图lc
以从 c1_customer_address 获取最新创建日期(对于每个 customer_id),并另一个连接到 c1_customer_address 以检索具有该最新创建日期的行(对于每个客户) .
lc
内联视图引入了“派生表”,这对于大型集合来说可能代价高昂,但这可能比使用相关子查询更快。
SELECT a.id
, c.name_first
, c.name_last
, a.created AS app_created
, a.edited
, a.comments AS custcomment
, c.name_company
, a.loan_amount
, a.product_type
, ap.packager AS placed_with
, a.introducer_id
, a.status
, c.contact_number
, c.email
, a.database
, bd.legal_structure
, ca.id AS address_id
, ca.address_postcode
, ca.resident_status
FROM cl_application a
LEFT
JOIN cl_application_business_details bd
ON bd.id = a.id
LEFT
JOIN cl_customer c
ON c.id = a.customer_id
AND c.deleted IS NOT NULL
LEFT
JOIN cl_application_packager ap
ON ap.application_id = a.id
AND ap.status != 'Declined'
AND ap.deleted = '0000-00-00 00:00:00'
LEFT
JOIN ( SELECT ca1.customer_id
, MAX(ca1.created) AS latest_created
FROM cl_customer_address ca1
WHERE ca1.deleted = '0000-00-00 00:00:00'
GROUP BY ca1.customer_id
) lc
ON lc.customer_id = a.customer_id
LEFT
JOIN cl_customer_address ca
ON ca.customer_id = lc.customer_id
AND ca.created = lc.latest_created
AND ca.deleted = '0000-00-00 00:00:00'
WHERE a.deleted = '0000-00-00 00:00:00'
GROUP BY a.created, a.id
ORDER BY a.created, a.id
【讨论】:
非常感谢您的彻底回复和建议。执行上述操作后,查询时间降至 0.78 秒 - 仍然比我想要的要慢一些,但我今天将尝试进一步的调整和索引。 我有兴趣检查修改后查询的 EXPLAIN 输出。我们想看看 MySQL 是否避免“使用文件排序”操作,并使用索引来满足 GROUP BY/ORDER BY 操作。我怀疑 SELECT 列表中的相关子查询是最大的性能瓶颈。 我添加了一个 FOLLOWUP,删除了相关的子查询,并使用内联视图获取每个customer_id
的“最新”created
(来自 c1_customer_address
),以及另一个加入到同一表来获取行(对于那个customer_id
和最新的created
。)如果给定的customer_id
有两行或多行具有相同的最新created
日期,则将返回所有这些行。但是 GROUP BY 操作会将其减少到一行。 (返回哪些行是不确定的。)
另一个性能问题可能是列的数据类型。具体来说,使用“文本”数据类型。例如,我不希望 address_postcode
超过 12 个字符。即使它可能是 80 个字符,您可能最好使用 VARCHAR(80) 作为数据类型。与resident status
类似。那会有多少个字符?确实,使用较大的 VARCHAR,您可能会遇到对行最大长度的限制。当我真正想要或需要行外存储时,我会保留使用“文本”数据类型。
感谢您再次回复此问题 - 我尝试了您建议的后续 SQL,并且速度更快一些 - 我已经更新了原始帖子,其中包含指向修改后查询的解释输出的链接。
以上是关于mysql 查询 - 不能很好地扩展,15k 记录很慢的主要内容,如果未能解决你的问题,请参考以下文章
python 子进程不能很好地与 gsutil 复制/移动命令一起使用