Make JPQL/QueryDSL not generate terrible queries
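【Posted】:2021-11-15 07:41:18 【Question】:I'm using QueryDSL 4.4.0 with Hibernate 5.4.32 to query a simple blogging platform backed by a PostgreSQL database. My problem is that JPQL, and by extension QueryDSL, insists on generating truly appallingly bad queries. I'd like to know whether there is a way to make it not do that. I'd rather not have to go to native queries, since the queries are already being generated.
I basically have 3 entities: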
@Entity
@Table(indexes = ... )
public class Note {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @NotNull
    private UUID id;

    @ManyToMany(fetch = FetchType.EAGER)
    private List<Keyword> keywords;
    ...
}
@Entity
@Table(indexes = { @Index(name = "keyword_parent", columnList = "parent_id"), ... })
public class Keyword {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @NotNull
    private UUID id;

    @EqualsAndHashCode.Exclude
    @ManyToOne(fetch = FetchType.LAZY)
    private Keyword parent;

    @ManyToMany(fetch = FetchType.LAZY)
    private List<Keyword> implies = new ArrayList<>();

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "parent", orphanRemoval = false)
    private List<Keyword> children = new ArrayList<>();
    ...
}
@Entity
@Table(indexes = { @Index(columnList = "child_id"), @Index(columnList = "parent_id"),
        @Index(columnList = "child_id,parent_id", unique = true), @Index(columnList = "ref"), ... })
@IdClass(KeywordCacheId.class)
@Where(clause = "ref > 0")
public class KeywordCache implements Serializable {
    private static final long serialVersionUID = 1L;

    @NotNull
    @ManyToOne(fetch = FetchType.EAGER)
    @Id
    private Keyword child;

    @NotNull
    @ManyToOne(fetch = FetchType.EAGER)
    @Id
    private Keyword parent;

    private int ref;
    ...
}
public class KeywordCacheId implements Serializable {
    private static final long serialVersionUID = 1L;

    private UUID child;
    private UUID parent;
    // equals + hashCode
}
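(simplified to show only the main structure)
A note has many keywords. Keywords have a hierarchical relationship plus auxiliary relationships. These relationships are too complex to resolve in SQL, so a cache is built that records whether two keywords are related.
I have 7680 notes, 1308 keywords, 39k note-keyword relations, 12 keyword-keyword relations, and a computed cache of 3002 relations. In other words, a tiny database.
I want to find all notes that contain a keyword related to a given list of keyword IDs.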
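To pin down what the query is supposed to return, here is a minimal in-memory sketch of the lookup. It is an illustration only, not the persistence code: the class `RelatedNoteLookup`, the record `CacheRow` and the method `findNotes` are hypothetical names, and it assumes that a note must match every id in the list, which is how the per-id filter loop in the attempts further down combines its conditions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** In-memory model of the lookup; illustrative only, not the author's persistence code. */
final class RelatedNoteLookup {

    /** One cache row: keyword child is related to keyword parent; rows with ref <= 0 are ignored. */
    record CacheRow(UUID child, UUID parent, int ref) {}

    /**
     * Returns the ids of all notes that, for every id in the filter, carry at least one
     * keyword whose live cache row names that id as parent.
     */
    static Set<UUID> findNotes(Map<UUID, Set<UUID>> keywordsByNote,
                               List<CacheRow> cache,
                               List<UUID> filter) {
        Set<UUID> matches = new LinkedHashSet<>();
        for (Map.Entry<UUID, Set<UUID>> note : keywordsByNote.entrySet()) {
            Set<UUID> noteKeywords = note.getValue();
            boolean matchesEveryFilter = filter.stream().allMatch(wanted ->
                    cache.stream().anyMatch(row ->
                            row.ref() > 0
                                    && row.parent().equals(wanted)
                                    && noteKeywords.contains(row.child())));
            if (matchesEveryFilter) {
                matches.add(note.getKey());
            }
        }
        return matches;
    }
}
```

The `ref > 0` check mirrors the `@Where(clause = "ref > 0")` restriction on `KeywordCache`.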
My first attempt was
private JPAQuery<Note> addFilter(JPAQuery<Note> query, List<String> filter) {
    for (String f : filter) {
        UUID id = UUID.fromString(f);
        String variable = id.toString().replaceAll("-", "");
        QKeywordCache cache = new QKeywordCache("kc_" + variable);
        query.from(cache);
        query.where(cache.child.in(QNote.note.keywords));
        query.where(cache.parent.id.eq(id));
    }
    return query;
}

public Page<Note> find(List<String> filter, Pageable page) {
    JPAQuery<Note> query = new JPAQuery<>(entityManager);
    query.from(QNote.note);
    query.select(QNote.note);
    query.distinct();
    query = addFilter(query, filter);
    query.offset(page.getOffset());
    query.limit(page.getPageSize());
    QueryResults<Note> data = query.fetchResults();
    return new PageImpl<>(data.getResults(), page, data.getTotal());
}
This produces sensible JPQL, which then gets translated into utterly insane SQL:
select distinct note
from Note note, KeywordCache kc_6205f3b41e354d63909ef253866371b1
where kc_6205f3b41e354d63909ef253866371b1.child member of note.keywords and kc_6205f3b41e354d63909ef253866371b1.parent.id = ?1
select
count(distinct note0_.id) as col_0_0_
from
Note note0_
cross join KeywordCache keywordcac1_
where
( keywordcac1_.ref > 0)
and (keywordcac1_.child_id in (
select
keywords2_.keywords_id
from
Note_Keyword keywords2_
where
note0_.id = keywords2_.Note_id))
and keywordcac1_.parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'
Aggregate (cost=10229575.18..10229575.19 rows=1 width=8)
-> Nested Loop (cost=4.30..10229542.12 rows=13222 width=16)
Join Filter: (SubPlan 1)
-> Seq Scan on note note0_ (cost=0.00..206.15 rows=8815 width=16)
-> Materialize (cost=4.30..13.31 rows=3 width=16)
-> Bitmap Heap Scan on keywordcache keywordcac1_ (cost=4.30..13.29 rows=3 width=16)
Recheck Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
Filter: (ref > 0)
-> Bitmap Index Scan on idx1in649xpbjw4aeix3574irbne (cost=0.00..4.30 rows=3 width=0)
Index Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
SubPlan 1
-> Seq Scan on note_keyword keywords2_ (cost=0.00..773.59 rows=5 width=16)
Filter: (note0_.id = note_id)
The count is there because of the pagination. This insane "in" construct makes the query take about 150 seconds.
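The long alias in the JPQL above, kc_6205f3b41e354d63909ef253866371b1, is just the "kc_" prefix followed by the keyword's UUID with the dashes removed, as computed by the `variable` line in addFilter. The dashes have to go because JPQL identification variables follow Java identifier rules, and one alias per keyword id keeps several filters from colliding. A small stand-alone sketch (the class `AliasNames` and the method `aliasFor` are hypothetical helpers, not part of the original code):

```java
import java.util.UUID;

/** Derives a per-keyword query alias from a UUID string; illustrative helper only. */
final class AliasNames {

    /** Validates the id as a UUID, then strips the dashes so the result is a legal identifier. */
    static String aliasFor(String prefix, String keywordId) {
        UUID id = UUID.fromString(keywordId);
        // replace() does a literal substitution; no regular expression is needed here.
        return prefix + id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        // prints kc_6205f3b41e354d63909ef253866371b1
        System.out.println(aliasFor("kc_", "6205f3b4-1e35-4d63-909e-f253866371b1"));
    }
}
```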
Replacing the filter method with
private JPAQuery<Note> addFilter(JPAQuery<Note> query, List<String> filter) {
    for (String f : filter) {
        UUID id = UUID.fromString(f);
        String variable = id.toString().replaceAll("-", "");
        QKeywordCache cache = new QKeywordCache("kc_" + variable);
        query.from(cache);
        query.where(QNote.note.keywords.any().eq(cache.child));
        // query.where(cache.child.in(QNote.note.keywords));
        query.where(cache.parent.id.eq(id));
    }
    return query;
}
gives me slightly worse JPQL, and SQL that looks overly complicated because of the unnecessary subselect:
select distinct note
from Note note, KeywordCache kc_6205f3b41e354d63909ef253866371b1
where exists (select 1
from note.keywords as note_keywords_0
where note_keywords_0 = kc_6205f3b41e354d63909ef253866371b1.child) and kc_6205f3b41e354d63909ef253866371b1.parent.id = ?1
select
count(distinct note0_.id) as col_0_0_
from
Note note0_
cross join KeywordCache keywordcac1_
where
( keywordcac1_.ref > 0)
and (exists (
select
1
from
Note_Keyword keywords2_,
Keyword keyword3_
where
note0_.id = keywords2_.Note_id
and keywords2_.keywords_id = keyword3_.id
and keyword3_.id = keywordcac1_.child_id))
and keywordcac1_.parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'
Aggregate (cost=3212.04..3212.05 rows=1 width=8)
-> Hash Semi Join (cost=1459.43..3211.99 rows=18 width=16)
Hash Cond: ((note0_.id = keywords2_.note_id) AND (keywordcac1_.child_id = keywords2_.keywords_id))
-> Nested Loop (cost=4.30..550.01 rows=26445 width=32)
-> Seq Scan on note note0_ (cost=0.00..206.15 rows=8815 width=16)
-> Materialize (cost=4.30..13.31 rows=3 width=16)
-> Bitmap Heap Scan on keywordcache keywordcac1_ (cost=4.30..13.29 rows=3 width=16)
Recheck Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
Filter: (ref > 0)
-> Bitmap Index Scan on idx1in649xpbjw4aeix3574irbne (cost=0.00..4.30 rows=3 width=0)
Index Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
-> Hash (cost=871.22..871.22 rows=38927 width=48)
-> Hash Join (cost=92.43..871.22 rows=38927 width=48)
Hash Cond: (keywords2_.keywords_id = keyword3_.id)
-> Seq Scan on note_keyword keywords2_ (cost=0.00..676.27 rows=38927 width=32)
-> Hash (cost=76.08..76.08 rows=1308 width=16)
-> Seq Scan on keyword keyword3_ (cost=0.00..76.08 rows=1308 width=16)
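Now the query can make use of my indexes and takes about 120 ms. However, it still uses a silly subselect that is completely unnecessary. Writing the query by hand, I get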
select
count(distinct note0_.id) as col_0_0_
from
Note note0_,
Note_Keyword keywords2_,
KeywordCache keywordcac1_
where
keywordcac1_.ref > 0
and note0_.id = keywords2_.Note_id
and keywords2_.keywords_id = keywordcac1_.child_id
and keywordcac1_.parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'
Aggregate (cost=816.18..816.19 rows=1 width=8)
-> Nested Loop (cost=13.61..815.99 rows=75 width=16)
-> Hash Join (cost=13.33..792.09 rows=75 width=16)
Hash Cond: (keywords2_.keywords_id = keywordcac1_.child_id)
-> Seq Scan on note_keyword keywords2_ (cost=0.00..676.27 rows=38927 width=32)
-> Hash (cost=13.29..13.29 rows=3 width=16)
-> Bitmap Heap Scan on keywordcache keywordcac1_ (cost=4.30..13.29 rows=3 width=16)
Recheck Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
Filter: (ref > 0)
-> Bitmap Index Scan on idx1in649xpbjw4aeix3574irbne (cost=0.00..4.30 rows=3 width=0)
Index Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
-> Index Only Scan using note_pkey on note note0_ (cost=0.29..0.32 rows=1 width=16)
Index Cond: (id = keywords2_.note_id)
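This query takes 30 ms. While the improvement from 120 ms to 30 ms may look insignificant, it is still a factor of 4, and this is a central loop in my application, so I want to keep it fast. Especially since I can add multiple keywords (and expect 3-6 keywords in the list in normal use) and plan to add sorting, the subselect has to be efficient (or absent).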
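With several keywords in the filter, the hand-written form would presumably repeat the Note_Keyword/KeywordCache pair once per keyword. The following is a rough sketch of that shape under this assumption, not a measured or tested query: the table and column names are copied from the SQL above, while the class `HandWrittenCountSql`, the aliases `n`, `nkN`, `kcN` and the JDBC-style `?` placeholders are hypothetical.

```java
/** Builds the hand-written count query for a given number of filter keywords; sketch only. */
final class HandWrittenCountSql {

    /** Adds one Note_Keyword/KeywordCache join pair and one bind parameter per filter keyword. */
    static String build(int filterCount) {
        if (filterCount < 1) {
            throw new IllegalArgumentException("at least one filter keyword is required");
        }
        StringBuilder from = new StringBuilder("from Note n");
        StringBuilder where = new StringBuilder();
        for (int i = 0; i < filterCount; i++) {
            from.append(", Note_Keyword nk").append(i)
                .append(", KeywordCache kc").append(i);
            where.append(i == 0 ? " where " : " and ")
                 .append("kc").append(i).append(".ref > 0")
                 .append(" and n.id = nk").append(i).append(".Note_id")
                 .append(" and nk").append(i).append(".keywords_id = kc").append(i).append(".child_id")
                 .append(" and kc").append(i).append(".parent_id = ?");
        }
        return "select count(distinct n.id) " + from + where;
    }

    public static void main(String[] args) {
        System.out.println(build(3));
    }
}
```

Each extra keyword adds two tables to the join, so the planner's choices should be checked with EXPLAIN for the 3-6 keyword case rather than extrapolated from the single-keyword timing.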
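So, is there a way to make QueryDSL generate better JPQL in the second case, or to make JPQL (as implemented by Hibernate) less obsessed with the wild "in" subselect over the collection, as in the first case?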
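【Comments】:
It's the any() that introduces the subquery in the SQL; if you don't need it, get rid of it. Replace it with your own subquery or a join.
Thanks for the suggestion. Unfortunately it is needed: keywords is a collection, so I can't bind it directly to cache.child. If I use contains, which would be the natural way, I'm back at my first attempt, which yields nicer JPQL but gets translated into terrible SQL.
JPQL supports association joins, which render exactly the underlying SQL you want. Querydsl can render association joins just fine: .innerJoin(QNote.note.keywords, QKeyword.keyword) is the syntax you're looking for, and you can filter on it in both the ON and the WHERE clause. I strongly recommend updating to Querydsl 5.0.0, because 4.x still uses Hibernate 4 legacy joins instead of JPA 2.1 joins.
Thanks for the clarification. The trick I had overlooked was joining the keywords explicitly. It adds an extra join to the SQL, but avoiding the subselect more than makes up for it. I'll look into QueryDSL 5; I'm using the version managed by Spring Boot, but they seem to be considering dropping version management for QueryDSL, so this may be a good time to make the jump.
【Answer 1】:
Thanks to the comment by Jan-Willem Gmelig Meyling, I got it working using an explicit join with the keywords: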
private JPAQuery<Note> addFilter(JPAQuery<Note> query, List<String> filter) {
    for (String f : filter) {
        UUID id = UUID.fromString(f);
        String variable = id.toString().replaceAll("-", "");
        QKeywordCache cache = new QKeywordCache("kc_" + variable);
        QKeyword keyword = new QKeyword("k_" + variable);
        query.from(cache);
        query.innerJoin(QNote.note.keywords, keyword);
        query.where(keyword.eq(cache.parent));
        query.where(cache.parent.id.eq(id));
    }
    return query;
}
This results in a query much closer to what I wrote by hand:
select distinct note
from Note note, KeywordCache kc_6205f3b41e354d63909ef253866371b1
inner join note.keywords as k_6205f3b41e354d63909ef253866371b1
where k_6205f3b41e354d63909ef253866371b1 = kc_6205f3b41e354d63909ef253866371b1.parent and kc_6205f3b41e354d63909ef253866371b1.parent.id = ?1
select
count(distinct note0_.id) as col_0_0_
from
Note note0_
cross join KeywordCache keywordcac1_
inner join Note_Keyword keywords2_ on
note0_.id = keywords2_.Note_id
inner join Keyword keyword3_ on
keywords2_.keywords_id = keyword3_.id
where
( keywordcac1_.ref > 0)
and keyword3_.id = keywordcac1_.parent_id
and keywordcac1_.parent_id ='98c9201c-a395-4ac4-9348-ea89e740653b'
Aggregate (cost=812.06..812.07 rows=1 width=8)
-> Nested Loop (cost=4.87..812.04 rows=6 width=16)
-> Bitmap Heap Scan on keywordcache keywordcac1_ (cost=4.30..13.45 rows=3 width=16)
Recheck Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
Filter: (ref > 0)
-> Bitmap Index Scan on idx1in649xpbjw4aeix3574irbne (cost=0.00..4.30 rows=3 width=0)
Index Cond: (parent_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
-> Materialize (cost=0.56..798.52 rows=2 width=32)
-> Nested Loop (cost=0.56..798.51 rows=2 width=32)
-> Index Only Scan using keyword_pkey on keyword keyword3_ (cost=0.28..8.29 rows=1 width=16)
Index Cond: (id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
-> Nested Loop (cost=0.29..790.19 rows=2 width=32)
-> Seq Scan on note_keyword keywords2_ (cost=0.00..773.59 rows=2 width=32)
Filter: (keywords_id = '98c9201c-a395-4ac4-9348-ea89e740653b'::uuid)
-> Index Only Scan using note_pkey on note note0_ (cost=0.29..8.30 rows=1 width=16)
Index Cond: (id = keywords2_.note_id)
Avoiding the subselect makes up for the extra join, giving performance similar to the hand-written query.
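One consequence of swapping the EXISTS subselect for a plain join is that a note can show up once per matching keyword, which is why the `distinct` (and `count(distinct ...)`) in the queries above still matters. A tiny in-memory sketch of that difference, using made-up string ids and hypothetical names (`JoinVersusExists`, `NoteKeyword`, `joinRows`, `distinctNotes`):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Contrasts join rows with de-duplicated results on toy data; illustrative only. */
final class JoinVersusExists {

    /** One row of the note-keyword link table. */
    record NoteKeyword(String noteId, String keywordId) {}

    /** Join semantics: one output row per (note, matching keyword) pair. */
    static List<String> joinRows(List<NoteKeyword> noteKeywords, Set<String> relatedKeywords) {
        List<String> rows = new ArrayList<>();
        for (NoteKeyword nk : noteKeywords) {
            if (relatedKeywords.contains(nk.keywordId())) {
                rows.add(nk.noteId());
            }
        }
        return rows;
    }

    /** What "select distinct" over the join yields: each matching note once, in first-seen order. */
    static List<String> distinctNotes(List<NoteKeyword> noteKeywords, Set<String> relatedKeywords) {
        return new ArrayList<>(new LinkedHashSet<>(joinRows(noteKeywords, relatedKeywords)));
    }
}
```

A note carrying two related keywords appears twice in `joinRows` and once in `distinctNotes`, which matches what an EXISTS-based query would return.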