Select DISTINCT on JPA
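I have a table with the ISO 4217 values for currencies (with 6 columns: ID, Country, Currency_Name, Alphabetic_code, Numeric_Code, Minor_Unit).
I need to get some data for the 4 most used currencies, and my "pure" SQL query looks like this: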
select distinct currency_name, alphabetic_code, numeric_code
from currency
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP')
order by currency_name;
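It returns a 4-row table with the data I need. So far, so good.
Now I have to translate it into our JPA XML file, and that is where the problems start. The query I want is this: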
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
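This returns a list with one row for every country that uses one of these currencies (as if there were no DISTINCT in the query). I am scratching my head over why. So the questions are:
1) How can I make this query return what the "pure" SQL gives me?
2) Why does this query seem to ignore my DISTINCT clause? There is something here I don't know about, and I can't work out what it is. What is going on, and what am I missing?
EDIT: Well, this is embarrassing. It turns out the JPA query above works as expected (it returns 4 rows). What I had actually tried was this (because I need more information):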
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
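It seems the ID is what messes everything up, since removing it from the query gives back the 4-row table. Adding parentheses does not help.
By the way, we are using EclipseLink.
The problem you are running into is that you are retrieving a list of columns (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id)
- DISTINCT operates on the entire set of columns mentioned in the select clause
I believe the id column is unique for every record in the table, so you get duplicates in the other columns (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit).
So in this case DISTINCT operates on the whole row, not on a particular column. If you want unique names, select only that column.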
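To make the "DISTINCT works on the whole row" point concrete, here is a minimal plain-Java sketch that does not touch JPA at all. The class name, the sample rows and the ids are hypothetical; each row is modeled as a list of column values, and duplicates are removed by comparing every column.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctRows {

    // Hypothetical sample data: the euro is listed once per country.
    static final List<List<Object>> WITHOUT_ID = List.of(
        List.of("Euro", "EUR", 978),
        List.of("Euro", "EUR", 978),
        List.of("US Dollar", "USD", 840));

    // Same rows, but each one also carries its own unique id.
    static final List<List<Object>> WITH_ID = List.of(
        List.of("Euro", "EUR", 978, 1L),
        List.of("Euro", "EUR", 978, 2L),
        List.of("US Dollar", "USD", 840, 3L));

    // Two rows are duplicates only if every column value matches.
    static List<List<Object>> distinct(List<List<Object>> rows) {
        return rows.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinct(WITHOUT_ID).size()); // 2
        System.out.println(distinct(WITH_ID).size());    // 3
    }
}
```

As soon as a unique column is part of the row, no two rows compare equal, so nothing is filtered out.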
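If you want one row per distinct combination of several columns while still selecting the id, you can do something like this, using GROUP BY on c.currencyName, c.alphabeticCode and c.numericCode. Every selected column that is not part of the GROUP BY has to go through an aggregate function, so the id is reduced with MIN here:
SELECT c.currencyName, c.alphabeticCode, c.numericCode, MIN(c.id)
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
GROUP BY c.currencyName, c.alphabeticCode, c.numericCode
ORDER BY c.currencyName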
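The grouping idea can also be sketched in plain Java. This is only an illustration under assumptions: the Row class, the sample ids and the choice of keeping the smallest id per currency are hypothetical, not something the original query prescribes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByCurrency {

    // Hypothetical stand-in for one row of the currency table.
    static final class Row {
        final long id;
        final String alphabeticCode;

        Row(long id, String alphabeticCode) {
            this.id = id;
            this.alphabeticCode = alphabeticCode;
        }
    }

    // Hypothetical sample data: two countries use EUR, one uses USD.
    static final List<Row> SAMPLE = List.of(
        new Row(1L, "EUR"), new Row(2L, "EUR"), new Row(3L, "USD"));

    // One entry per alphabetic code, keeping the smallest id of each group.
    static Map<String, Long> minIdPerCode(List<Row> rows) {
        Map<String, Long> result = new TreeMap<>();
        for (Row row : rows) {
            result.merge(row.alphabeticCode, row.id, Long::min);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minIdPerCode(SAMPLE)); // {EUR=1, USD=3}
    }
}
```

The point of the sketch is that a group yields a single row, so a per-row value such as the id has to be collapsed to one value per group.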
To answer your question, the JPQL query you wrote is just fine:
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
It should translate to the SQL statement you are expecting:
select distinct currency_name, alphabetic_code, numeric_code
from currency
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP')
order by currency_name;
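Now, as I explained in this article, DISTINCT has two meanings in JPA, depending on the underlying JPQL or Criteria API query type.
Scalar queries
For scalar queries, which return a scalar projection, like the following query: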
List<Integer> publicationYears = entityManager
.createQuery(
"select distinct year(p.createdOn) " +
"from Post p " +
"order by year(p.createdOn)", Integer.class)
.getResultList();
LOGGER.info("Publication years: {}", publicationYears);
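the DISTINCT keyword should be passed to the underlying SQL statement, since we want the database engine to filter out duplicates before returning the result set: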
SELECT DISTINCT
extract(YEAR FROM p.created_on) AS col_0_0_
FROM
post p
ORDER BY
extract(YEAR FROM p.created_on)
-- Publication years: [2016, 2018]
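As a rough in-memory analogy of the scalar case, the following sketch projects each creation date to its year, removes duplicate values and sorts them. The class name and the sample dates are hypothetical; in the real query this work is done by the database, not in Java.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctYears {

    // Hypothetical creation dates of four posts.
    static final List<LocalDate> SAMPLE = List.of(
        LocalDate.of(2016, 8, 30),
        LocalDate.of(2016, 10, 12),
        LocalDate.of(2018, 5, 1),
        LocalDate.of(2018, 9, 15));

    // Mirrors "select distinct year(createdOn) ... order by year(createdOn)".
    static List<Integer> distinctYears(List<LocalDate> createdOn) {
        return createdOn.stream()
            .map(LocalDate::getYear)
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctYears(SAMPLE)); // [2016, 2018]
    }
}
```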
Entity queries
For entity queries, DISTINCT has a different meaning.
Without DISTINCT, a query like the following one:
List<Post> posts = entityManager
.createQuery(
"select p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
is going to JOIN the post and post_comment tables like this:
SELECT p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1, 1]
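But the parent post record is repeated in the result set once for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.
To eliminate the duplicate Post entity references, we need to use DISTINCT: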
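Why the list ends up with repeated references can be shown with a small plain-Java sketch. It only mimics the effect of mapping a parent/child join result and is not how any JPA provider is implemented; the Post class and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class JoinFetchDuplicates {

    // Hypothetical, much simplified parent entity.
    static final class Post {
        final long id;
        final List<String> comments = new ArrayList<>();

        Post(long id) {
            this.id = id;
        }
    }

    // Builds a sample post with two comments.
    static Post samplePost() {
        Post post = new Post(1L);
        post.comments.add("Excellent!");
        post.comments.add("Great!");
        return post;
    }

    // Emits one list entry per joined row, always pointing at the same parent.
    static List<Post> oneReferencePerJoinedRow(List<Post> posts) {
        List<Post> result = new ArrayList<>();
        for (Post post : posts) {
            // A LEFT JOIN yields one row per comment, or one row if there are none.
            int joinedRows = Math.max(1, post.comments.size());
            for (int i = 0; i < joinedRows; i++) {
                result.add(post);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Post> posts = oneReferencePerJoinedRow(List.of(samplePost()));
        System.out.println(
            posts.stream().map(p -> p.id).collect(Collectors.toList())); // [1, 1]
    }
}
```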
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
But then DISTINCT is also passed to the SQL query, and that is not desirable at all:
SELECT DISTINCT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
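By passing DISTINCT to the SQL query, the execution plan is going to execute an extra Sort phase, which adds overhead without bringing any value, since the parent-child combinations always return unique records because of the child PK column: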
Unique (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
-> Sort (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
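Entity queries with HINT_PASS_DISTINCT_THROUGH
To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH JPA query hint (a Hibernate-specific hint, defined in org.hibernate.jpa.QueryHints):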
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
Now the SQL query no longer contains DISTINCT, but the duplicate Post entity references are still removed:
SELECT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
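Conceptually, the duplicates are now dropped in memory rather than by the database. The sketch below illustrates that idea under the assumption that deduplication is done by object reference while keeping the original order; it is not Hibernate's actual implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DistinctReferences {

    // Keeps the first occurrence of every object reference, in encounter order.
    static <T> List<T> distinctByIdentity(List<T> entities) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> result = new ArrayList<>();
        for (T entity : entities) {
            if (seen.add(entity)) {
                result.add(entity);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object post = new Object();
        // The same reference appears once per joined child row.
        List<Object> fetched = List.of(post, post);
        System.out.println(distinctByIdentity(fetched).size()); // 1
    }
}
```

Because the comparison is by reference, no column values have to be sorted or compared, which is why the database-side Sort phase becomes unnecessary.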
And the execution plan confirms that we no longer have an extra Sort phase this time:
Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms