SELECT DISTINCT in JPA

Posted 2021-03-31


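I have a table holding the ISO 4217 currency values (six columns: ID, Country, Currency_Name, Alphabetic_code, Numeric_Code, Minor_Unit).

I need to fetch some data for the 4 most used currencies, and my "plain" SQL query looks like this: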

select distinct currency_name, alphabetic_code, numeric_code 
from currency 
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP') 
order by currency_name;

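It returns a 4-row table with the data I need. So far so good.

Now I have to translate it into our JPA XML file, and that is where the trouble starts. The query I want is this: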

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

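This returns a list with one row for every country that uses one of these currencies (as if there were no DISTINCT in the query). I am scratching my head over why. So the questions are:

1) How can I make this query return what the "plain" SQL gives me?

2) Why does this query seem to ignore my DISTINCT clause? There is something here I do not know, and I cannot work it out. What is going on, and what am I missing?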

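Edit: Well, this is even more annoying. Somehow the JPA query above works as expected (it returns 4 rows). What I had actually tried was this (because I need more information):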

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

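It seems the ID is what messes everything up, because removing it from the query returns the 4-row table. Adding parentheses does not help.

By the way, we are using EclipseLink.

Answer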

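The problem shows up when you retrieve this list of columns: (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id).

DISTINCT operates on the complete set of columns named in the SELECT clause.

I believe the id column is unique for every record in the table, which is why you get duplicates in the other columns (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit).

So in this case DISTINCT is applied to the whole row, not to one specific column. If you want unique names, select only that column.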
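To make the "whole row" behaviour concrete, here is a small plain-Java sketch that mimics it with in-memory tuples instead of a database. The `CurrencyRow` record and the sample data are hypothetical and only loosely modelled on the table in the question; `Stream.distinct()` stands in for SQL DISTINCT because both compare the full tuple.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DistinctRowDemo {

    // Hypothetical stand-in for one row of the currency table (one row per country).
    record CurrencyRow(long id, String country, String currencyName, String alphabeticCode) {}

    // Hypothetical sample data: the euro appears once for each country that uses it.
    static List<CurrencyRow> sampleRows() {
        return List.of(
            new CurrencyRow(1, "Germany", "Euro", "EUR"),
            new CurrencyRow(2, "France", "Euro", "EUR"),
            new CurrencyRow(3, "Japan", "Yen", "JPY"),
            new CurrencyRow(4, "United States", "US Dollar", "USD"));
    }

    // Mimics SELECT DISTINCT currencyName, alphabeticCode: each tuple is compared as a whole.
    static List<List<Object>> distinctWithoutId(List<CurrencyRow> rows) {
        return rows.stream()
            .map(r -> List.<Object>of(r.currencyName(), r.alphabeticCode()))
            .distinct()
            .collect(Collectors.toList());
    }

    // Mimics SELECT DISTINCT currencyName, alphabeticCode, id: the unique id
    // makes every tuple different, so nothing is filtered out.
    static List<List<Object>> distinctWithId(List<CurrencyRow> rows) {
        return rows.stream()
            .map(r -> List.<Object>of(r.currencyName(), r.alphabeticCode(), r.id()))
            .distinct()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctWithoutId(sampleRows()).size()); // prints 3
        System.out.println(distinctWithId(sampleRows()).size());    // prints 4
    }
}
```

The two euro rows collapse into one only while the id is left out of the projection, which matches what the question observed.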

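If you want one row per combination of several columns and still need an id, you can use GROUP BY on those columns instead. Every selected column that is not listed in the GROUP BY clause has to be aggregated, so the id is wrapped in MIN here, for example:

SELECT c.currencyName, c.alphabeticCode, c.numericCode, MIN(c.id)
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
GROUP BY c.currencyName, c.alphabeticCode, c.numericCode
ORDER BY c.currencyName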
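As a rough plain-Java analogy of grouping with an aggregate, the sketch below keeps one entry per currency code and reduces the ids of each group to their minimum. The `CurrencyRow` record and the method name are hypothetical; this only illustrates the rule that each result column is either a grouping key or an aggregate, it is not how a JPA provider evaluates the query.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupByDemo {

    // Hypothetical stand-in for the columns used in the grouped query.
    record CurrencyRow(long id, String currencyName, String alphabeticCode) {}

    // Mimics: SELECT alphabeticCode, MIN(id) ... GROUP BY alphabeticCode ORDER BY alphabeticCode.
    // The TreeMap keeps the keys sorted; Long::min merges the ids that share a key.
    static Map<String, Long> minIdPerCode(List<CurrencyRow> rows) {
        return rows.stream().collect(Collectors.toMap(
            CurrencyRow::alphabeticCode,
            CurrencyRow::id,
            Long::min,
            TreeMap::new));
    }
}
```

Calling `minIdPerCode` with two EUR rows (ids 1 and 2) and one JPY row (id 3) yields two entries, with EUR mapped to 1.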

Another answer

To answer your question, the JPQL query you wrote is fine:

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

It should translate to the SQL statement you expect:

select distinct currency_name, alphabetic_code, numeric_code 
from currency 
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP') 
order by currency_name;

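Now, as I explained in this article, DISTINCT has two meanings in JPA, depending on the underlying JPQL or Criteria API query type.

Scalar queries

For scalar queries, which return a scalar projection, like the following query: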

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

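the DISTINCT keyword should be passed to the underlying SQL statement, because we want the database engine to filter out duplicates before it returns the result set: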

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]
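For readers who want to see the effect of a scalar DISTINCT without a database, the following sketch reproduces it in memory: extract the year, drop duplicates, sort. The class and method names are hypothetical and the dates passed in are made up for illustration. With a real query the database does this work, which is exactly why DISTINCT belongs in the SQL here.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

final class DistinctYearsDemo {

    // In-memory analogue of:
    // select distinct year(p.createdOn) from Post p order by year(p.createdOn)
    static List<Integer> distinctYears(List<LocalDate> createdOn) {
        return createdOn.stream()
            .map(LocalDate::getYear) // the scalar projection
            .distinct()              // DISTINCT on the projected value
            .sorted()                // ORDER BY
            .collect(Collectors.toList());
    }
}
```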

Entity queries

For entity queries, DISTINCT has a different meaning.

Without DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

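But the parent post record is duplicated in the result set for every associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.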
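The duplication can be pictured with a simplified model of how joined rows are turned into a result list. This is a sketch under stated assumptions, not Hibernate's actual implementation: the `Post` class, the `JoinRow` record, and the `mapRows` method are hypothetical, and the `HashMap` merely plays the role of the persistence context that resolves each post id to a single instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinFetchDuplicatesDemo {

    // Hypothetical, simplified stand-in for the Post entity.
    static final class Post {
        final long id;
        final List<String> comments = new ArrayList<>();
        Post(long id) { this.id = id; }
    }

    // Hypothetical shape of one row produced by the post / post_comment join.
    record JoinRow(long postId, String review) {}

    // Adds one list element per joined row; rows with the same post id
    // resolve to the same Post instance, so the reference is repeated.
    static List<Post> mapRows(List<JoinRow> rows) {
        Map<Long, Post> byId = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (JoinRow row : rows) {
            Post post = byId.computeIfAbsent(row.postId(), Post::new);
            post.comments.add(row.review());
            result.add(post);
        }
        return result;
    }
}
```

Two joined rows for post 1 give a list of size 2 whose elements are the very same object, which corresponds to the `[1, 1]` identifiers logged above.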

To eliminate the duplicate Post entity references, we need to use DISTINCT:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, and that is not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

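When DISTINCT is passed to the SQL query, the execution plan performs an extra Sort phase. That adds overhead without bringing any value, because the parent-child combinations are always unique thanks to the child PK column: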

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

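Using HINT_PASS_DISTINCT_THROUGH for entity queries

To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH query hint. Note that this hint and the QueryHints class used below come from Hibernate, not from the JPA specification: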

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

Now the SQL query no longer contains DISTINCT, but the duplicate Post entity references are still removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

And the execution plan confirms that this time there is no extra Sort phase:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
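Conceptually, what remains once DISTINCT is kept out of the SQL is an in-memory pass that drops repeated entity references. The helper below is a minimal sketch of that idea, assuming duplicates are the same object instance; `distinctByReference` is a hypothetical name and this is not the code Hibernate runs internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class EntityDistinctDemo {

    // Removes repeated references while keeping the first-seen order.
    // An identity-based set is used so that only the very same instance
    // counts as a duplicate, regardless of how equals() is implemented.
    static <T> List<T> distinctByReference(List<T> entities) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
        List<T> result = new ArrayList<>();
        for (T entity : entities) {
            if (seen.add(entity)) {
                result.add(entity);
            }
        }
        return result;
    }
}
```

This is a single linear pass over rows that were fetched anyway, whereas a SQL-level DISTINCT makes the database sort or hash the full joined rows first.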
