How H2 chooses the right/wrong index in a join

Posted: 2018-04-27 10:18:49

Question:

I ran into a problem with a named query in Java, but the root cause turned out to be in H2.

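I thought ANALYZE would be the solution to my problem. It was, locally, on my development machine. On the customer's machine, however, it actually made things worse.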

Scenario: I have an H2 database whose data is at version 105. After more data is imported, it is at version 106.

The table looks like this (image of the table definition, not included here):

Query (fetches the rows with the given GUIDs and locale, at the highest version):

SELECT tdo.TECDOC_GUID as guid, tdo.TECDOC_LOCALE as locale , tdo.TECDOC_VERSION as version, tdo.DATA as data
FROM TECDOC_OBJECTS tdo
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
ON (
    tdo.TECDOC_GUID = tdo1.TECDOC_GUID AND 
    tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE AND 
    tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION)
WHERE tdo1.id IS NULL 
AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
AND tdo.TECDOC_LOCALE = 'de';
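To make the intent of this self-join concrete, here is a minimal sketch of the same "no newer row exists" pattern. It runs against an in-memory SQLite database purely as a stand-in for H2 so that it needs no server. The sample rows, the short GUID values, and the column types are assumptions made up for illustration, since the original table definition is not shown.

```python
import sqlite3

# SQLite is only a stand-in here so the example runs without an H2 server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TECDOC_OBJECTS ("
    " ID INTEGER PRIMARY KEY,"
    " TECDOC_GUID TEXT, TECDOC_LOCALE TEXT,"
    " TECDOC_VERSION INTEGER, DATA TEXT)"
)
# Made-up sample rows: two GUIDs, two locales, versions 105 and 106.
conn.executemany(
    "INSERT INTO TECDOC_OBJECTS"
    " (TECDOC_GUID, TECDOC_LOCALE, TECDOC_VERSION, DATA)"
    " VALUES (?, ?, ?, ?)",
    [
        ("GUID-A", "de", 105, "a-de-105"),
        ("GUID-A", "de", 106, "a-de-106"),
        ("GUID-A", "en", 106, "a-en-106"),
        ("GUID-B", "de", 105, "b-de-105"),
    ],
)

# tdo1 matches any row of the same GUID/locale with a higher version.
# Keeping only rows where tdo1.ID IS NULL keeps the rows that have no newer version.
LATEST_VERSION_SQL = """
SELECT tdo.TECDOC_GUID, tdo.TECDOC_LOCALE, tdo.TECDOC_VERSION, tdo.DATA
FROM TECDOC_OBJECTS tdo
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
  ON tdo.TECDOC_GUID = tdo1.TECDOC_GUID
 AND tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE
 AND tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION
WHERE tdo1.ID IS NULL
  AND tdo.TECDOC_GUID IN ('GUID-A', 'GUID-B')
  AND tdo.TECDOC_LOCALE = 'de'
ORDER BY tdo.TECDOC_GUID
"""

latest_rows = conn.execute(LATEST_VERSION_SQL).fetchall()
for row in latest_rows:
    print(row)
```

With these sample rows, GUID-A comes back at version 106 and GUID-B at version 105, because the 'de' row of GUID-A at version 105 finds a newer partner in the join and is filtered out.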

Execution plan before I ran the ANALYZE command (scanCount is really low):

SELECT
    TDO.TECDOC_GUID AS GUID,
    TDO.TECDOC_LOCALE AS LOCALE,
    TDO.TECDOC_VERSION AS VERSION,
    TDO.DATA AS DATA
FROM PUBLIC.TECDOC_OBJECTS TDO
    /* PUBLIC.IDX_TECDOC_GUID: TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0') */
    /* WHERE (TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
        AND (TDO.TECDOC_LOCALE = 'de')
    */
    /* scanCount: 19 */
LEFT OUTER JOIN PUBLIC.TECDOC_OBJECTS TDO1
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_GUID = TDO.TECDOC_GUID
        AND TECDOC_LOCALE = TDO.TECDOC_LOCALE
        AND TECDOC_VERSION > TDO.TECDOC_VERSION
     */
    ON (TDO.TECDOC_VERSION < TDO1.TECDOC_VERSION)
    AND ((TDO.TECDOC_GUID = TDO1.TECDOC_GUID)
    AND (TDO.TECDOC_LOCALE = TDO1.TECDOC_LOCALE))
    /* scanCount: 4 */
WHERE (TDO.TECDOC_LOCALE = 'de')
    AND ((TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
    AND (TDO1.ID IS NULL))
/*
total: 37
TECDOC_OBJECTS.IDX_GUID_LOCALE_VERSION read: 6 (16%)
TECDOC_OBJECTS.IDX_TECDOC_GUID read: 8 (21%)
TECDOC_OBJECTS.TECDOC_OBJECTS_DATA read: 23 (62%)
*/

Execution plan after I ran the ANALYZE command (scanCount is really high):

SELECT
    TDO.TECDOC_GUID AS GUID,
    TDO.TECDOC_LOCALE AS LOCALE,
    TDO.TECDOC_VERSION AS VERSION,
    TDO.DATA AS DATA
FROM PUBLIC.TECDOC_OBJECTS TDO
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_LOCALE = 'de'
        AND TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
     */
    /* WHERE (TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
        AND (TDO.TECDOC_LOCALE = 'de')
    */
    /* scanCount: 287385 */
LEFT OUTER JOIN PUBLIC.TECDOC_OBJECTS TDO1
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_GUID = TDO.TECDOC_GUID
        AND TECDOC_LOCALE = TDO.TECDOC_LOCALE
        AND TECDOC_VERSION > TDO.TECDOC_VERSION
     */
    ON (TDO.TECDOC_VERSION < TDO1.TECDOC_VERSION)
    AND ((TDO.TECDOC_GUID = TDO1.TECDOC_GUID)
    AND (TDO.TECDOC_LOCALE = TDO1.TECDOC_LOCALE))
    /* scanCount: 4 */
WHERE (TDO.TECDOC_LOCALE = 'de')
    AND ((TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
    AND (TDO1.ID IS NULL))
/*
total: 11891
TECDOC_OBJECTS.IDX_GUID_LOCALE_VERSION read: 11884 (99%)
TECDOC_OBJECTS.TECDOC_OBJECTS_DATA read: 7 (0%)
*/

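On my development laptop, however, the query is still fast after ANALYZE. Somehow H2 picks the wrong index (according to the documentation, it can use only one index per join).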

Does anyone have a suggestion?

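Comments:

Can you post the CREATE TABLE and CREATE INDEX commands? I would like to see the primary key and index definitions. Also, do you have a self-referencing foreign key pointing to the same table (it does not look like it)?

Hey @TheImpaler, the PK and the indexes can be seen in the image I provided. I hope that is enough. There are no foreign keys on the table.

Answer 1: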

Your query is not complicated. I think its key aspect is the WHERE condition.

WHERE tdo1.id IS NULL 
  AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6',
    'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
  AND tdo.TECDOC_LOCALE = 'de';

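For some reason, H2 is using the indexes the wrong way. I would try rephrasing this condition to see how H2's SQL optimizer handles it.

For example, you could try option #1: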

SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo1.id IS NULL
    -- AND binds tighter than OR, so the OR branches need their own parentheses
    AND (   (tdo.TECDOC_GUID = 'GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6'
             AND tdo.TECDOC_LOCALE = 'de')
         OR (tdo.TECDOC_GUID = 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'
             AND tdo.TECDOC_LOCALE = 'de'))

Or you could split the query in two to make sure it uses the index, as in option #2:

SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo.TECDOC_GUID = 'GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6'
    AND tdo.TECDOC_LOCALE = 'de'
    AND tdo1.id IS NULL 
UNION ALL
SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo.TECDOC_GUID = 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'
    AND tdo.TECDOC_LOCALE = 'de'
    AND tdo1.id IS NULL 

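This way, the search uses only equality conditions, which are easier for the SQL optimizer to handle. Note that UNION ALL is cheaper than UNION, because it does not have to eliminate duplicate rows.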
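Before swapping one formulation for another, it is worth checking that the rewrites return the same rows. The sketch below compares the IN form, the UNION ALL form, and an OR form on a tiny made-up data set. It uses SQLite in memory as a stand-in for H2, and the short GUID values and sample rows are assumptions for illustration only. It also shows why an OR form needs parentheses: AND binds tighter than OR, so without them the `tdo1.ID IS NULL` filter applies to only one branch.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory table with made-up rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TECDOC_OBJECTS ("
        " ID INTEGER PRIMARY KEY,"
        " TECDOC_GUID TEXT, TECDOC_LOCALE TEXT,"
        " TECDOC_VERSION INTEGER, DATA TEXT)"
    )
    conn.executemany(
        "INSERT INTO TECDOC_OBJECTS"
        " (TECDOC_GUID, TECDOC_LOCALE, TECDOC_VERSION, DATA)"
        " VALUES (?, ?, ?, ?)",
        [
            ("GUID-A", "de", 105, "a-de-105"),
            ("GUID-A", "de", 106, "a-de-106"),
            ("GUID-B", "de", 105, "b-de-105"),
        ],
    )
    return conn

# Shared SELECT list and self-join; each variant appends its own WHERE clause.
SELECT_AND_JOIN = """
SELECT tdo.TECDOC_GUID, tdo.TECDOC_VERSION
FROM TECDOC_OBJECTS tdo
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
  ON tdo.TECDOC_GUID = tdo1.TECDOC_GUID
 AND tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE
 AND tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION
"""

def branch(guid):
    """One equality-only branch, as used in the UNION ALL form."""
    return (SELECT_AND_JOIN
            + f"WHERE tdo.TECDOC_GUID = '{guid}'"
            + " AND tdo.TECDOC_LOCALE = 'de' AND tdo1.ID IS NULL")

queries = {
    "in_form": SELECT_AND_JOIN
        + "WHERE tdo1.ID IS NULL"
        + " AND tdo.TECDOC_GUID IN ('GUID-A', 'GUID-B')"
        + " AND tdo.TECDOC_LOCALE = 'de'",
    "union_form": branch("GUID-A") + " UNION ALL " + branch("GUID-B"),
    "or_with_parens": SELECT_AND_JOIN
        + "WHERE tdo1.ID IS NULL AND ("
        + " (tdo.TECDOC_GUID = 'GUID-A' AND tdo.TECDOC_LOCALE = 'de')"
        + " OR (tdo.TECDOC_GUID = 'GUID-B' AND tdo.TECDOC_LOCALE = 'de'))",
    "or_without_parens": SELECT_AND_JOIN
        + "WHERE tdo.TECDOC_GUID = 'GUID-A' AND tdo.TECDOC_LOCALE = 'de'"
        + " OR tdo.TECDOC_GUID = 'GUID-B' AND tdo.TECDOC_LOCALE = 'de'"
        + " AND tdo1.ID IS NULL",
}

conn = make_db()
# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

On this data the first three variants agree, while the unparenthesized OR form also returns the outdated version 105 row of GUID-A. Equal results say nothing about which index a given engine picks, so the execution plan still has to be checked separately in H2.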

Answer 2:

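What solved the problem, in a way, was using

USE INDEX

to specify which index it should use.

Here is the query that forces a particular index (see index hints, http://www.h2database.com/html/performance.html#database_performance_tuning).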

SELECT tdo.TECDOC_GUID as guid, tdo.TECDOC_LOCALE as locale , tdo.TECDOC_VERSION as version, tdo.DATA as data
FROM TECDOC_OBJECTS tdo USE INDEX (IDX_TECDOC_GUID)
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
ON (
    tdo.TECDOC_GUID = tdo1.TECDOC_GUID AND 
    tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE AND 
    tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION)
WHERE tdo1.id IS NULL 
AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
AND tdo.TECDOC_LOCALE = 'de';

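This solves the problem. If you use it together with Java and Hibernate, be aware that H2's parser does not understand USE INDEX in versions before 1.4.194. I ran into trouble there, because version 1.4.194 had some other issues. I also removed some of the composite indexes on my table.

Cheers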
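Whichever way an index is forced, it helps to confirm from the plan that the hint took effect. The sketch below does this with SQLite in memory, which is a swapped-in engine and not H2: SQLite's `INDEXED BY` clause and `EXPLAIN QUERY PLAN` are different features from H2's `USE INDEX` and `EXPLAIN`, and are used here only because they run without a server. The index column lists are assumptions, since the original index definitions are not shown, and the short GUID values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the real CREATE TABLE / CREATE INDEX statements are not in the thread.
conn.executescript("""
CREATE TABLE TECDOC_OBJECTS (
    ID INTEGER PRIMARY KEY,
    TECDOC_GUID TEXT, TECDOC_LOCALE TEXT,
    TECDOC_VERSION INTEGER, DATA TEXT);
CREATE INDEX IDX_TECDOC_GUID ON TECDOC_OBJECTS (TECDOC_GUID);
CREATE INDEX IDX_GUID_LOCALE_VERSION
    ON TECDOC_OBJECTS (TECDOC_GUID, TECDOC_LOCALE, TECDOC_VERSION);
""")

# INDEXED BY is SQLite's way of pinning the index for one table reference.
HINTED_SQL = """
SELECT tdo.TECDOC_GUID, tdo.TECDOC_LOCALE, tdo.TECDOC_VERSION, tdo.DATA
FROM TECDOC_OBJECTS AS tdo INDEXED BY IDX_TECDOC_GUID
LEFT OUTER JOIN TECDOC_OBJECTS AS tdo1
  ON tdo.TECDOC_GUID = tdo1.TECDOC_GUID
 AND tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE
 AND tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION
WHERE tdo1.ID IS NULL
  AND tdo.TECDOC_GUID IN ('GUID-A', 'GUID-B')
  AND tdo.TECDOC_LOCALE = 'de'
"""

# The human-readable description is the last column of each plan row.
plan = conn.execute("EXPLAIN QUERY PLAN " + HINTED_SQL).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

The exact wording of the plan lines varies between SQLite versions, so the check to make is only that the pinned index name appears for the outer table. In H2 the equivalent check would be to run EXPLAIN on the hinted query and read the index named in the plan comment.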
