Faster search in INTARRAY column

Posted 2023-03-31

[Title]: Faster search in INTARRAY column
[Posted]: 2016-01-12 10:09:33
[Question]:

My table has about 300,000 rows and a column of type INT[].

Each array contains about 2,000 elements.

I created an index on this array column:

create index index_name ON table_name USING GIN (column_name)

Then I ran this query:

SELECT COUNT(*)
FROM table_name 
WHERE
column_name @> ARRAY[1777]

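This query runs very slowly (Execution time: 66886.132 ms), and EXPLAIN ANALYZE shows that the GIN index is not used; only a Seq Scan is performed.

Why doesn't Postgres use the GIN index? And the main goal: how can I run the query above as fast as possible?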

EDIT

Here is the output of explain (analyze, verbose) for the query above:

Aggregate  (cost=10000024724.75..10000024724.76 rows=1 width=0) (actual time=61087.513..61087.513 rows=1 loops=1)
  Output: count(*)
  ->  Seq Scan on public.users  (cost=10000000000.00..10000024724.00 rows=300 width=0) (actual time=12104.651..61087.500 rows=5 loops=1)
        Output: id, email, pass, nick, reg_dt, reg_ip, gender, curr_location, about, followed_tag_ids, avatar_img_ext, rep_tag_ids, rep_tag_id_scores, stats, status
        Filter: (users.rep_tag_ids @> '1777'::integer[])
        Rows Removed by Filter: 299995
Planning time: 0.110 ms
Execution time: 61087.564 ms

Here are the table and index definitions:

CREATE TABLE users
(
  id serial PRIMARY KEY,
  rep_tag_ids integer[] DEFAULT '{}'
  -- other columns here
);

create index users_rep_tag_ids_idx ON users USING GIN (rep_tag_ids);

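[Comments]:

Please edit your question and add the full execution plan (from explain (analyze, verbose)).

@a_horse_with_no_name OK, one moment.

Please provide the table and index definitions.

@Jakub Kania Please take a look, I have added the definitions to the question.

First, run ANALYZE table_name. Then, for debugging, try SET enable_seqscan=off and re-run EXPLAIN ANALYZE.... in the same session. What does it say now?

[Answer 1]: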

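You should help the query optimizer use the index. Install the intarray extension for PostgreSQL if you don't have it yet, then recreate the index using the gin__int_ops operator class.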

DROP INDEX users_rep_tag_ids_idx;
CREATE INDEX users_rep_tag_ids_idx ON users USING gin (rep_tag_ids gin__int_ops);
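
To check that the change takes effect, the whole sequence can be run end to end and the plan inspected again. The following is a minimal sketch, assuming the users table and rep_tag_ids column from the question; the ANALYZE step comes from the comments on the question rather than from this answer. A likely explanation for the original behavior, which the thread itself does not spell out, is that once intarray is installed, @> on integer[] resolves to the extension's own operator, and the default GIN operator class does not support that operator.

```sql
-- Sketch only: assumes the users table and rep_tag_ids column from the question.
-- gin__int_ops is provided by the intarray extension, so install that first.
CREATE EXTENSION IF NOT EXISTS intarray;

-- Recreate the index with the intarray operator class.
DROP INDEX IF EXISTS users_rep_tag_ids_idx;
CREATE INDEX users_rep_tag_ids_idx ON users USING gin (rep_tag_ids gin__int_ops);

-- Refresh planner statistics, as suggested in the comments.
ANALYZE users;

-- Re-check the plan for the original query.
EXPLAIN (ANALYZE)
SELECT count(*)
FROM users
WHERE rep_tag_ids @> ARRAY[1777];
```

If the index is usable, the plan should reference users_rep_tag_ids_idx instead of showing a Seq Scan on users.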

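[Comments]:

I already had the intarray extension installed, but wow, what a magic word gin__int_ops is! The index is used now, and I get Execution time: 1077.352 ms. Thank you very much!

@OTARIKI It's not a magic word, it's in the documentation. It is the default for GiST, but not for GIN. That's why I asked you for the definitions.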
