Why does my view in PostgreSQL not use the index?

Asked: 2013-08-08 19:48:05

I have a large table (a star catalogue), of which I have a subset. I implemented the subset as a union of two tables, using a cross index.

The problem is that a query against the view does not seem to use the index: it takes about as long as scanning the whole table.

A query against the large table is fast:

select count(*) from ucac4 where rnm in (select ucac4_rnm from grid_catalog limit 5);
count 
-------
     5
(1 row)

Time: 12.132 ms

The same query against the view is not fast, although I expected it to be:

select count(*) from grid_catalog_view where ident in (select ucac4_rnm from grid_catalog limit 5);
count 
-------
     5
(1 row)

Time: 1056237.045 ms

The EXPLAIN output for this query:

 Aggregate  (cost=23175810.51..23175810.52 rows=1 width=0)
   ->  Hash Join  (cost=23081888.41..23172893.67 rows=1166734 width=0)
         Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
         ->  Unique  (cost=23081888.17..23140224.87 rows=2333468 width=44)
               ->  Sort  (cost=23081888.17..23087721.84 rows=2333468 width=44)
                     Sort Key: ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm, ucac4.nest4, ucac4.nest6, ucac4.nest7, public.grid_catalog.subset
                     ->  Append  (cost=63349.87..22763295.24 rows=2333468 width=44)
                           ->  Hash Join  (cost=63349.87..22738772.75 rows=2333467 width=44)
                                 Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
                                 ->  Seq Scan on ucac4  (cost=0.00..16394129.04 rows=455124304 width=40)
                                 ->  Hash  (cost=34048.69..34048.69 rows=2344094 width=8)
                                       ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2344094 width=8)
                                             Filter: (petrov_prikey IS NULL)
                           ->  Hash Join  (cost=415.51..1187.80 rows=1 width=36)
                                 Hash Cond: (petrov.prikey = public.grid_catalog.petrov_prikey)
                                 ->  Seq Scan on petrov  (cost=0.00..709.15 rows=7215 width=32)
                                 ->  Hash  (cost=282.08..282.08 rows=10675 width=8)
                                       ->  Index Scan using grid_catalog_petrov_prikey_idx on grid_catalog  (cost=0.00..282.08 rows=10675 width=8)
         ->  Hash  (cost=0.18..0.18 rows=5 width=4)
               ->  HashAggregate  (cost=0.13..0.18 rows=5 width=4)
                     ->  Limit  (cost=0.00..0.07 rows=5 width=4)
                           ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2354769 width=4)
(22 rows)

The EXPLAIN ANALYZE output (requested in the comments) is:

 Aggregate  (cost=23175810.51..23175810.52 rows=1 width=0) (actual time=1625067.627..1625067.628 rows=1 loops=1)
   ->  Hash Join  (cost=23081888.41..23172893.67 rows=1166734 width=0) (actual time=1621395.200..1625067.618 rows=5 loops=1)
         Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
         ->  Unique  (cost=23081888.17..23140224.87 rows=2333468 width=44) (actual time=1620897.932..1624102.849 rows=1597359 loops=1)
               ->  Sort  (cost=23081888.17..23087721.84 rows=2333468 width=44) (actual time=1620897.928..1622191.358 rows=1597359 loops=1)
                     Sort Key: ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm, ucac4.nest4, ucac4.nest6, ucac4.nest7, public.grid_catalog.subset
                     Sort Method:  external merge  Disk: 87536kB
                     ->  Append  (cost=63349.87..22763295.24 rows=2333468 width=44) (actual time=890293.619..1613769.160 rows=1597359 loops=1)
                           ->  Hash Join  (cost=63349.87..22738772.75 rows=2333467 width=44) (actual time=890293.617..1611550.313 rows=1590144 loops=1)
                                 Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
                                 ->  Seq Scan on ucac4  (cost=0.00..16394129.04 rows=455124304 width=40) (actual time=886086.630..1359934.589 rows=113780093 loops=1)
                                 ->  Hash  (cost=34048.69..34048.69 rows=2344094 width=8) (actual time=4203.785..4203.785 rows=1590144 loops=1)
                                       ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2344094 width=8) (actual time=0.014..2813.031 rows=1590144 loops=1)
                                             Filter: (petrov_prikey IS NULL)
                           ->  Hash Join  (cost=415.51..1187.80 rows=1 width=36) (actual time=101.604..165.749 rows=7215 loops=1)
                                 Hash Cond: (petrov.prikey = public.grid_catalog.petrov_prikey)
                                 ->  Seq Scan on petrov  (cost=0.00..709.15 rows=7215 width=32) (actual time=58.280..108.043 rows=7215 loops=1)
                                 ->  Hash  (cost=282.08..282.08 rows=10675 width=8) (actual time=43.276..43.276 rows=7215 loops=1)
                                       ->  Index Scan using grid_catalog_petrov_prikey_idx on grid_catalog  (cost=0.00..282.08 rows=10675 width=8) (actual time=19.387..37.533 rows=7215 loops=1)
         ->  Hash  (cost=0.18..0.18 rows=5 width=4) (actual time=0.035..0.035 rows=5 loops=1)
               ->  HashAggregate  (cost=0.13..0.18 rows=5 width=4) (actual time=0.026..0.030 rows=5 loops=1)
                     ->  Limit  (cost=0.00..0.07 rows=5 width=4) (actual time=0.009..0.017 rows=5 loops=1)
                           ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2354769 width=4) (actual time=0.007..0.009 rows=5 loops=1)
 Total runtime: 1625108.504 ms
(24 rows)

Time: 1625466.830 ms

For comparison, the time it takes to scan the whole view:

select count(*) from grid_catalog_view;
  count  
---------
 1597359
(1 row)

Time: 1033732.786 ms

My view is defined as:

PS1=# \d grid_catalog_view
    View "public.grid_catalog_view"
 Column |       Type       | Modifiers 
--------+------------------+-----------
 ra     | double precision | 
 dec    | double precision | 
 pmrac  | integer          | 
 pmdc   | integer          | 
 ident  | integer          | 
 nest4  | integer          | 
 nest6  | integer          | 
 nest7  | integer          | 
 subset | integer          | 
View definition:
         SELECT ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm AS ident, ucac4.nest4, ucac4.nest6, ucac4.nest7, grid_catalog.subset
           FROM ucac4, grid_catalog
          WHERE ucac4.rnm = grid_catalog.ucac4_rnm AND grid_catalog.petrov_prikey IS NULL
UNION 
         SELECT petrov.ra, petrov."dec", 0 AS pmrac, 0 AS pmdc, grid_catalog.petrov_prikey AS ident, petrov.nest4, petrov.nest6, petrov.nest7, grid_catalog.subset
           FROM petrov, grid_catalog
          WHERE petrov.prikey = grid_catalog.petrov_prikey AND grid_catalog.ucac4_rnm IS NULL;

The large table is defined as:

PS1=# \d ucac4
          Table "public.ucac4"
  Column  |       Type       | Modifiers 
----------+------------------+-----------
 radi     | bigint           | 
 spdi     | bigint           | 
 magm     | smallint         | 
 maga     | smallint         | 
 sigmag   | smallint         | 
 objt     | smallint         | 
 cdf      | smallint         |
... deleted entries not of relevance ...
 ra       | double precision | 
 dec      | double precision | 
 x        | double precision | 
 y        | double precision | 
 z        | double precision | 
 nest4    | integer          | 
 nest6    | integer          | 
 nest7    | integer          | 
Indexes:
    "ucac4_pkey" PRIMARY KEY, btree (rnm)
    "q3c_ucac4_idx" btree (q3c_ang2ipix(ra, "dec")) CLUSTER
    "ucac4_nest4_idx" btree (nest4)
    "ucac4_nest6_idx" btree (nest6)
    "ucac4_nest7_idx" btree (nest7)
Referenced by:
    TABLE "grid_catalog" CONSTRAINT "grid_catalog_ucac4_rnm_fkey" FOREIGN KEY (ucac4_rnm) REFERENCES ucac4(rnm)

Any idea why my index does not seem to be used?

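Comments:

What index? I don't see any index definitions.

I forgot to include the table schema. Fixed.

explain analyze is preferable, so that any row-count misestimates and the like can be seen.

Answer 1: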

As far as I know, this is a limitation of Postgres: when you join against a union like this, it is hard to avoid scanning the whole table.

See:

https://www.postgresql-archive.org/Poor-plan-when-joining-against-a-union-containing-a-join-td5747690.html

https://www.postgresql-archive.org/Pushing-IN-subquery-down-through-UNION-ALL-td3398684.html

Possibly also relevant:

https://dba.stackexchange.com/questions/47572/in-postgresql-9-3-union-view-with-where-clause-not-taken-into-account
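
One way to see where the limitation applies is to compare the plan for a constant filter with the plan for the IN (subquery) form from the question. This is only a diagnostic sketch: the value 12345 is a made-up placeholder for a real ident, and the plans you get will depend on the PostgreSQL version and the table statistics.

```sql
-- Constant restriction on the view. The planner can often push a plain
-- restriction like this into each branch of the UNION, so look for an
-- index scan on ucac4 in the resulting plan.
EXPLAIN
SELECT count(*)
FROM grid_catalog_view
WHERE ident = 12345;  -- placeholder value, not taken from the question

-- Restriction through a subquery, as in the question. This is planned as
-- a join against the whole view, which is where the sequential scan on
-- ucac4 shows up in the plans above.
EXPLAIN
SELECT count(*)
FROM grid_catalog_view
WHERE ident IN (SELECT ucac4_rnm FROM grid_catalog LIMIT 5);
```

If the first plan uses the index and the second does not, the indexes themselves are fine and the cost comes from joining against the UNION.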

Basically, I think you need to revisit your view definition. Sorry that I don't have a definitive solution.
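
One possible direction, sketched here under assumptions and not tested against this schema, is to bypass the view for this particular query and repeat the filter inside each branch of the UNION, so that each branch is planned as an ordinary join with its own restriction. The CTE name wanted, the derived-table name filtered, and the aliases u, p and g are introduced purely for illustration; the column lists are copied from the view definition in the question.

```sql
-- Sketch only: same columns as grid_catalog_view, with the ident filter
-- applied in each branch instead of on top of the UNION.
WITH wanted AS (
    -- the five identifiers, selected in one place and reused by both branches
    SELECT ucac4_rnm AS ident
    FROM grid_catalog
    LIMIT 5
)
SELECT count(*)
FROM (
    SELECT u.ra, u."dec", u.pmrac, u.pmdc, u.rnm AS ident,
           u.nest4, u.nest6, u.nest7, g.subset
    FROM ucac4 u
    JOIN grid_catalog g ON u.rnm = g.ucac4_rnm
    WHERE g.petrov_prikey IS NULL
      AND u.rnm IN (SELECT ident FROM wanted)
    UNION
    SELECT p.ra, p."dec", 0 AS pmrac, 0 AS pmdc, g.petrov_prikey AS ident,
           p.nest4, p.nest6, p.nest7, g.subset
    FROM petrov p
    JOIN grid_catalog g ON p.prikey = g.petrov_prikey
    WHERE g.ucac4_rnm IS NULL
      AND g.petrov_prikey IN (SELECT ident FROM wanted)
) AS filtered;
```

Check the result with EXPLAIN ANALYZE before relying on it, since whether the planner chooses ucac4_pkey here depends on the version and statistics. If duplicate rows are impossible in your data, UNION ALL would also avoid the Sort and Unique steps visible in the plan in the question.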

Comments:

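The first 2 links are broken.

Fixed the links: the site has replaced the generic nabble domain name with a custom domain name, as shown in the updated links.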
