postgres - avoid creating duplicate null columns

Posted 2023-03-29

Asked 2021-12-10 12:24:39

Question:

I have this table schema on Postgres:

> \d+ users_types_brands

                   Table "public.users_types_brands"
     Column     |            Type             | Collation | Nullable |                    Default                     | Storage | Stats target | Description 
----------------+-----------------------------+-----------+----------+------------------------------------------------+---------+--------------+-------------
 id             | integer                     |           | not null | nextval('users_types_brands_id_seq'::regclass) | plain   |              | 
 inserted_at    | timestamp without time zone |           |          | now()                                          | plain   |              | 
 updated_at     | timestamp without time zone |           |          | now()                                          | plain   |              | 
 users_types_id | bigint                      |           |          |                                                | plain   |              | 
 brand_id       | bigint                      |           | not null |                                                | plain   |              | 
 tasks_type_id  | integer                     |           |          |                                                | plain   |              | 
Indexes:
    "users_types_brands_pkey" PRIMARY KEY, btree (id)
    "users_types_brands_users_types_id_brand_id_tasks_type_id_index" UNIQUE, btree (users_types_id, brand_id, tasks_type_id)
Foreign-key constraints:
    "users_types_brands_users_types_id_fkey" FOREIGN KEY (users_types_id) REFERENCES users_types(id)
Access method: heap

The table currently looks like this:

my_db=# select * from users_types_brands;
 id |        inserted_at         |         updated_at         | users_types_id | brand_id | tasks_type_id 
----+----------------------------+----------------------------+----------------+----------+---------------
 12 | 2021-10-24 16:43:12.244026 | 2021-10-24 16:43:12.244026 |              2 |      112 |             8
 14 | 2021-10-24 17:03:12.012874 | 2021-10-24 17:03:12.012874 |              2 |      111 |             9
(2 rows)

Of course, I can't insert a row like this:

my_db=# insert into users_types_brands (users_types_id, brand_id, tasks_type_id) values (2, 112, 8);
ERROR:  duplicate key value violates unique constraint "users_types_brands_users_types_id_brand_id_tasks_type_id_index"
DETAIL:  Key (users_types_id, brand_id, tasks_type_id)=(2, 112, 8) already exists.

But I can do this as many times as I like:

my_db=# insert into users_types_brands (users_types_id, brand_id) values (2, 112);
INSERT 0 1

And get this:

my_db=# select * from users_types_brands;
 id |        inserted_at         |         updated_at         | users_types_id | brand_id | tasks_type_id 
----+----------------------------+----------------------------+----------------+----------+---------------
 12 | 2021-10-24 16:43:12.244026 | 2021-10-24 16:43:12.244026 |              2 |      112 |             8
 14 | 2021-10-24 17:03:12.012874 | 2021-10-24 17:03:12.012874 |              2 |      111 |             9
 16 | 2021-10-24 17:15:58.295428 | 2021-10-24 17:15:58.295428 |              2 |      112 |              
 17 | 2021-10-24 17:16:36.99971  | 2021-10-24 17:16:36.99971  |              2 |      112 |              
(4 rows)

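Now, according to the business rules, tasks_type_id can be null.

But how can I avoid creating duplicate rows like the last two? One row with a null tasks_type_id is fine, but not two or more.

Has anyone run into this before?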

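Comments:

Maybe this can answer your question

Well, not exactly. According to the documentation:

The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display

But what I'm looking for is to not create the record at all. Thank you, though.

Oh, OK. If you are sure that the other tasks_type_id values appear only once, maybe you can use GROUP BY tasks_type_id

In a unique index, you can use coalesce to convert null values to -1 or some other unused valid value to prevent duplicate nulls, for example:

CREATE UNIQUE INDEX X ON Y ( ( COALESCE( nullable_field, -1 ) ), other_field );

@ncank Yes, that solved it. How can I mark it as correct? By the way, this is what I did for my case: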

CREATE UNIQUE INDEX CONCURRENTLY users_types_brands_users_types_id_brand_id_tasks_type_id_index ON users_types_brands (users_types_id, brand_id, COALESCE(tasks_type_id, -1));

Thanks @Elikill58

Answer 1:

You can create a partial unique index. It allows a row with a given users_types_id and brand_id and a null tasks_type_id, but only one such row. (See Demo)

create unique index tasks_type_id_just_1_unique
    on users_types_brands (users_types_id, brand_id)
  where tasks_type_id is null;
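
A rough sketch of how this plays out on the table from the question. It assumes the duplicate rows shown there (ids 16 and 17) still exist, so they have to be cleaned up before the index can be built; keeping the row with the lowest id is only one possible choice, and brand_id 113 is a made-up value for illustration.

```sql
-- Assumption: for each (users_types_id, brand_id) pair with a null
-- tasks_type_id, keep the row with the lowest id and delete the rest.
DELETE FROM users_types_brands a
USING users_types_brands b
WHERE a.tasks_type_id IS NULL
  AND b.tasks_type_id IS NULL
  AND a.users_types_id = b.users_types_id
  AND a.brand_id = b.brand_id
  AND a.id > b.id;

-- The partial unique index from the answer can now be created.
CREATE UNIQUE INDEX tasks_type_id_just_1_unique
    ON users_types_brands (users_types_id, brand_id)
  WHERE tasks_type_id IS NULL;

-- Row 16 (2, 112, NULL) survived the cleanup, so this insert is now
-- rejected with a unique violation on tasks_type_id_just_1_unique.
INSERT INTO users_types_brands (users_types_id, brand_id) VALUES (2, 112);

-- To skip the duplicate silently instead of raising an error, name the
-- partial index's columns and predicate in ON CONFLICT.
-- (113 is a hypothetical brand_id.)
INSERT INTO users_types_brands (users_types_id, brand_id)
VALUES (2, 113)
ON CONFLICT (users_types_id, brand_id) WHERE tasks_type_id IS NULL
DO NOTHING;
```

Rows with a non-null tasks_type_id are not touched by this index; they are still covered by the existing three-column unique index on the table.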

Answer 2:

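There are two basic solutions to this problem, each with its own drawback.

1. As Belayer pointed out, use a partial index. The drawback is that non-null values need a second index, because this partial index covers only the rows where the column is null and ignores all others.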

CREATE UNIQUE INDEX "index_for_nulls" ON "table" ( "field_a", "field_b" ) WHERE "field_c" IS NULL;
CREATE UNIQUE INDEX "index_for_non_nulls" ON "table" ( "field_a", "field_b", "field_c" ) WHERE "field_c" IS NOT NULL;

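2. Use COALESCE in the index definition to avoid nulls. This way a single index covers all rows, but the planner will not use the full index unless your query uses the exact expression from the index definition.

CREATE UNIQUE INDEX "index" ON "table" ( "field_a", "field_b", ( COALESCE( "field_c", -1 ) ) );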
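
The planner caveat in option 2 is easier to see on the table from the question. The following is only a sketch: the index and constraint names are hypothetical, it assumes that -1 is never a real tasks_type_id, and it assumes any existing duplicate null rows have already been removed (otherwise the index build fails).

```sql
-- Assumption: -1 is reserved as the stand-in for NULL. This CHECK keeps a
-- real -1 from ever colliding with it (NULL still passes the check).
ALTER TABLE users_types_brands
    ADD CONSTRAINT tasks_type_id_not_sentinel   -- hypothetical name
    CHECK (tasks_type_id <> -1);

-- One unique index over all rows; NULL is indexed as -1.
CREATE UNIQUE INDEX utb_coalesce_uniq           -- hypothetical name
    ON users_types_brands (users_types_id, brand_id, (COALESCE(tasks_type_id, -1)));

-- Repeats the indexed expression, so all three index columns can be
-- matched by the planner.
EXPLAIN
SELECT * FROM users_types_brands
WHERE users_types_id = 2
  AND brand_id = 112
  AND COALESCE(tasks_type_id, -1) = 8;

-- Uses the bare column, so only the first two index columns can be
-- matched; tasks_type_id = 8 has to be checked as a separate filter.
EXPLAIN
SELECT * FROM users_types_brands
WHERE users_types_id = 2
  AND brand_id = 112
  AND tasks_type_id = 8;
```

On a table this small the planner may choose a sequential scan for both queries, so the difference only shows up in the plans once the table has grown.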
