How can I bulk insert rows only if a compound primary key doesn't already exist? [AWS Redshift]

Posted 2023-03-30

【Published】: 2019-09-09 10:19:09 【Question】:

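In Amazon Redshift, I am trying to bulk insert values from a temporary table into a table. However, I only want to insert rows whose combination of values (the primary key) does not already exist in the target table, to avoid adding duplicates.

Below is the DDL of the tables.

• clusters_typologies table (the table I want to insert data into)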

create table if not exists clusters.clusters_typologies
(
    cluster_id  BIGINT,
    typology_id BIGINT,
    semantic_id BIGINT,
    primary key (cluster_id, typology_id, semantic_id)
);

The temporary table is created with the query below, and all of its fields are then populated correctly.

CREATE TEMPORARY TABLE temporary (
  cluster_id   bigint,
  typology_name varchar(100),
  typology_id   bigint,
  semantic_name varchar(100),
  semantic_id   bigint
);

Now, when I try to insert using this query:

INSERT INTO clusters.clusters_typologies (cluster_id, typology_id,semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM temporary temp
     WHERE NOT EXISTS(SELECT 1
                      FROM clusters_typologies
                      where cluster_id = temp.cluster_id
                        and typology_id = temp.typology_id
                        and semantic_id = temp.semantic_id));

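I get this error, and I don't know how to make it work.

Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

Does anyone know how to fix this, or what the best way is to insert into a table with a compound key while avoiding duplicates?

Thanks.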

【Answer 1】:

To upsert, follow this guide: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html

Note that some types of correlated subqueries are not allowed in Redshift, which is the cause of your error. See https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html
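One way to adapt the staging-table idea to this question is sketched below. It is an assumption-laden adaptation, not the guide's steps verbatim: it treats the question's `temporary` table as the staging table, removes staged rows whose key is already in the target, and then inserts the remainder. `SELECT DISTINCT` is added on the assumption that the staging table may itself contain repeated keys.

```sql
-- Sketch only: `temporary` is the staging table from the question.
BEGIN;

-- 1. Drop staged rows whose compound key already exists in the target.
DELETE FROM temporary
USING clusters.clusters_typologies
WHERE temporary.cluster_id  = clusters_typologies.cluster_id
  AND temporary.typology_id = clusters_typologies.typology_id
  AND temporary.semantic_id = clusters_typologies.semantic_id;

-- 2. Insert what is left; DISTINCT guards against repeats inside the staging table.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT cluster_id, typology_id, semantic_id
FROM temporary;

COMMIT;
```

Running both statements in one transaction keeps other sessions from seeing a half-finished load. Note that this version modifies the staging table, which is only acceptable if it is not needed afterwards.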

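【Answer 2】:

After some experimenting, I figured out how to insert from the temporary table while checking the compound primary key to avoid duplicates.

Basically, from the AWS documentation that @Jon Scott linked, I understood that Redshift does not support referencing the outer table inside the inner select in this way.

I solved it with a left join, checking whether the joined columns are null. Below is the query I am using now.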

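-- Anti-join: keep only staged rows that have no match in the target table.
-- The source is the temporary table from the question (named `temporary` there).
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM temporary temp
            LEFT JOIN clusters.clusters_typologies clu_typ ON temp.cluster_id = clu_typ.cluster_id AND
                                                              temp.typology_id = clu_typ.typology_id AND
                                                              temp.semantic_id = clu_typ.semantic_id
     -- Unmatched rows come back with NULLs in every clu_typ column.
     WHERE clu_typ.cluster_id IS NULL
       AND clu_typ.typology_id IS NULL
       AND clu_typ.semantic_id IS NULL);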
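
Redshift treats primary key constraints as informational and does not enforce them, so the left join only protects against keys that are already in the target. If the temporary table itself holds the same key more than once, all copies would still be inserted. A minimal variant, assuming the question's `temporary` table and using the hypothetical aliases `stg` and `tgt`, adds `SELECT DISTINCT` and a follow-up check:

```sql
-- Sketch only: same anti-join, de-duplicated on the staging side.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT stg.cluster_id, stg.typology_id, stg.semantic_id
FROM temporary stg
LEFT JOIN clusters.clusters_typologies tgt
       ON stg.cluster_id  = tgt.cluster_id
      AND stg.typology_id = tgt.typology_id
      AND stg.semantic_id = tgt.semantic_id
WHERE tgt.cluster_id IS NULL;  -- one join column is enough to detect "no match"

-- Sanity check: list any compound keys that appear more than once in the target.
SELECT cluster_id, typology_id, semantic_id, COUNT(*) AS copies
FROM clusters.clusters_typologies
GROUP BY cluster_id, typology_id, semantic_id
HAVING COUNT(*) > 1;
```

Testing a single join column for NULL is sufficient here because a matched row can never have a NULL in a column that took part in the equality join.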
