Postgres extremely slow update

Posted 2023-04-14

Asked: 2020-12-04 17:06:48

Question:

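I am working through a number of strangely slow queries on my Postgres database (PostgreSQL 10.9, compiled by Visual C++ build 1800, 64-bit, on Windows 10). Here is just one example: a very simple update that takes an enormous amount of time to execute.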

UPDATE
    "Prescription"
SET
    "DiscountList" = TRUE
WHERE
    "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
    AND "DiscountList" = FALSE
    AND ("Id" IN (
            SELECT
                discount1_."PrescriptionId"
            FROM
                "Discount" discount1_
            WHERE
                discount1_."PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
                AND discount1_."DiscountBy" = 'Prescription'));

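This query takes almost 52 seconds to complete (even when there is nothing to update)! You can see the EXPLAIN (ANALYZE, BUFFERS) output. I tried converting the sub-select into a join, but that made it worse (178 seconds). One line clearly stands out in the analysis: Index Scan using "Prescription_pkey" ... (actual time=0.018..0.018 rows=0 loops=2503751). That accounts for 45 seconds. Why does it take so long? Any suggestions for improvement?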
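The 45-second figure follows from how EXPLAIN ANALYZE reports looped nodes: the "actual time" is a per-loop average in milliseconds, so the node's total is roughly that average multiplied by the loop count. A minimal arithmetic check, using only the numbers quoted above (the column aliases are made up for illustration):

```sql
-- actual time per loop (ms) * loops = approximate total time spent in the node
SELECT 0.018 * 2503751                  AS total_ms,       -- 45067.518 ms
       round(0.018 * 2503751 / 1000, 1) AS total_seconds;  -- about 45.1 s
```

So each primary-key probe is fast on its own; the cost comes from running about 2.5 million of them.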

Here are the table and index definitions:

CREATE TABLE public."Prescription"
(
    "Id" uuid NOT NULL,
    "RecordVersion" integer NOT NULL DEFAULT 1,
    "RecordCreatedAt" timestamp without time zone,
    "RecordModifiedAt" timestamp without time zone,
    "CriterionType" integer NOT NULL DEFAULT 0,
    "DateCancelled" timestamp without time zone,
    "DateDispensed" timestamp without time zone,
    "DateSold" timestamp without time zone,
    "DiscountList" boolean NOT NULL DEFAULT false,
    "ExcludedReason" integer NOT NULL DEFAULT 0,
    "GroupNumber" character varying(255) COLLATE pg_catalog."default",
    "Insurer" character varying(255) COLLATE pg_catalog."default",
    "NDC" character varying(11) COLLATE pg_catalog."default",
    "PCN" character varying(255) COLLATE pg_catalog."default",
    "ProductName" character varying(255) COLLATE pg_catalog."default",
    "Quantity" numeric(19,4),
    "RxNumber" character varying(255) COLLATE pg_catalog."default",
    "ThirdPartyPaid" numeric(19,4),
    "TotalClaimPrice" numeric(19,4),
    "UPC" character varying(14) COLLATE pg_catalog."default",
    "PharmacyId" uuid,
    CONSTRAINT "Prescription_pkey" PRIMARY KEY ("Id"),
    CONSTRAINT "Prescription_PharmacyId_fkey" FOREIGN KEY ("PharmacyId")
        REFERENCES public."Pharmacy" ("Id") MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE
);

CREATE INDEX "INDEX_Prescription_PharmacyIde60cffc4508643c09b6263ec4bdf0987"
    ON public."Prescription" USING btree
    ("PharmacyId")
    TABLESPACE pg_default;

CREATE TABLE public."Discount"
(
    "Id" uuid NOT NULL,
    "RecordVersion" integer NOT NULL DEFAULT 1,
    "RecordCreatedAt" timestamp without time zone,
    "RecordModifiedAt" timestamp without time zone,
    "DiscountBy" character varying(255) COLLATE pg_catalog."default",
    "DiscountedPrice" numeric(19,4),
    "DiscountReason" integer,
    "PharmacyId" uuid NOT NULL,
    "PrescriptionId" uuid,
    CONSTRAINT "Discount_pkey" PRIMARY KEY ("Id"),
    CONSTRAINT "Discount_PharmacyId_fkey" FOREIGN KEY ("PharmacyId")
        REFERENCES public."Pharmacy" ("Id") MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE,
    CONSTRAINT "Discount_PrescriptionId_fkey" FOREIGN KEY ("PrescriptionId")
        REFERENCES public."Prescription" ("Id") MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE
);

CREATE INDEX "INDEX_Discount_PharmacyIdc94000327c3b434caa4c2807d67e66a0"
    ON public."Discount" USING btree
    ("PharmacyId")
    TABLESPACE pg_default;
CREATE INDEX "INDEX_Discount_PrescriptionId6cd5c38e038c47c2b3f15e9e8ae59dc7"
    ON public."Discount" USING btree
    ("PrescriptionId")
    TABLESPACE pg_default;

Here are the relevant configuration settings:

shared_buffers = 1GB
temp_buffers = 64MB
work_mem = 128MB
maintenance_work_mem = 1GB
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 32GB
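As a side note, the values the running server actually uses can differ from what is written in postgresql.conf (for example after an ALTER SYSTEM or a per-session SET). A minimal sketch for reading the effective values from the standard pg_settings view, assuming a session connected to the same database:

```sql
-- "setting" is expressed in the unit shown in "unit" (e.g. 8kB pages for shared_buffers)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'temp_buffers', 'work_mem',
               'maintenance_work_mem', 'seq_page_cost',
               'random_page_cost', 'effective_cache_size')
ORDER BY name;
```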

All tables were recently vacuumed and analyzed.

Comments:

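Where does the sort+unique come from? (NULLs?) Anyway: use EXISTS() instead of IN(), and add PharmacyId to the JOIN condition instead of repeating it.

ERROR: column discount1_.PrescriptionId does not exist :: please post the correct DDL.

Unrelated to your question, but: you should really avoid those dreaded quoted identifiers. They are much more trouble than they are worth. wiki.postgresql.org/wiki/…

Sorry, I removed a bunch of columns for brevity and accidentally removed PrescriptionId from the Discount table. This has been fixed.

Can you show us your solution using the JOIN?

Answer 1: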

Step one: use EXISTS() instead of IN(), and don't repeat the literal condition:

UPDATE "Prescription" dst
SET "DiscountList" = TRUE
WHERE dst."DiscountList" = FALSE
AND "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
AND EXISTS (
        SELECT *
        FROM "Discount" ex
        WHERE ex."PrescriptionId" = dst."Id"
        AND ex."PharmacyId" = dst."PharmacyId"
        AND ex."DiscountBy" = 'Prescription'
        );
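To compare this rewrite against the original without modifying any rows, one option is to run it under EXPLAIN (ANALYZE, BUFFERS) inside a transaction that is rolled back. This is only a sketch of a measurement approach, not part of the answer itself; note that EXPLAIN ANALYZE really executes the UPDATE, which is why the ROLLBACK matters:

```sql
BEGIN;

-- Executes the statement for real and reports actual timings and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
UPDATE "Prescription" dst
SET "DiscountList" = TRUE
WHERE dst."DiscountList" = FALSE
AND dst."PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
AND EXISTS (
        SELECT *
        FROM "Discount" ex
        WHERE ex."PrescriptionId" = dst."Id"
        AND ex."PharmacyId" = dst."PharmacyId"
        AND ex."DiscountBy" = 'Prescription'
        );

-- Discard any changes made while measuring
ROLLBACK;
```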

Comments:

Thanks, that makes it slightly better (43 seconds). EXPLAIN output. I'm still far off the target, though.

Isn't that what this does? CREATE INDEX "INDEX_Prescription_PharmacyIde60cffc4508643c09b6263ec4bdf0987" ON public."Prescription" USING btree ("PharmacyId") TABLESPACE pg_default;

How many prescriptions have "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111' AND "DiscountList" = FALSE? How many does PostgreSQL think there are?

85,908,800 rows in total. 8,461,801 rows (9.85%) have "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111' AND "DiscountList" = FALSE. I'm not sure how many rows PostgreSQL thinks there are. Judging by the query plan, it looks like it may be around 2,500,000 rows. pg_stat_user_tables.n_live_tup shows 86,453,298 (total).
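The question of how many rows PostgreSQL expects versus how many actually match can be checked directly. A minimal sketch, assuming the same table and literal as in the thread: plain EXPLAIN (without ANALYZE) prints the planner's row estimate without executing anything, count(*) gives the real number, and pg_stat_user_tables shows the live-tuple estimate along with when statistics were last gathered.

```sql
-- Planner's estimate: read the "rows=" value on the top plan node
EXPLAIN
SELECT 1
FROM "Prescription"
WHERE "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
AND "DiscountList" = FALSE;

-- Actual number of matching rows (may be slow on a table this size)
SELECT count(*)
FROM "Prescription"
WHERE "PharmacyId" = '1ec0cec5-1cbc-412f-9765-ac0f010de111'
AND "DiscountList" = FALSE;

-- Live-tuple estimate and the time of the last (auto)analyze
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'Prescription';
```

A large gap between the first two numbers would suggest that the planner's statistics for these columns are misleading it, which fits the figures quoted in the comment above (roughly 2.5 million estimated versus 8.46 million actual).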
