Performance of a PL/pgSQL stored procedure?

Posted 2023-03-31


Original title: performance of a pgplsql stored procedure?
Originally posted: 2012-08-01 12:50:45

Question:

We have the following stored procedure, which has recently become very slow on a large amount of data in our Postgres DB. The problem:

We are essentially parsing a string (the first number is the row id, the second is the status):

||2|0||3|1||4|0||
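A minimal sketch of how this message format splits into rows, using only the sample string above (the column aliases reuse the function's variable names). Trimming the outer pipes first matters: with the leading `||` as shown, `split_part(entrada, '||', 1)` returns an empty string, which is not a valid bigint.

```sql
-- Split the sample message into one (id, status) row per pair.
-- trim(..., '|') strips the leading and trailing pipes first, so that
-- string_to_array(..., '||') yields only the non-empty 'id|status' pairs.
SELECT split_part(t, '|', 1)::bigint AS identificador   -- row id
     , split_part(t, '|', 2)::int    AS estado_mensaje  -- 1 = inserted, 0 = not inserted
FROM   unnest(string_to_array(trim('||2|0||3|1||4|0||'::text, '|'), '||')) AS a(t);
```

This returns the three pairs (2, 0), (3, 1) and (4, 0).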

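Would it be better to parse and split the string and do the loop in a higher-level language like Java? Can loops be more efficient in Postgres? How are transactions handled in a stored procedure? Is the whole function one transaction? We probably do a lot of writes and deletes on the database, and the deletes also take a long time. Can this be handled more efficiently?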

CREATE OR REPLACE FUNCTION verificaemitidos(entrada text, largo_mensaje integer)
  RETURNS character AS
$BODY$
    DECLARE                 
    texto_procesado text;       
    identificador bigint;
    estado_mensaje int;
    i int;
    existe_documento int;
    estado_documento text;
    rut numeric;
    tipo int;
    folio_doc numeric;
    otros_estados int;
    BEGIN
        --estado 1 insertado
        --estado 0 no insertado
        --mensaje id_documento|estado||id_documento|estado||


        i := 1;
        while (i <= largo_mensaje)
        loop            
            --Proceso el mensaje
            texto_procesado := split_part(entrada,'||', i) ;
            identificador := split_part(texto_procesado, '|', 1);   
            estado_mensaje := split_part(texto_procesado, '|', 2);              
            -- Se comienza a hacer la comparacion
            existe_documento := (select count (id) from uris_emitidos where id = identificador);
            select estado, emp_rut, tipo_doc, folio into estado_documento, rut, tipo, folio_doc from uris_emitidos where id = identificador;

            --si existe el documento            
            if (existe_documento > 0) then              
                --si el documento que se ingreso esta insertado
                if (estado_mensaje = 1) then
                    --si esta aceptado se eliminan todos los documentos con ese rut, tipo, folio
                    if (estado_documento = 'A') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo;
                    end if;
                    --si esta aceptado con reparo se eliminan todos los documentos con ese rut, tipo, folio
                    if (estado_documento = 'B') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo;
                    end if;
                    --si esta rechazado se elimina el rechazado y el publicado
                    if (estado_documento = 'R') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('R', 'P');
                    end if;
                    --si esta publicado se elimina
                    if (estado_documento = 'P') then
                        delete from uris_emitidos where id = identificador;
                    end if;
                --si el documento que se ingreso no esta insertado              
                else
                    --si esta aceptado se actualiza para que el proceso lo re-encole
                    if (estado_documento = 'A') then 
                        update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;                      
                    end if;
                    --si esta aceptado con reparo se actualiza para que el proceso lo re-encole
                    if (estado_documento = 'B') then
                        update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;                      
                    end if;
                    --si esta rechazado se verifica que no existe un registro aceptado que se haya encolado o este en espera de encolar
                    if (estado_documento = 'R') then
                        otros_estados = (select count(id) from uris_emitidos ue where ue.folio = folio_doc and ue.emp_rut = rut and ue.tipo_doc = tipo and ue.estado in ('A', 'B'));
                        --si otros estados = 0 significa que el estado rechazado es el mejor estado que hay, por lo tanto se debe re-encolar
                        if (otros_estados = 0) then
                            update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;
                        end if;
                    end if;
                    --si esta rechazado se verifica que no existe un registro aceptado o rechazado que se haya encolado o este en espera de encolar
                    if (estado_documento = 'P') then
                        otros_estados = (select count(id) from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('A', 'B', 'R'));
                        --si otros estados = 0 significa que el estado rechazado es el mejor estado que hay, por lo tanto se debe re-encolar
                        if (otros_estados = 0) then
                            update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;
                        end if;
                    end if;

                end if;

            end if;

            i := i+1;
        end loop;
        return 'ok';


    END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Comments:

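This is "one row at a time" processing. It is bound to be slow. The "focus" changes with every iteration of the loop. Would it be faster to use an SQL table and run a cursor over a SELECT instead of parsing a string? How about improving your accept rate? Would it be faster to pass an array as a parameter instead of parsing a string? Are any indexes used by your delete statements? You might just want to put some old-fashioned debug prints in there, along with some timing information, to see where the bottleneck is.

Answer 1: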

Can loops be more efficient in PL/pgSQL?

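As @wildplasser mentioned, running SQL statements that manipulate sets of rows is generally much faster than manipulating each row individually. Loops are only possible in plpgsql (or other procedural-language functions, or, in a limited way, in recursive CTEs), not in plain SQL. They do their job well, but they are not PostgreSQL's strong suit.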
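To make the contrast concrete, here is a minimal sketch on a throwaway table (demo_rows is a hypothetical name and the row count is arbitrary). Both forms add 1 to cont for every row; the DO block issues one UPDATE per row from a loop, while the last statement does the same work in a single set-based UPDATE. Timing each part (for instance with psql's `\timing`) on the real data is the way to see the gap; no figures are claimed here.

```sql
-- Throwaway table with 1000 rows and an index on id.
CREATE TEMP TABLE demo_rows AS
SELECT g AS id, 0 AS cont
FROM   generate_series(1, 1000) AS g;
ALTER TABLE demo_rows ADD PRIMARY KEY (id);

-- Row-at-a-time: one UPDATE statement per loop iteration.
DO $$
DECLARE
   r record;
BEGIN
   FOR r IN SELECT id FROM demo_rows LOOP
      UPDATE demo_rows SET cont = cont + 1 WHERE id = r.id;
   END LOOP;
END;
$$;

-- Set-based: one statement touches all rows at once.
UPDATE demo_rows SET cont = cont + 1;
```

After both steps every row has cont = 2.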

How are transactions handled in a stored procedure? Is the whole function one transaction?

Yes, the whole function runs as one transaction. It can be part of a larger transaction, but it cannot be split.
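A small demonstration of that all-or-nothing behaviour in a scratch session (demo_tx and demo_fail are hypothetical names). The function writes two rows and then raises an error. The EXCEPTION clause in the DO block only exists so the script can continue; afterwards both inserts are gone. Called directly, without the handler, the error would abort the surrounding transaction and discard the inserts just the same.

```sql
-- Scratch table for the demonstration.
CREATE TEMP TABLE demo_tx (id int);

-- Hypothetical function: two writes, then an error.
CREATE FUNCTION pg_temp.demo_fail() RETURNS text AS $$
BEGIN
   INSERT INTO demo_tx VALUES (1);
   INSERT INTO demo_tx VALUES (2);
   RAISE EXCEPTION 'simulated failure after two inserts';
   RETURN 'ok';  -- never reached
END;
$$ LANGUAGE plpgsql;

-- Call it and trap the error so the script keeps running.
DO $$
BEGIN
   PERFORM pg_temp.demo_fail();
EXCEPTION WHEN raise_exception THEN
   RAISE NOTICE 'demo_fail() raised an error';
END;
$$;

-- Both inserts were undone together with the failed call.
SELECT count(*) FROM demo_tx;   -- 0
```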

Read more about how plpgsql functions work in this related answer on dba.SE.

Would it be better to parse and split the string and do the loop in a higher-level language such as Java?

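If the string is not huge (a few thousand elements), it really doesn't matter, as long as your logic is sound. It's not the string parsing that slows you down; it's the row-at-a-time operations on the rows of the table.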
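One way to check that claim on your own server is to time the parsing step alone, without touching any table. This is a sketch with made-up data (demo_msg is a hypothetical name holding 10000 pairs in the question's format); the elapsed time is printed as a NOTICE and will vary by machine.

```sql
-- Build a synthetic message with 10000 'id|status' pairs.
CREATE TEMP TABLE demo_msg AS
SELECT '||' || string_agg(g::text || '|' || (g % 2)::text, '||' ORDER BY g) || '||' AS entrada
FROM   generate_series(1, 10000) AS g;

-- Time only the parsing step: no access to uris_emitidos at all.
DO $$
DECLARE
   t0 timestamptz := clock_timestamp();
   n  bigint;
BEGIN
   SELECT count(*) INTO n
   FROM  (SELECT split_part(t, '|', 1)::bigint AS identificador
               , split_part(t, '|', 2)::int    AS estado_mensaje
          FROM  (SELECT unnest(string_to_array(trim(entrada, '|'), '||')) AS t
                 FROM   demo_msg) a) s;
   RAISE NOTICE 'parsed % pairs in %', n, clock_timestamp() - t0;
END;
$$;
```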

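The faster alternative is to do it all in one or a few SQL statements. I would use data-modifying CTEs for this (introduced with PostgreSQL 9.1): parse the string once, then run DML statements on this internal work table.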

Consider the following demo (untested):

WITH a(t) AS (  -- split string into rows
    SELECT unnest(string_to_array(trim('||2|0||3|1||4|0||'::text, '|'), '||'))
    )
    , b AS (    -- split record into columns per row
    SELECT split_part(t, '|', 1)::bigint AS identificador   -- cast: compared to bigint id
          ,split_part(t, '|', 2)::int    AS estado_mensaje  -- cast: compared to integer 1
    FROM   a
    )
    , c AS (    -- implements complete IF branch of your base loop
    DELETE FROM uris_emitidos u
    USING  b
    WHERE  u.id = b.identificador
    AND    u.estado IN ('A','B','R','P')
    AND    b.estado_mensaje = 1
    )
--  , d AS (    -- implements ELSE branch of your loop
--  DELETE ...
--  )
SELECT 'ok';

Apart from the major design flaw, the logic in your loop is redundant and inconsistent. I merged the whole IF branch into the first DELETE statement above.

More about the functions used here in the manual.

Comments:

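DELETE FROM uris_emitidos u FROM b: you probably need a USING here instead of the second FROM? BTW: nice CTE!

@wildplasser: Of course, fixed. How frustrating, the backwash of SQL Server syntax. I knew you would like the CTE solution. :D

Hi, thank you very much for the nice solution. Unfortunately, the database in question is currently running Postgres 9.0.4, so I don't think the CTE feature is available. What would you suggest for that version?

@user1122176: PostgreSQL 9.0 does have CTEs, but not data-modifying ones; those are new in 9.1. You can make do with a temporary table instead of a CTE. Upgrading to 9.1 is generally a good idea, and it should be simple, depending on your distribution and general setup. Read the release notes for 9.1 first.

Answer 2: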

If you look at it, the parameters of the horrible function (except largo_mensaje) can be seen as the fields of a worklist:

CREATE TABLE worklist
    ( texto_procesado text
    , identificador bigint
    , estado_mensaje int
    );

The corresponding work could then be done like this (I borrowed this from Erwin's answer):

DELETE FROM uris_emitidos u
 USING  worklist wl
 WHERE  u.id = wl.identificador
   AND    u.estado IN ('A','B','R','P')
   AND    wl.estado_mensaje = 1
   AND    wl.texto_procesado IN ('action1' , 'action2', ...)
    ;

After that, the worklist has to be cleaned up (AND NOT EXISTS (SELECT * FROM uris_emitidos WHERE ...)).
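Since the comments below ask how to fill and clean up the worklist, here is a minimal sketch under several assumptions: a throwaway temporary uris_emitidos with only the id and estado columns, the sample message from the question, only one of the cases (status 1 on a published 'P' document), and one possible reading of the cleanup hint, namely removing worklist entries whose document no longer exists. texto_procesado is left out because, as noted in the comments, it only holds an intermediate value.

```sql
-- Throwaway stand-in for the real table (only the columns used here).
CREATE TEMP TABLE uris_emitidos (id bigint PRIMARY KEY, estado text);
INSERT INTO uris_emitidos VALUES (2, 'A'), (3, 'P'), (4, 'R');

-- The worklist, without texto_procesado.
CREATE TEMP TABLE worklist (identificador bigint, estado_mensaje int);

-- Fill: split the message once, one row per 'id|status' pair.
INSERT INTO worklist (identificador, estado_mensaje)
SELECT split_part(t, '|', 1)::bigint
     , split_part(t, '|', 2)::int
FROM   unnest(string_to_array(trim('||2|0||3|1||4|0||'::text, '|'), '||')) AS a(t);

-- Work: one set-based statement (only the "status 1, published" case here).
DELETE FROM uris_emitidos u
USING  worklist wl
WHERE  u.id = wl.identificador
AND    wl.estado_mensaje = 1
AND    u.estado = 'P';

-- Clean up (one reading of the hint): drop entries whose document is gone.
DELETE FROM worklist wl
WHERE  NOT EXISTS (SELECT 1 FROM uris_emitidos u WHERE u.id = wl.identificador);
```

With the sample message, only document 3 has status 1 and estado 'P', so it is deleted and its worklist entry is removed; ids 2 and 4 remain in both tables.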

Comments:

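Would that be the way to do it in Postgres 9.0? Could you elaborate on how to fill and clear the worklist? Thanks a lot!

It should work in any decent version. I'm not going to do your work for you (it's too big, and I know too little about your business); it's only meant to show a non-procedural, non-row-at-a-time way of looking at the problem. Filling the worklist would be the same as the start of the original function: split the fields and put them in a table.

@user1122176: Basically, the first WITH clause in my example does just that: it creates a work table. The CTE takes care of that automatically. But you can write the result to a (temporary) table explicitly and use it in the following DML statements (that works in Postgres 9.0, too). BTW, you don't need texto_procesado in the table; it only represents the whole row in an intermediate state.

Answer 3: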

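This is the solution I came up with for now. It is definitely faster than the loop and reduces the number of writes and reads on the database. Thanks!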

CREATE OR REPLACE FUNCTION verificaemitidos2(entrada text, largo_mensaje integer)
  RETURNS character AS
$BODY$
    DECLARE
    texto_procesado text;
    identificador bigint;
    estado_mensaje int;
    i int;
    existe_documento int;
    estado_documento text;
    rut numeric;
    tipo int;
    folio_doc numeric;
    otros_estados int;
    BEGIN
        --estado 1 insertado
        --estado 0 no insertado
        --mensaje id_documento|estado||id_documento|estado||

        --DROP TABLE worklist;
        CREATE TEMP TABLE worklist
            ( identificador bigint,
              estado_mensaje int,
              rut_emisor numeric,
              tipo_doc numeric,
              folio numeric,
              estado text
            );

        INSERT INTO worklist (identificador, estado_mensaje, rut_emisor, tipo_doc, folio, estado)
            SELECT split_part(t, '|', 1)::bigint ,
              split_part(t, '|', 2)::integer ,
              uri.emp_rut,
              uri.tipo_doc,
              uri.folio,
              uri.estado
              from (SELECT unnest(string_to_array(trim(entrada::text, '|'), '||'))) as a(t),
              uris_emitidos uri
              WHERE uri.id = split_part(t, '|', 1)::bigint;

        -- ESTADO 1
        -- ACEPTADOS PRIMEROS DOS CASOS

        DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  wl.estado_mensaje = 1
           AND  wl.estado IN ('A', 'B')
           AND  u.folio = wl.folio
           AND  u.emp_rut = wl.rut_emisor
           AND  u.tipo_doc =  wl.tipo_doc;

        -- ESTADO 1
        -- CASO 3

        --delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('R', 'P');

        DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  wl.estado_mensaje = 1
           AND  wl.estado IN ('R')
           AND  u.estado IN ('R', 'P')
           AND  u.folio = wl.folio
           AND  u.emp_rut = wl.rut_emisor
           AND  u.tipo_doc =  wl.tipo_doc;

        -- ESTADO 1
        -- CASO 4

         DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  u.id = wl.identificador
           AND  wl.estado_mensaje = 1
           AND  wl.estado = 'P';

        -- ESTADO 0
        -- CASOS 1+2

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('A' , 'B');

         -- update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;

        -- ESTADO 0
        -- CASO 3

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('R')
        AND NOT EXISTS (
        SELECT 1 FROM uris_emitidos ue
        WHERE ue.folio = wl.folio
        AND ue.emp_rut = wl.rut_emisor
        AND ue.tipo_doc = wl.tipo_doc
        AND ue.estado IN ('A', 'B'));

        -- ESTADO 0
        -- CASO 4

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('P')
        AND NOT EXISTS (
        SELECT 1 FROM uris_emitidos ue
        WHERE ue.folio = wl.folio
        AND ue.emp_rut = wl.rut_emisor
        AND ue.tipo_doc = wl.tipo_doc
        AND ue.estado IN ('A', 'B', 'R'));

        DROP TABLE worklist;

        RETURN 'ok';
    END;
$BODY$
  LANGUAGE plpgsql VOLATILE;
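Both this version and the original loop match rows of uris_emitidos on id and on the combination (folio, emp_rut, tipo_doc), including inside the NOT EXISTS subqueries. The question does not show which indexes exist (one of the comments asks exactly that). Assuming id is already the primary key but nothing covers the other three columns, each of those conditions may fall back to scanning the table. Below is a minimal sketch of a supporting composite index on a stand-in table (uris_emitidos_sample and the index name are hypothetical). For equality tests on all three columns the column order is not critical; whether the index pays off should be confirmed with EXPLAIN ANALYZE on the real data.

```sql
-- Stand-in with the columns the statements filter on.
CREATE TEMP TABLE uris_emitidos_sample (
   id       bigint PRIMARY KEY,   -- lookups by id use the primary key index
   emp_rut  numeric,
   tipo_doc int,
   folio    numeric,
   estado   text
);

-- Composite index matching the (folio, emp_rut, tipo_doc) equality conditions
-- used by the DELETEs and by the NOT EXISTS subqueries.
CREATE INDEX uris_emitidos_sample_doc_idx
    ON uris_emitidos_sample (emp_rut, tipo_doc, folio);
```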

Comments:

I added a simplified version in another answer.

Answer 4:

Your posted answer can be improved and simplified:

CREATE OR REPLACE FUNCTION x.verificaemitidos3(_entrada text)
  RETURNS text AS
$BODY$
BEGIN
   --estado 1 insertado
   --estado 0 no insertado

   CREATE TEMP TABLE worklist ON COMMIT DROP AS
   SELECT split_part(t, '|', 1)::bigint AS identificador
         ,split_part(t, '|', 2)::integer AS estado_mensaje
         ,uri.emp_rut AS rut_emisor
         ,uri.tipo_doc
         ,uri.folio
         ,uri.estado
   FROM  (SELECT unnest(string_to_array(trim(_entrada::text, '|'), '||'))) a(t)
   JOIN   uris_emitidos uri ON uri.id = split_part(t, '|', 1)::bigint;

   -- ESTADO 1

   DELETE FROM uris_emitidos u
   USING  worklist w
   WHERE  w.estado_mensaje = 1
   AND   (
         (w.estado IN ('A', 'B')   -- CASOS 1+2
      OR  w.estado =   'R'         -- CASO 3
      AND u.estado IN ('R', 'P')
      )
      AND u.folio = w.folio
      AND u.emp_rut = w.rut_emisor
      AND u.tipo_doc =  w.tipo_doc

      OR (w.estado = 'P'           -- CASO 4
      AND w.identificador = u.id
      )
      );

   -- ESTADO 0

   UPDATE uris_emitidos u
   SET    estado_envio = 0
         ,cont = cont + 1
   FROM   worklist w
   WHERE  w.estado_mensaje = 0
   AND    w.identificador = u.id
   AND   (w.estado IN ('A', 'B')   -- CASOS 1+2

      OR  w.estado = 'R'           -- CASO 3
      AND NOT EXISTS (
         SELECT 1
         FROM   uris_emitidos ue
         WHERE  ue.folio = w.folio
         AND    ue.emp_rut = w.rut_emisor
         AND    ue.tipo_doc = w.tipo_doc
         AND    ue.estado IN ('A', 'B')
         )

      OR  w.estado = 'P'         -- CASO 4
      AND NOT EXISTS (
         SELECT 1
         FROM   uris_emitidos ue
         WHERE  ue.folio = w.folio
         AND    ue.emp_rut = w.rut_emisor
         AND    ue.tipo_doc = w.tipo_doc
         AND    ue.estado IN ('A', 'B', 'R')
         )
      );

   RETURN 'ok';
END;
$BODY$  LANGUAGE plpgsql VOLATILE;

Major points:

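- Half the time, twice as fast.
- Removed the second function parameter; you don't need it any more.
- Removed all variables; none of them are used any more.
- Use ON COMMIT DROP to avoid conflicts with an existing temp table on repeated calls in the same session.
- With Postgres 9.1 you could use CREATE TABLE IF NOT EXISTS instead.
- Create the temp table directly from the result of the SELECT with CREATE TABLE AS.
- Merged all DELETEs.
- Merged all UPDATEs.
- Trimmed some noise.
- Use the return type text instead of character.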
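A minimal sketch of what ON COMMIT DROP buys (worklist_demo is a hypothetical name): the temp table disappears at the end of the transaction that created it, so the function needs no DROP TABLE of its own, and the next call, in a new transaction, can create it again. Two calls within the same transaction would still collide, because the drop only happens at COMMIT.

```sql
-- First "call", in its own transaction: the table lives only until COMMIT.
BEGIN;
CREATE TEMP TABLE worklist_demo ON COMMIT DROP AS SELECT 1 AS identificador;
COMMIT;

-- Second "call" in the same session: no "relation already exists" error,
-- because the first table was dropped when its transaction committed.
BEGIN;
CREATE TEMP TABLE worklist_demo ON COMMIT DROP AS SELECT 2 AS identificador;
COMMIT;

-- Nothing is left behind after the commit.
SELECT count(*) FROM pg_class WHERE relname = 'worklist_demo';   -- 0
```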
