Python-Sqlalchemy-Postgres: How to store a subquery result in a variable and use it in a master query

Posted 2023-04-14


【Originally posted】: 2021-10-07 12:56:42 【Question】:

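I have a subquery that is used in several WHERE conditions of my main query, so it is executed multiple times to produce the same result. Is there a way to store and reuse the subquery result so that it runs only once?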

Sample code:

from sqlalchemy.sql.schema import ForeignKey
from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import select, union

Base = declarative_base()
class Table1(Base):
    __tablename__ = 'table1'

    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)
    
class Table2(Base):
    __tablename__ = 'table2'

    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)
    
class Table3(Base):
    __tablename__ = 'table3'

    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)
    

class Table4(Base):
    __tablename__ = 'table4'

    id = Column(Integer, primary_key=True)
    type = Column(Text, nullable=False)


class Table5(Base):
    __tablename__ = 'table5'

    id = Column(Integer, primary_key=True)
    res_id = Column(Integer, ForeignKey('table4.id'), nullable=False)
    value = Column(Text, nullable=False)

class Table1Map(Base):
    __tablename__ = 'table1_map'

    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table1.id'), primary_key=True, unique=True, nullable=False)

class Table2Map(Base):
    __tablename__ = 'table2_map'

    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table2.id'), primary_key=True, unique=True, nullable=False)
    
class Table3Map(Base):
    __tablename__ = 'table3_map'

    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table3.id'), primary_key=True, unique=True, nullable=False)


sub_query = select([Table5.__table__.c.id]).where(Table5.__table__.c.value=='some_value')
subquery_1 = select([Table1.__table__.c.uuid.label("map_id"), Table1Map.__table__.c.id.label("id")]).select_from(Table1.__table__.join(Table1Map.__table__, Table1Map.__table__.c.map_id==Table1.__table__.c.id)).where(Table1Map.__table__.c.id.in_(sub_query))

subquery_2 = select([Table2.__table__.c.uuid.label("map_id"), Table2Map.__table__.c.id.label("id")]).select_from(Table2.__table__.join(Table2Map.__table__, Table2Map.__table__.c.map_id==Table2.__table__.c.id)).where(Table2Map.__table__.c.id.in_(sub_query))

subquery_3 = select([Table3.__table__.c.uuid.label("map_id"), Table3Map.__table__.c.id.label("id")]).select_from(Table3.__table__.join(Table3Map.__table__, Table3Map.__table__.c.map_id==Table3.__table__.c.id)).where(Table3Map.__table__.c.id.in_(sub_query))

main_query = union(subquery_1, subquery_2, subquery_3)

print(main_query)

This produces the following query. I need to avoid executing this subquery repeatedly.

SELECT TABLE1.UUID AS MAP_ID,
    TABLE1_MAP.ID AS ID
FROM TABLE1
JOIN TABLE1_MAP ON TABLE1_MAP.MAP_ID = TABLE1.ID
WHERE TABLE1_MAP.ID IN
        (SELECT TABLE5.ID
            FROM TABLE5
            WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE2.UUID AS MAP_ID,
    TABLE2_MAP.ID AS ID
FROM TABLE2
JOIN TABLE2_MAP ON TABLE2_MAP.MAP_ID = TABLE2.ID
WHERE TABLE2_MAP.ID IN
        (SELECT TABLE5.ID
            FROM TABLE5
            WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE3.UUID AS MAP_ID,
    TABLE3_MAP.ID AS ID
FROM TABLE3
JOIN TABLE3_MAP ON TABLE3_MAP.MAP_ID = TABLE3.ID
WHERE TABLE3_MAP.ID IN
        (SELECT TABLE5.ID
            FROM TABLE5
            WHERE TABLE5.VALUE = 'some_value')

【Question discussion】:

【Answer 1】:

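Why? Have you run explain (analyze, buffers) enough to show that this is actually causing a performance problem? The repeated executions quite probably find the values they need already in memory, so no additional IO is required. That said, the way to accomplish this in Postgres is to select the values from table5 in a CTE. (Sorry, I do not know your obfuscation manager, Python-Sqlalchemy.)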

with cte (id) as 
     (select id 
        from table5 t5
       where t5.value = 'some_value'
     )          
select t1.uuid as map_id,
       t1m.id as id
  from table1 t1
  join table1_map t1m on t1m.map_id = t1.id
  where t1m.id in
        (select  id
            from cte
        )
union
select t2.uuid as map_id,
       t2m.id as id
  from table2 t2
  join table2_map t2m on t2m.map_id = t2.id
  where t2m.id in
        (select  id
            from cte
        )  
union
select t3.uuid as map_id,
       t3m.id as id
  from table3 t3
  join table3_map t3m on t3m.map_id = t3.id
  where t3m.id in
        (select  id
            from cte
        );
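Since the answer stops at raw SQL, here is a rough sketch of how the same CTE shape might be built on the SQLAlchemy side with `.cte()`. It assumes SQLAlchemy 1.4 or newer (positional `select(...)` arguments), and it uses lightweight `table()`/`column()` stand-ins instead of the question's mapped classes. The helper `mapped_select` and the CTE name `matching_ids` are made up for illustration. On Postgres 12 and later, a CTE referenced more than once is materialized by default, which is what makes it run once here.

```python
from sqlalchemy import column, select, table, union

# Lightweight stand-in for the question's Table5.__table__ (an assumption).
table5 = table("table5", column("id"), column("value"))


def mapped_select(name, ids):
    """Build one branch of the union for tables `name` and `name`_map.

    Hypothetical helper, not part of SQLAlchemy or the question's code.
    """
    base = table(name, column("id"), column("uuid"))
    mapping = table(name + "_map", column("id"), column("map_id"))
    return (
        select(base.c.uuid.label("map_id"), mapping.c.id.label("id"))
        .select_from(base.join(mapping, mapping.c.map_id == base.c.id))
        # Each branch only references the CTE; table5 is not queried again.
        .where(mapping.c.id.in_(select(ids.c.id)))
    )


# .cte() turns the SELECT into a named common table expression.
matching_ids = (
    select(table5.c.id)
    .where(table5.c.value == "some_value")
    .cte("matching_ids")
)

main_query = union(
    *(mapped_select(name, matching_ids) for name in ("table1", "table2", "table3"))
)

sql = str(main_query)
print(sql)
```

The printed statement should start with a single WITH clause and then reference `matching_ids` in each branch of the union. With the question's declarative classes, the same `.cte()` call should apply to the existing `sub_query` object.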

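Notice that you still need to repeat the sub-select, although it now only references the CTE. If you insist on removing all repetition, you can of course perform the union in a sub-select and then filter on id.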

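select map_id, id
  from (select t1.uuid as map_id
              , t1m.id as id
           from table1 t1 
           join table1_map t1m on t1m.map_id = t1.id
        union 
        select t2.uuid 
             , t2m.id    
          from table2 t2
          join table2_map t2m on t2m.map_id = t2.id
        union
        select t3.uuid 
             , t3m.id    
          from table3 t3 
          join table3_map t3m on t3m.map_id = t3.id
       )  tall 
where tall.id in 
      (select t5.id
         from table5 t5
        where t5.value = 'some_value'
      );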
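Because the SQL in this answer is untested, a quick sanity check may be useful before comparing plans. The sketch below uses Python's standard-library `sqlite3` as a stand-in for Postgres and a tiny invented data set, cut down to two of the three table pairs, to confirm that the CTE form and the union-then-filter form return the same rows when both join through the map tables as the question does. It says nothing about Postgres performance.

```python
import sqlite3

# In-memory stand-in database; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table5 (id integer primary key, value text not null);
    create table table1 (id integer primary key, uuid text not null unique);
    create table table2 (id integer primary key, uuid text not null unique);
    create table table1_map (id integer not null, map_id integer not null unique);
    create table table2_map (id integer not null, map_id integer not null unique);

    insert into table5 values (1, 'some_value'), (2, 'other'), (3, 'some_value');
    insert into table1 values (10, 'a'), (11, 'b');
    insert into table2 values (20, 'c'), (21, 'd');
    insert into table1_map values (1, 10), (2, 11);
    insert into table2_map values (3, 20), (2, 21);
""")

# Shape 1: filter each branch against a CTE.
cte_form = """
    with cte (id) as (select id from table5 where value = 'some_value')
    select t1.uuid as map_id, t1m.id as id
      from table1 t1 join table1_map t1m on t1m.map_id = t1.id
     where t1m.id in (select id from cte)
    union
    select t2.uuid as map_id, t2m.id as id
      from table2 t2 join table2_map t2m on t2m.map_id = t2.id
     where t2m.id in (select id from cte)
"""

# Shape 2: union everything first, then filter once.
union_then_filter = """
    select map_id, id
      from (select t1.uuid as map_id, t1m.id as id
              from table1 t1 join table1_map t1m on t1m.map_id = t1.id
            union
            select t2.uuid as map_id, t2m.id as id
              from table2 t2 join table2_map t2m on t2m.map_id = t2.id
           ) tall
     where tall.id in (select id from table5 where value = 'some_value')
"""

rows_cte = sorted(conn.execute(cte_form).fetchall())
rows_union = sorted(conn.execute(union_then_filter).fetchall())
print(rows_cte)                # [('a', 1), ('c', 3)]
print(rows_union == rows_cte)  # True
```

Matching results on toy data only show that the two shapes are logically interchangeable; which one is faster still has to come from explain on realistic volumes.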

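Either way, run explain to see what actually performs best in your environment. Make sure the test data has production-like volume: if production has 100K rows, do not test against tables with 100 rows.

Not tested.

Note: uuid is a poor choice of column name, because Postgres supports a native data type called uuid. Poorly chosen names lead to confusion (for developers, not for Postgres), and confusion leads to bugs that often go unnoticed until they become critical production issues.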
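For the explain step recommended above, a small Python helper along these lines may be enough to capture the plan text. It is only a sketch: `explain_analyze` is a hypothetical name, and `cursor` is assumed to be a DB-API cursor on a Postgres connection (for example one obtained from psycopg2). Keep in mind that ANALYZE really executes the statement.

```python
def explain_analyze(cursor, sql, params=None):
    """Return the Postgres plan for `sql` as a single string.

    `cursor` is assumed to be a DB-API cursor connected to Postgres.
    EXPLAIN with ANALYZE runs the statement for real, so avoid passing
    data-modifying statements unless that is intended.
    """
    cursor.execute("EXPLAIN (ANALYZE, BUFFERS) " + sql, params)
    # Postgres returns the plan as rows holding one text column each.
    return "\n".join(row[0] for row in cursor.fetchall())
```

Running it once for the repeated-subquery form and once for the CTE form, on production-sized data, gives two plans that can be compared line by line.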

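【Comments】:

Thanks @Belayer. The CTE helped. Yes, I had already run explain analyze, and the plan executed the subquery every time; the CTE resolves that. I am testing with 12 million rows in table5 and 1 million rows in each of the other tables, and all of these tables are indexed. Take this part of the query: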

select t3.uuid as map_id, t3m.id as id from table3 t3 join table3_map t3m on t3m.id = t3.id where t3m.id in (select id from cte);

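I have indexes on t3m.id and t3.id. In the plan, the IN operation between the t3m.id = t3.id join (an index scan) and the CTE scan is not that fast. It is faster than executing the subquery every time, but the nested loop between the CTE scan and the index scan is slower than one between two index scans. What can we do to optimize this query further?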
