Python-Sqlalchemy-Postgres: How to store a subquery result in a variable and use it in a main query
【Posted】: 2021-10-07 12:56:42 【Question】: I have a subquery that is used in several where conditions of my main query, so the same subquery is executed multiple times to produce the same result. Is there a way to store and reuse the subquery result so that it is executed only once?
Sample code:
from sqlalchemy.sql.schema import ForeignKey
from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import select, union

Base = declarative_base()

class Table1(Base):
    __tablename__ = 'table1'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table2(Base):
    __tablename__ = 'table2'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table3(Base):
    __tablename__ = 'table3'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table4(Base):
    __tablename__ = 'table4'
    id = Column(Integer, primary_key=True)
    type = Column(Text, nullable=False)

class Table5(Base):
    __tablename__ = 'table5'
    id = Column(Integer, primary_key=True)
    res_id = Column(Integer, ForeignKey('table4.id'), nullable=False)
    value = Column(Text, nullable=False)

class Table1Map(Base):
    __tablename__ = 'table1_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table1.id'), primary_key=True, unique=True, nullable=False)

class Table2Map(Base):
    __tablename__ = 'table2_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table2.id'), primary_key=True, unique=True, nullable=False)

class Table3Map(Base):
    __tablename__ = 'table3_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table3.id'), primary_key=True, unique=True, nullable=False)

# The select([...]) list form below is the legacy SQLAlchemy 1.x calling style used in the question.
# This filter is embedded in each of the three UNION branches.
sub_query = select([Table5.__table__.c.id]).where(Table5.__table__.c.value == 'some_value')

subquery_1 = (
    select([Table1.__table__.c.uuid.label("map_id"), Table1Map.__table__.c.id.label("id")])
    .select_from(Table1.__table__.join(Table1Map.__table__, Table1Map.__table__.c.map_id == Table1.__table__.c.id))
    .where(Table1Map.__table__.c.id.in_(sub_query))
)
subquery_2 = (
    select([Table2.__table__.c.uuid.label("map_id"), Table2Map.__table__.c.id.label("id")])
    .select_from(Table2.__table__.join(Table2Map.__table__, Table2Map.__table__.c.map_id == Table2.__table__.c.id))
    .where(Table2Map.__table__.c.id.in_(sub_query))
)
subquery_3 = (
    select([Table3.__table__.c.uuid.label("map_id"), Table3Map.__table__.c.id.label("id")])
    .select_from(Table3.__table__.join(Table3Map.__table__, Table3Map.__table__.c.map_id == Table3.__table__.c.id))
    .where(Table3Map.__table__.c.id.in_(sub_query))
)

main_query = union(subquery_1, subquery_2, subquery_3)
print(main_query)
This produces the following query. I need to avoid having the same subquery executed repeatedly.
SELECT TABLE1.UUID AS MAP_ID,
TABLE1_MAP.ID AS ID
FROM TABLE1
JOIN TABLE1_MAP ON TABLE1_MAP.MAP_ID = TABLE1.ID
WHERE TABLE1_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE2.UUID AS MAP_ID,
TABLE2_MAP.ID AS ID
FROM TABLE2
JOIN TABLE2_MAP ON TABLE2_MAP.MAP_ID = TABLE2.ID
WHERE TABLE2_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE3.UUID AS MAP_ID,
TABLE3_MAP.ID AS ID
FROM TABLE3
JOIN TABLE3_MAP ON TABLE3_MAP.MAP_ID = TABLE3.ID
WHERE TABLE3_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
【Question comments】:
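【Answer 1】: Why? Have you run explain (analyze, buffers) enough to show that this is actually causing a performance problem? The repeated executions very likely find the necessary values already in memory, so no additional IO is required. However, the way to accomplish this in Postgres is to select the values from table5 in a CTE. (Sorry, I do not know your obfuscation manager, Python-Sqlalchemy.)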
with cte (id) as
     (select id
      from table5 t5
      where t5.value = 'some_value'
     )
select t1.uuid as map_id
     , t1m.id as id
from table1 t1
join table1_map t1m on t1m.map_id = t1.id
where t1m.id in
      (select id
       from cte
      )
union
select t2.uuid as map_id
     , t2m.id as id
from table2 t2
join table2_map t2m on t2m.map_id = t2.id
where t2m.id in
      (select id
       from cte
      )
union
select t3.uuid as map_id
     , t3m.id as id
from table3 t3
join table3_map t3m on t3m.map_id = t3.id
where t3m.id in
      (select id
       from cte
      );
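For readers working in SQLAlchemy, the CTE above can be expressed with the cte() method of a select. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later. The lightweight table() stand-ins, the make_pair and branch helpers, and the CTE name matching_ids are illustrative and do not come from the question. It only compiles the statement to SQL, so no database connection is needed.

```python
from sqlalchemy import column, select, table, union
from sqlalchemy.dialects import postgresql

# Lightweight stand-ins for the question's tables (only the columns the query uses).
table5 = table("table5", column("id"), column("value"))


def make_pair(n):
    """Return (tableN, tableN_map) for n in 1..3 (hypothetical helper)."""
    base = table(f"table{n}", column("id"), column("uuid"))
    mapping = table(f"table{n}_map", column("id"), column("map_id"))
    return base, mapping


# Define the table5 filter once and name it as a CTE.
matching_ids = (
    select(table5.c.id)
    .where(table5.c.value == "some_value")
    .cte("matching_ids")
)


def branch(base, mapping):
    """One UNION branch that reads from the CTE instead of repeating the table5 query."""
    return (
        select(base.c.uuid.label("map_id"), mapping.c.id.label("id"))
        .select_from(base.join(mapping, mapping.c.map_id == base.c.id))
        .where(mapping.c.id.in_(select(matching_ids.c.id)))
    )


main_query = union(*(branch(*make_pair(n)) for n in (1, 2, 3)))

# Render for Postgres with the literal value inlined, for inspection only.
sql = str(
    main_query.compile(
        dialect=postgresql.dialect(),
        compile_kwargs={"literal_binds": True},
    )
)
print(sql)
```

The printed statement should contain a single WITH clause, with each UNION branch selecting from matching_ids rather than from table5. Whether Postgres then evaluates the CTE only once is something to confirm in the plan.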
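Note that you still need to repeat the sub-select (it now only references the CTE). If you insist on removing all repetition, you can of course perform the union in a sub-select and then filter on id.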
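select map_id, id
from (select t1.uuid as map_id
           , t1m.id
      from table1 t1
      join table1_map t1m on t1m.map_id = t1.id
      union
      select t2.uuid
           , t2m.id
      from table2 t2
      join table2_map t2m on t2m.map_id = t2.id
      union
      select t3.uuid
           , t3m.id
      from table3 t3
      join table3_map t3m on t3m.map_id = t3.id
     ) tall
where tall.id in
      (select t5.id
       from table5 t5
       where t5.value = 'some_value'
      );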
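The union-then-filter shape can also be built in SQLAlchemy by turning the union into a named subquery. This is a rough sketch, assuming SQLAlchemy 1.4 or later and keeping the join to the map tables from the question. The table() stand-ins and the mapped_rows helper are made up for illustration; only the alias tall is taken from the SQL above.

```python
from sqlalchemy import column, select, table, union
from sqlalchemy.dialects import postgresql

# Lightweight stand-in for table5 (only the columns the query uses).
table5 = table("table5", column("id"), column("value"))


def mapped_rows(n):
    """Unfiltered (map_id, id) rows for tableN joined to tableN_map (hypothetical helper)."""
    base = table(f"table{n}", column("id"), column("uuid"))
    mapping = table(f"table{n}_map", column("id"), column("map_id"))
    return select(
        base.c.uuid.label("map_id"), mapping.c.id.label("id")
    ).select_from(base.join(mapping, mapping.c.map_id == base.c.id))


# Union the three unfiltered branches and wrap them as a derived table.
tall = union(*(mapped_rows(n) for n in (1, 2, 3))).subquery("tall")

# The table5 filter now appears exactly once, in the outer WHERE clause.
wanted_ids = select(table5.c.id).where(table5.c.value == "some_value")
main_query = select(tall.c.map_id, tall.c.id).where(tall.c.id.in_(wanted_ids))

# Render for Postgres with the literal value inlined, for inspection only.
sql = str(
    main_query.compile(
        dialect=postgresql.dialect(),
        compile_kwargs={"literal_binds": True},
    )
)
print(sql)
```

This version unions all rows before filtering, so it may or may not beat the CTE form on large tables; comparing the two plans on realistic data is the only reliable check.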
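Either way, run explain to see what actually performs best in your environment. (Make sure it has production volumes, i.e. do not test against tables with 100 rows if production has 100K rows.) Not tested. Note: uuid is a poor choice of column name. Postgres supports a native data type called uuid. Poor name choices lead to confusion (for the developer, not for Postgres), and confusion leads to bugs, which usually go unnoticed until they become a critical production problem.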
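If the query is built in SQLAlchemy, one way to get the plan is to compile the statement to a string and prefix it with EXPLAIN. The sketch below assumes SQLAlchemy 1.4 or later and an already open connection to Postgres. The explain_sql and print_plan helpers are hypothetical names, and inlining values with literal_binds is only reasonable for simple literals like the one here. Keep in mind that ANALYZE actually executes the query.

```python
from sqlalchemy import column, select, table, text
from sqlalchemy.dialects import postgresql


def explain_sql(statement):
    """Render `statement` for Postgres, prefixed with EXPLAIN (ANALYZE, BUFFERS)."""
    compiled = statement.compile(
        dialect=postgresql.dialect(),
        compile_kwargs={"literal_binds": True},  # inline simple literal values
    )
    return "EXPLAIN (ANALYZE, BUFFERS) " + str(compiled)


def print_plan(connection, statement):
    """Run the EXPLAIN on an open SQLAlchemy connection and print each plan line."""
    for (line,) in connection.execute(text(explain_sql(statement))):
        print(line)


# Example statement: the table5 filter from the question, using a lightweight stand-in table.
table5 = table("table5", column("id"), column("value"))
stmt = select(table5.c.id).where(table5.c.value == "some_value")

plan_sql = explain_sql(stmt)
print(plan_sql)
```

With a real engine, print_plan(connection, main_query) could be called once for each query shape so the plans can be compared side by side.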
【Comments】:
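Thanks @Belayer. The CTE helped. Yes, I had run explain analyze, and the plan was executing the subquery every time; the CTE resolved that. I am testing with 12 million rows in table5 and 1 million rows in each of the other tables, and I have indexes on all of these tables. Take select t3.uuid as map_id t3m.id as id from table3 t3 join table3_map on t3m.id = t3.id where t3m.id in (select id from cte); I have indexes on t3m.id and t3.id. In the plan, the in operation between t3m.id = t3.id (index scan) and the CTE scan is not that fast. Yes, it is faster than executing the subquery every time, but the nested loop between the CTE scan and the index scan is slower than an index scan against an index scan. What can we do to optimize this query further?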