Subqueries in Python
Posted: 2019-10-08 09:41:10

Question:
I'm trying to use subqueries to match records against several tables and move the records that don't match into a new table.
I've written the SQL subqueries. The only problem I'm facing is performance: the query takes a long time to process.
create table UnmatchedRecord as
select a.*
from HashedValues a
where a.Address_Hash not in (select b.Address_Hash from HashAddress b)
  and a.Person_Hash not in (select d.Person_Hash from HashPerson d)
  and a.HH_Hash not in (select f.HH_Hash from HashHH f)
  and a.VehicleRegistration not in (select VehicleRegistration from MasterReference)
  and a.EmailAddress not in (select EmailAddress from MasterReference)
  and a.PhoneNumber not in (select PhoneNumber from MasterReference)
  and a.NationalInsuranceNo not in (select NationalInsuranceNo from MasterReference)
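A minimal sketch, using Python's built-in sqlite3 module and toy in-memory tables, reproduces two of the question's NOT IN conditions. The `id` column, the sample rows and the helper `unmatched_not_in` are hypothetical. It also shows a correctness pitfall worth checking before tuning: if a lookup column contains even one NULL, `x NOT IN (...)` evaluates to NULL rather than true for values that are not found, so every row gets filtered out.

```python
import sqlite3

def unmatched_not_in(conn):
    """Ids of HashedValues rows whose Address_Hash and EmailAddress appear in
    neither lookup table, using the NOT IN pattern from the question."""
    sql = """
        SELECT a.id
        FROM HashedValues a
        WHERE a.Address_Hash NOT IN (SELECT b.Address_Hash FROM HashAddress b)
          AND a.EmailAddress NOT IN (SELECT EmailAddress FROM MasterReference)
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql)]

# Hypothetical toy data, reduced to two of the question's columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HashedValues (id INTEGER, Address_Hash TEXT, EmailAddress TEXT);
    CREATE TABLE HashAddress (Address_Hash TEXT);
    CREATE TABLE MasterReference (EmailAddress TEXT);
    INSERT INTO HashedValues VALUES (1, 'h1', 'a@x'), (2, 'h2', 'b@x'), (3, 'h3', 'c@x');
    INSERT INTO HashAddress VALUES ('h1');
    INSERT INTO MasterReference VALUES ('b@x');
""")

before = unmatched_not_in(conn)   # only row 3 matches neither table
conn.execute("INSERT INTO MasterReference VALUES (NULL)")
after = unmatched_not_in(conn)    # one NULL makes NOT IN never evaluate to true
print(before, after)  # [3] []
```

If the real lookup columns can contain NULLs, either filter them out inside each subquery (`where ... is not null`) or use NOT EXISTS, which does not have this behaviour.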
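Comments:
Properly formatted SQL is easier to read. (And to write.)
Thanks for editing the code.

Answer 1:
You can at least replace the four subqueries against MasterReference with a single one: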
select HashedValues.*
from HashedValues
where not exists (
        select *
        from MasterReference
        where HashedValues.VehicleRegistration = MasterReference.VehicleRegistration
           or HashedValues.EmailAddress = MasterReference.EmailAddress
           or HashedValues.PhoneNumber = MasterReference.PhoneNumber
           or HashedValues.NationalInsuranceNo = MasterReference.NationalInsuranceNo
      )
  and not exists (
        select *
        from HashAddress
        where HashedValues.Address_Hash = HashAddress.Address_Hash
      )
  and not exists (
        select *
        from HashPerson
        where HashedValues.Person_Hash = HashPerson.Person_Hash
      )
  and not exists (
        select *
        from HashHH
        where HashedValues.HH_Hash = HashHH.HH_Hash
      )
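The NOT EXISTS rewrite can be tried on toy data with Python's built-in sqlite3 module. This is a reduced sketch, not the full schema: the `id` column and sample rows are hypothetical, and only two MasterReference columns and one hash table are kept. It builds UnmatchedRecord with `CREATE TABLE ... AS SELECT`. Note that NOT EXISTS is not an exact drop-in for NOT IN when lookup columns contain NULLs: here MasterReference holds NULLs and row 4 is still reported as unmatched, whereas `'d@x' NOT IN ('b@x', NULL)` would evaluate to NULL and drop it.

```python
import sqlite3

# Hypothetical toy data; MasterReference deliberately contains NULLs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HashedValues (id INTEGER, Address_Hash TEXT,
                               EmailAddress TEXT, PhoneNumber TEXT);
    CREATE TABLE HashAddress (Address_Hash TEXT);
    CREATE TABLE MasterReference (EmailAddress TEXT, PhoneNumber TEXT);
    INSERT INTO HashedValues VALUES
        (1, 'h1', 'a@x', '111'),   -- address hash is known
        (2, 'h2', 'b@x', '222'),   -- email is in MasterReference
        (3, 'h3', 'c@x', '333'),   -- phone is in MasterReference
        (4, 'h4', 'd@x', '444');   -- matches nothing
    INSERT INTO HashAddress VALUES ('h1');
    INSERT INTO MasterReference VALUES ('b@x', NULL), (NULL, '333');
""")

# One correlated NOT EXISTS per lookup table; the MasterReference columns
# are OR-ed together so that table is probed by a single subquery.
conn.execute("""
    CREATE TABLE UnmatchedRecord AS
    SELECT hv.*
    FROM HashedValues hv
    WHERE NOT EXISTS (
            SELECT 1 FROM MasterReference mr
            WHERE hv.EmailAddress = mr.EmailAddress
               OR hv.PhoneNumber = mr.PhoneNumber)
      AND NOT EXISTS (
            SELECT 1 FROM HashAddress ha
            WHERE hv.Address_Hash = ha.Address_Hash)
""")

unmatched = [row[0] for row in conn.execute("SELECT id FROM UnmatchedRecord ORDER BY id")]
print(unmatched)  # [4]
```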
Comments:
Will it improve performance at all? I'm running the code in Databricks.
There's only one way to find out.
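One way to find out is to time both forms on the same data. The sketch below uses sqlite3 with randomly generated rows. The table sizes and value ranges are arbitrary assumptions. It also checks that both queries return the same rows, which holds here only because the generated data contains no NULLs. SQLite timings say nothing definitive about Databricks, whose optimizer plans these subqueries differently. There, running `EXPLAIN` on each query and comparing the plans is the more reliable check, for example to see whether the OR-ed condition prevents a hash join and falls back to a nested-loop join.

```python
import random
import sqlite3
import time

# Hypothetical generated data: sizes and value ranges are arbitrary.
random.seed(0)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HashedValues (id INTEGER, EmailAddress TEXT, PhoneNumber TEXT);
    CREATE TABLE MasterReference (EmailAddress TEXT, PhoneNumber TEXT);
""")
conn.executemany(
    "INSERT INTO HashedValues VALUES (?, ?, ?)",
    [(i, f"e{random.randrange(20000)}", f"p{random.randrange(20000)}") for i in range(2000)],
)
conn.executemany(
    "INSERT INTO MasterReference VALUES (?, ?)",
    [(f"e{random.randrange(20000)}", f"p{random.randrange(20000)}") for _ in range(2000)],
)

# The question's pattern: one uncorrelated NOT IN per column.
NOT_IN = """
    SELECT id FROM HashedValues a
    WHERE a.EmailAddress NOT IN (SELECT EmailAddress FROM MasterReference)
      AND a.PhoneNumber NOT IN (SELECT PhoneNumber FROM MasterReference)
    ORDER BY id
"""
# The answer's pattern: one correlated NOT EXISTS with OR-ed conditions.
NOT_EXISTS = """
    SELECT id FROM HashedValues hv
    WHERE NOT EXISTS (SELECT 1 FROM MasterReference mr
                      WHERE hv.EmailAddress = mr.EmailAddress
                         OR hv.PhoneNumber = mr.PhoneNumber)
    ORDER BY id
"""

def timed(sql):
    """Run a query once and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

rows_in, t_in = timed(NOT_IN)
rows_ex, t_ex = timed(NOT_EXISTS)
print(f"NOT IN: {t_in:.4f}s  NOT EXISTS: {t_ex:.4f}s  same rows: {rows_in == rows_ex}")
```

A single run is noisy; repeating each query a few times and comparing the best time gives a fairer picture.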