Consolidate repeated identical subquery

【Posted】2018-12-28 03:54:45 【Question】:

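How can I remove or consolidate the three identical subqueries in the query below?

Use case details: I am working with a Projects table that lists millions of projects. Each project record has an owner, a creator, and an editor, each identified by a system ID. I want to replace these system IDs with the corresponding names held in the Employees table. Cross-referencing system IDs to names requires a third table, Users, and its hr_id field.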

Projects:
--prj-- --name-- -owner- -creator- -editor- --many more columns...
 001     alpha    001Z     300Z     304Z       ...
 002     beta     020Z     350Z     600Z       ...
 003     charlie  600Z     020Z     001Z       ...


Employees:                       Users:               
--hr_id-- --name--                 -hr_id- -sys_id-
 A01    john                      A01     001Z
 A02    susan                     A02     020Z
 A03    ryan                      A03     300Z
 A04    kelly                     A04     304Z
 A05    matt                      A05     350Z
 A06    bert                      A06     600Z

Desired output:
--prj-- --name-- -owner- -creator- -editor- --adt'l cols...
 001     alpha    john     ryan     kelly    ...
 002     beta     susan    matt     bert     ...
 003     charlie  bert     susan    john     ...

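Below is my code, including some unrelated joins that must be kept. The query runs as expected, but it is inefficient, and I would appreciate any help. Also (and based on my googling I think this is relevant), I am working in an environment that is not suited to CTEs.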

SELECT projects.prj As project_id,
       projects.name As project_name,
       owner.name As owner_name,
       creator.name As creator_name,
       editor.name As editor_name,
       stats.stat1 As stat_1,
       actuals.stat2 As stat_2
FROM "dbconnect"."projects" As projects
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS owner ON projects.owner = owner.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS creator ON projects.creator = creator.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS editor ON projects.editor = editor.sys_id
  LEFT JOIN "dbconnect"."prjstats" As stats ON projects.prj = stats.prj_id
  LEFT JOIN "dbconnect"."prjactuals" As actuals ON projects.prj = actuals.prj_id

【Comments】:

Have you tried using a STABLE function (input = the user's sys_id, output = the employee's name)? See gpdb.docs.pivotal.io/5150/ref_guide/sql_commands/… + postgresql.org/docs/current/sql-createfunction.html + postgresql.org/docs/current/xfunc-volatility.html

【Answer 1】:

You can create a scalar-valued function for the subquery and then rewrite the query like this.

CREATE FUNCTION dbo.getName (@id varchar(30))  
RETURNS varchar(128) 
AS  
BEGIN  
     DECLARE @v_name varchar(128) 
     SELECT @v_name=emps.name 
     FROM "dbconnect"."employees" AS emps
     RIGHT JOIN "dbconnect"."users" AS users ON emps.hr_id = users.hr_id
     WHERE users.sys_id=@id

     RETURN @v_name
END

--Query
SELECT projects.prj As project_id,
       projects.name As project_name,
       /*
       owner.name As owner_name,
       creator.name As creator_name,
       editor.name As editor_name,
       */
       dbo.getName(projects.owner) as owner_name,
       dbo.getName(projects.creator) as creator_name,
       dbo.getName(projects.editor) as editor_name,
       stats.stat1 As stat_1,
       actuals.stat2 As stat_2
FROM "dbconnect"."projects" As projects
/*
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS owner ON projects.owner = owner.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS creator ON projects.creator = creator.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS editor ON projects.editor = editor.sys_id
*/
  LEFT JOIN "dbconnect"."prjstats" As stats ON projects.prj = stats.prj_id
  LEFT JOIN "dbconnect"."prjactuals" As actuals ON projects.prj = actuals.prj_id

Alternatively, you can create a table-valued function and then join to it with the APPLY operator.

CREATE FUNCTION dbo.getName (@id varchar(30))  
RETURNS TABLE
AS  
RETURN
( 
     SELECT emps.name
     FROM "dbconnect"."employees" AS emps
     RIGHT JOIN "dbconnect"."users" AS users ON emps.hr_id = users.hr_id
     WHERE users.sys_id=@id
)

--Query
SELECT projects.prj As project_id,
       projects.name As project_name,
       owner.name As owner_name,
       creator.name As creator_name,
       editor.name As editor_name,
       stats.stat1 As stat_1,
       actuals.stat2 As stat_2
FROM "dbconnect"."projects" As projects
OUTER APPLY dbo.getName(projects.owner) as owner
OUTER APPLY dbo.getName(projects.creator) as creator
OUTER APPLY dbo.getName(projects.editor) as editor
/*
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS owner ON projects.owner = owner.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS creator ON projects.creator = creator.sys_id
  LEFT JOIN (
             SELECT emps.name,
                    users.hr_id,
                    users.sys_id
             FROM "dbconnect"."employees" AS emps
             RIGHT JOIN "dbconnect"."users" AS users
               ON emps.hr_id = users.hr_id
            ) AS editor ON projects.editor = editor.sys_id
*/
  LEFT JOIN "dbconnect"."prjstats" As stats ON projects.prj = stats.prj_id
  LEFT JOIN "dbconnect"."prjactuals" As actuals ON projects.prj = actuals.prj_id
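
A view is a related way to give the repeated subquery a single name when neither functions nor CTEs are convenient. This is not part of the answer above; it is a minimal sketch that uses Python's built-in sqlite3 module only so that it can be run against the sample data from the question. The view name user_names is hypothetical, the schema prefix and the stats joins are omitted, and whether a view performs better on the asker's system is not something this sketch can show.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (prj TEXT, name TEXT, owner TEXT, creator TEXT, editor TEXT);
    CREATE TABLE employees (hr_id TEXT, name TEXT);
    CREATE TABLE users (hr_id TEXT, sys_id TEXT);

    INSERT INTO projects VALUES
        ('001', 'alpha',   '001Z', '300Z', '304Z'),
        ('002', 'beta',    '020Z', '350Z', '600Z'),
        ('003', 'charlie', '600Z', '020Z', '001Z');
    INSERT INTO employees VALUES
        ('A01', 'john'), ('A02', 'susan'), ('A03', 'ryan'),
        ('A04', 'kelly'), ('A05', 'matt'), ('A06', 'bert');
    INSERT INTO users VALUES
        ('A01', '001Z'), ('A02', '020Z'), ('A03', '300Z'),
        ('A04', '304Z'), ('A05', '350Z'), ('A06', '600Z');

    -- Hypothetical view name: the repeated subquery is defined once here.
    CREATE VIEW user_names AS
        SELECT e.name, u.hr_id, u.sys_id
        FROM users AS u
        LEFT JOIN employees AS e ON e.hr_id = u.hr_id;
""")

# Join the view three times, once per role, instead of repeating the subquery.
rows = conn.execute("""
    SELECT p.prj, p.name, o.name, c.name, ed.name
    FROM projects AS p
    LEFT JOIN user_names AS o  ON p.owner   = o.sys_id
    LEFT JOIN user_names AS c  ON p.creator = c.sys_id
    LEFT JOIN user_names AS ed ON p.editor  = ed.sys_id
    ORDER BY p.prj
""").fetchall()

for row in rows:
    print(row)
```

The printed rows should correspond to the "Desired output" table in the question, with one view alias per role.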

【Answer 2】:

Just use a CTE. I prefer left joins, so I would write it like this:

WITH eu as (
      SELECT e.name, u.hr_id, u.sys_id
      FROM "dbconnect"."users" u LEFT JOIN
           "dbconnect"."employees" e
           ON e.hr_id = u.hr_id
     )       
SELECT p.prj As project_id, p.name As project_name,
       euo.name As owner_name, euc.name As creator_name,
       eue.name As editor_name,
       ps.stat1 As stat_1,
       pa.stat2 As stat_2
FROM "dbconnect"."projects" p LEFT JOIN
     eu euo 
     ON p.owner = euo.sys_id LEFT JOIN
     eu euc
     ON p.creator = euc.sys_id LEFT JOIN
     eu eue
     ON p.editor = eue.sys_id LEFT JOIN
     "dbconnect"."prjstats" ps
     ON p.prj = ps.prj_id LEFT JOIN
     "dbconnect"."prjactuals" pa
     ON p.prj = pa.prj_id;
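
The CTE above can be checked against the sample data from the question. The following is a minimal sketch using Python's built-in sqlite3 module rather than the asker's database, so the schema prefix and the two stats joins are left out. Project 004 and user A07 are hypothetical rows that do not appear in the question; they are added only to show what the left joins return when an employee record or a sys_id is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (prj TEXT, name TEXT, owner TEXT, creator TEXT, editor TEXT);
    CREATE TABLE employees (hr_id TEXT, name TEXT);
    CREATE TABLE users (hr_id TEXT, sys_id TEXT);

    -- Rows 001-003 come from the question; row 004 is hypothetical.
    INSERT INTO projects VALUES
        ('001', 'alpha',   '001Z', '300Z', '304Z'),
        ('002', 'beta',    '020Z', '350Z', '600Z'),
        ('003', 'charlie', '600Z', '020Z', '001Z'),
        ('004', 'delta',   '700Z', '001Z', '999Z');
    INSERT INTO employees VALUES
        ('A01', 'john'), ('A02', 'susan'), ('A03', 'ryan'),
        ('A04', 'kelly'), ('A05', 'matt'), ('A06', 'bert');
    -- User A07 is hypothetical and has no matching employee record.
    INSERT INTO users VALUES
        ('A01', '001Z'), ('A02', '020Z'), ('A03', '300Z'),
        ('A04', '304Z'), ('A05', '350Z'), ('A06', '600Z'),
        ('A07', '700Z');
""")

# The CTE "eu" is written once and joined three times under different aliases.
cte_rows = conn.execute("""
    WITH eu AS (
        SELECT e.name, u.hr_id, u.sys_id
        FROM users AS u
        LEFT JOIN employees AS e ON e.hr_id = u.hr_id
    )
    SELECT p.prj, p.name, euo.name, euc.name, eue.name
    FROM projects AS p
    LEFT JOIN eu AS euo ON p.owner   = euo.sys_id
    LEFT JOIN eu AS euc ON p.creator = euc.sys_id
    LEFT JOIN eu AS eue ON p.editor  = eue.sys_id
    ORDER BY p.prj
""").fetchall()

for row in cte_rows:
    print(row)
```

Writing the CTE as users LEFT JOIN employees keeps every user row, just as the original employees RIGHT JOIN users does, so a user without an employee record, or a sys_id with no user at all, shows up as a NULL name instead of dropping the project.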
