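The WITH Statement
CTEs (Common Table Expressions) are what you may know as WITH statements. Much like a view in the database, a CTE lets you name a query and treat its result as a temporary table, except that it exists only for the duration of the statement it belongs to. Feel free to use CTEs liberally: they let you build a query from clearly separated blocks, so other people can easily follow what you are doing.
A WITH clause attaches one or more auxiliary statements to a main statement. Both the auxiliary statements and the main statement can be any of SELECT, INSERT, UPDATE, or DELETE.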
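The strength of CTEs is readability; their performance is often lower than that of a carefully hand-optimized SQL statement, although in most cases the gap is less than a factor of two. One reason for the gap is that PostgreSQL before version 12 always materialized a CTE, which made it an optimization fence; since version 12, a non-recursive CTE without side effects that is referenced only once can be inlined into the main query.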
Let's start with a simple SELECT example:
WITH users_tasks AS (
    SELECT
        users.email,
        array_agg(tasks.name) AS task_list,
        projects.title
    FROM
        users,
        tasks,
        projects
    WHERE
        users.id = tasks.user_id
        -- assumes projects.id is the key that tasks.project_id refers to
        AND projects.id = tasks.project_id
    GROUP BY
        users.email,
        projects.title
)
With the temporary table users_tasks defined this way, I can follow it with an ordinary query against users_tasks, such as:
SELECT *
FROM users_tasks;
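The interesting part is that you can chain CTEs together. Now that I know which tasks are assigned to each user, I may want to know who is responsible for more than 50% of the tasks on a given project and is therefore a bottleneck. To keep things simple, we can do this in steps: first count the total number of tasks per project, then count the tasks each user has on each project.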
total_tasks_per_project AS (
    SELECT
        project_id,
        count(*) AS task_count
    FROM tasks
    GROUP BY project_id
),
tasks_per_project_per_user AS (
    SELECT
        user_id,
        project_id,
        count(*) AS task_count
    FROM tasks
    GROUP BY user_id, project_id
),
Now we combine the two to find the users who hold more than 50% of a project's tasks:
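overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id
    FROM tasks_per_project_per_user,
         total_tasks_per_project
    -- compare each user's count with the total of the same project
    WHERE tasks_per_project_per_user.project_id = total_tasks_per_project.project_id
      AND tasks_per_project_per_user.task_count > (total_tasks_per_project.task_count / 2)
)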
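As the final goal, I want the overloaded users together with a comma-separated list of their tasks. For that we simply join overloaded_users with the initial users_tasks list. Put together, the query is a bit long, but it is readable. As an extra aid, I have added a comment to each layer.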
--- Query highlights users that have over 50% of tasks on a given project
--- Gives comma separated list of their tasks and the project

--- Initial query to grab project title and tasks per user
WITH users_tasks AS (
    SELECT
        users.id AS user_id,
        users.email,
        array_agg(tasks.name) AS task_list,
        projects.title
    FROM
        users,
        tasks,
        projects
    WHERE
        users.id = tasks.user_id
        --- assumes projects.id is the key that tasks.project_id refers to
        AND projects.id = tasks.project_id
    GROUP BY
        users.id,
        users.email,
        projects.title
),

--- Calculates the total tasks per each project
total_tasks_per_project AS (
    SELECT
        project_id,
        count(*) AS task_count
    FROM tasks
    GROUP BY project_id
),

--- Calculates the tasks per project for each user
tasks_per_project_per_user AS (
    SELECT
        user_id,
        project_id,
        count(*) AS task_count
    FROM tasks
    GROUP BY user_id, project_id
),

--- Gets user ids that have over 50% of tasks assigned
overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id
    FROM tasks_per_project_per_user,
         total_tasks_per_project
    WHERE tasks_per_project_per_user.project_id = total_tasks_per_project.project_id
      AND tasks_per_project_per_user.task_count > (total_tasks_per_project.task_count / 2)
)

SELECT
    email,
    task_list,
    title
FROM
    users_tasks,
    overloaded_users
WHERE
    users_tasks.user_id = overloaded_users.user_id;
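Here is a DELETE example:
This example uses a DELETE inside WITH to remove one month of data from the products table. The RETURNING clause hands the deleted rows to the CTE moved_rows, and the main statement then uses INSERT to write those rows into products_log.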
WITH moved_rows AS (
    DELETE FROM products
    WHERE
        "date" >= '2010-10-01'
        AND "date" < '2010-11-01'
    RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;
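If a statement inside WITH is not a SELECT and has no RETURNING clause, it produces no result set, so the main query cannot reference that CTE. Both the WITH statement and the main query are still executed, however. This makes it possible to put several unrelated statements into a single SQL statement, so that the WITH statements and the main statement succeed or fail together without an explicit transaction.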
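As a rough illustration of the case just described, the sketch below runs an UPDATE inside WITH without a RETURNING clause while the main statement does unrelated work. The tables stock_levels and audit_notes and their columns are hypothetical, invented for this example, and the syntax assumes PostgreSQL.

```sql
-- Hypothetical tables, created only for this illustration.
CREATE TEMP TABLE stock_levels (product_id int PRIMARY KEY, quantity int);
CREATE TEMP TABLE audit_notes (note text);

INSERT INTO stock_levels VALUES (1, 10), (2, 0);

-- The CTE has no RETURNING clause, so the main statement cannot reference
-- "restock"; the UPDATE is executed anyway, as part of the same statement.
WITH restock AS (
    UPDATE stock_levels
    SET quantity = quantity + 50
    WHERE quantity = 0
)
INSERT INTO audit_notes (note)
VALUES ('restocked empty products');

-- Both changes are visible once the statement has finished.
SELECT * FROM stock_levels ORDER BY product_id;
SELECT * FROM audit_notes;
```

Because the UPDATE and the INSERT are one statement, an error in either one undoes both.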
Caveats when using WITH [personally, it feels a bit like thread-unsafe code]
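A data-modifying statement in WITH is executed exactly once and always runs to completion, regardless of whether the main statement reads all, or any, of its output. A SELECT in WITH, by contrast, is evaluated only as far as the main statement demands its output.
When WITH contains several sub-statements, they are executed concurrently with each other and with the main statement, so when more than one data-modifying sub-statement changes the same rows, the outcome is unpredictable.
All sub-statements see the same snapshot of the data, so none of them can see the effects of the others on the target tables. This softens the impact of the unpredictable execution order, and it means RETURNING is the only way to pass changes from one sub-statement to another or to the main statement.
If a single SQL statement updates the same row more than once, only one of the updates takes effect, and it is hard to predict which one.
If a single SQL statement both updates and deletes the same row, only the update takes effect.
At present, a table that is the target of a data-modifying CTE must not have a conditional rule, an ALSO rule, or an INSTEAD rule that expands to multiple statements.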
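To make the shared-snapshot rule concrete, here is a small sketch, assuming PostgreSQL and a hypothetical price_list table invented for this example. The main query reads the table directly and also reads the RETURNING output of the UPDATE, so the two views of the data can be compared side by side.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TEMP TABLE price_list (id int PRIMARY KEY, price numeric);
INSERT INTO price_list VALUES (1, 100), (2, 200);

WITH updated AS (
    UPDATE price_list
    SET price = price * 2
    RETURNING id, price
)
-- Reads the table itself: same snapshot as the UPDATE, so old prices.
SELECT 'table' AS source, id, price FROM price_list
UNION ALL
-- Reads the RETURNING output: the only way to see the new prices here.
SELECT 'returning' AS source, id, price FROM updated
ORDER BY source, id;
```

Under the snapshot rule, the rows labelled table should still carry the original prices, while the rows labelled returning carry the doubled ones.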
WITH RECURSIVE
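By adding the RECURSIVE modifier, a WITH query can refer to its own output, which makes recursion possible.
WITH RECURSIVE is generally used for data that is logically hierarchical or tree-structured; a typical use case is finding the direct and indirect children of a node.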
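The following is a minimal sketch of that use case, assuming PostgreSQL. The nodes table, its columns, and the sample rows are hypothetical and exist only for this example; the query collects every direct and indirect child of node 1.

```sql
-- Hypothetical tree stored as an adjacency list.
CREATE TEMP TABLE nodes (id int PRIMARY KEY, parent_id int, name text);
INSERT INTO nodes VALUES
    (1, NULL, 'root'),
    (2, 1,    'child a'),
    (3, 1,    'child b'),
    (4, 2,    'grandchild a1'),
    (5, NULL, 'other root');

WITH RECURSIVE descendants AS (
    -- Non-recursive term: the direct children of node 1.
    SELECT id, parent_id, name, 1 AS depth
    FROM nodes
    WHERE parent_id = 1
  UNION ALL
    -- Recursive term: children of the rows found in the previous step.
    SELECT n.id, n.parent_id, n.name, d.depth + 1
    FROM nodes n
    JOIN descendants d ON n.parent_id = d.id
)
SELECT id, name, depth
FROM descendants
ORDER BY depth, id;
```

With this sample data the query returns child a and child b at depth 1 and grandchild a1 at depth 2; other root is not a descendant of node 1 and is left out. The recursion stops on its own once a step produces no new rows. If the data could contain cycles, UNION instead of UNION ALL, or an explicit depth limit, would be needed to guarantee termination.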