Postgres recursive query with row_to_json
【Posted】: 2014-10-29 23:20:04
【Question】:
I have a table in Postgres 9.3.5 that looks like this:
CREATE TABLE customer_area_node
(
  id bigserial NOT NULL,
  customer_id integer NOT NULL,
  parent_id bigint,
  name text,
  description text,
  CONSTRAINT customer_area_node_pkey PRIMARY KEY (id)
)
I query it with:
WITH RECURSIVE c AS (
  SELECT *, 0 as level, name as path
  FROM customer_area_node
  WHERE customer_id = 2 and parent_id is null
  UNION ALL
  SELECT customer_area_node.*,
         c.level + 1 as level,
         c.path || '/' || customer_area_node.name as path
  FROM customer_area_node
  join c ON customer_area_node.parent_id = c.id
)
SELECT * FROM c ORDER BY path;
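This seems to work for building paths such as building1/floor1/room1, building1/floor1/room2, and so on.
I would like to be able to easily turn that into JSON representing the tree structure, and I was told I can do this with row_to_json.
As a reasonable alternative, I would accept any other way of shaping the data that lets me convert it into an actual tree structure easily, without a ton of string.splits on /.
Is there a reasonably simple way to do this with row_to_json?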
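For the non-SQL alternative mentioned above, the flat rows already carry id and parent_id, so client code can assemble the tree without splitting the path column. The following is a minimal Python sketch of that idea, assuming each row arrives as a dictionary keyed by column name. The function name build_tree and the sample rows are hypothetical and are not part of the question.

```python
import json


def build_tree(rows):
    """Nest flat rows (id, parent_id, ...) into a list of root nodes."""
    # Copy every row and give it an empty list of children.
    nodes = {r["id"]: {**r, "children": []} for r in rows}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is None:
            # parent_id is NULL (or points outside the result set): a root.
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


# Hypothetical sample rows, shaped like the output of the query above.
rows = [
    {"id": 1, "parent_id": None, "name": "building1"},
    {"id": 2, "parent_id": 1, "name": "floor1"},
    {"id": 3, "parent_id": 1, "name": "floor2"},
    {"id": 4, "parent_id": 2, "name": "room1"},
]

print(json.dumps(build_tree(rows), indent=2))
```

This runs in a single pass over the rows and does not depend on their order, because every node is registered before any child is attached.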
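【Comments】:
Can you provide sample data?
【Answer 1】:
This cannot be done with a plain recursive CTE, because it is almost impossible to set a JSON value deep inside its hierarchy. But you can do it the other way around: build the tree starting from its leaves, up to its root: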
-- calculate node levels
WITH RECURSIVE c AS (
  SELECT *, 0 as lvl
  FROM customer_area_node
  -- use parameters here, to select the root first
  WHERE customer_id = 2 AND parent_id IS NULL
  UNION ALL
  SELECT customer_area_node.*, c.lvl + 1 as lvl
  FROM customer_area_node
  JOIN c ON customer_area_node.parent_id = c.id
),
-- select max level
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c
),
-- accumulate children
j AS (
  SELECT c.*, json '[]' children -- at max level, there are only leaves
  FROM c, maxlvl
  WHERE lvl = maxlvl
  UNION ALL
  -- a little hack, because PostgreSQL doesn't like aggregated recursive terms
  SELECT (c).*, array_to_json(array_agg(j)) children
  FROM (
    SELECT c, j
    FROM j
    JOIN c ON j.parent_id = c.id
  ) v
  GROUP BY v.c
)
-- select only root
SELECT row_to_json(j) json_tree
FROM j
WHERE lvl = 0;
This works even on PostgreSQL 9.2+.
SQLFiddle
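To make the leaves-to-root direction easier to follow, here is a rough Python model of what the query does: a top-down pass assigns lvl, then the deepest nodes are folded into their parents one round at a time, where each round only sees the rows produced by the previous round. This is a sketch under assumptions, not a translation of PostgreSQL's executor; the function names and sample rows are hypothetical. It also shows why a leaf that sits above the deepest level never enters the result when the fold is seeded only from the max level.

```python
from collections import defaultdict


def with_levels(rows):
    """Top-down pass: annotate every reachable node with its depth (lvl)."""
    children = defaultdict(list)
    for r in rows:
        children[r["parent_id"]].append(r)
    out, frontier, lvl = {}, children[None], 0
    while frontier:
        nxt = []
        for r in frontier:
            out[r["id"]] = {**r, "lvl": lvl}
            nxt.extend(children[r["id"]])
        frontier, lvl = nxt, lvl + 1
    return out


def fold_from_deepest(rows):
    """Bottom-up pass seeded only with the deepest nodes."""
    c = with_levels(rows)
    maxlvl = max(n["lvl"] for n in c.values())
    # Seed: at the max level there are only leaves.
    working = [{**n, "children": []} for n in c.values() if n["lvl"] == maxlvl]
    result = list(working)
    while working:
        # Group the previous round's rows by parent, like GROUP BY v.c.
        grouped = defaultdict(list)
        for child in working:
            if child["parent_id"] is not None:
                grouped[child["parent_id"]].append(child)
        working = [{**c[pid], "children": kids} for pid, kids in grouped.items()]
        result.extend(working)
    # Keep only the root, like WHERE lvl = 0.
    return [n for n in result if n["lvl"] == 0]


# Hypothetical data: floor2 is a leaf at level 1, room1 is a leaf at level 2.
rows = [
    {"id": 1, "parent_id": None, "name": "building1"},
    {"id": 2, "parent_id": 1, "name": "floor1"},
    {"id": 3, "parent_id": 1, "name": "floor2"},
    {"id": 4, "parent_id": 2, "name": "room1"},
]

root = fold_from_deepest(rows)[0]
print([child["name"] for child in root["children"]])
```

In this model floor2 is missing from the root's children, because it is a leaf above the max level and is therefore never part of any round.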
Update: a variant that should also handle rogue leaf nodes (leaves located at a level between 1 and the max level):
WITH RECURSIVE c AS (
SELECT *, 0 as lvl
FROM customer_area_node
WHERE customer_id = 1 AND parent_id IS NULL
UNION ALL
SELECT customer_area_node.*, c.lvl + 1
FROM customer_area_node
JOIN c ON customer_area_node.parent_id = c.id
),
maxlvl AS (
SELECT max(lvl) maxlvl FROM c
),
j AS (
SELECT c.*, json '[]' children
FROM c, maxlvl
WHERE lvl = maxlvl
UNION ALL
SELECT (c).*, array_to_json(array_agg(j) || array(SELECT r
FROM (SELECT l.*, json '[]' children
FROM c l, maxlvl
WHERE l.parent_id = (c).id
AND l.lvl < maxlvl
AND NOT EXISTS (SELECT 1
FROM c lp
WHERE lp.parent_id = l.id)) r)) children
FROM (SELECT c, j
FROM c
JOIN j ON j.parent_id = c.id) v
GROUP BY v.c
)
SELECT row_to_json(j) json_tree
FROM j
WHERE lvl = 0;
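This one should also work on PostgreSQL 9.2+; however, I could not test that (I can only test on 9.5+ right now).
These solutions can handle any columns in any hierarchical table, but they always append an int-typed lvl JSON property to their output.
http://rextester.com/YNU7932
【Comments】:
I found that all leaves must be at the same level for this to work.
@Macario In the SQLFiddle example there are leaves at multiple levels (e.g. 1.3.7 vs. 1.3.6.9) and all nodes are collected.
@Macario I ran into the same problem that @pozs reported. I created a sample SQLFiddle that shows the case with 1.4.10, which has only 2 leaves instead of 3 leaves like all the others. So with this SQL, all branches must have exactly the same depth.
Sorry for being late, but I was able to find a way to handle these leaf nodes too.
【Answer 2】: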
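Sorry for the very late answer, but I think I found an elegant solution that could become the accepted answer to this question.
Based on the awesome "little hack" found by @pozs, I came up with a solution that:
- solves the "rogue leaves" case with very little code (by leveraging the NOT EXISTS predicate)
- avoids the whole level computation/condition stuff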
WITH RECURSIVE customer_area_tree("id", "customer_id", "parent_id", "name", "description", "children") AS (
-- tree leaves (no matching children)
SELECT c.*, json '[]'
FROM customer_area_node c
WHERE NOT EXISTS(SELECT * FROM customer_area_node AS hypothetic_child WHERE hypothetic_child.parent_id = c.id)
UNION ALL
-- pozs's awesome "little hack"
SELECT (parent).*, json_agg(child) AS "children"
FROM (
SELECT parent, child
FROM customer_area_tree AS child
JOIN customer_area_node parent ON parent.id = child.parent_id
) branch
GROUP BY branch.parent
)
SELECT json_agg(t)
FROM customer_area_tree t
LEFT JOIN customer_area_node AS hypothetic_parent ON(hypothetic_parent.id = t.parent_id)
WHERE hypothetic_parent.id IS NULL
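Update:
Tested with very simple data, it does work, but as pozs pointed out in a comment, with his sample data some rogue leaf nodes are forgotten. However, I found that with even more complex data the previous answer does not work either, because only rogue leaf nodes that share a common ancestor with a "max level" leaf node are caught (when "1.2.5.8" is absent, "1.2.4" and "1.2.5" are missing, because they share no common ancestor with any "max level" leaf node).
So here is a new proposal that mixes pozs's work with mine: it extracts the NOT EXISTS subquery into an inner UNION and leverages UNION's deduplication (which in turn relies on jsonb being comparable):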
WITH RECURSIVE
c_with_level AS (
SELECT *, 0 as lvl
FROM customer_area_node
WHERE parent_id IS NULL
UNION ALL
SELECT child.*, parent.lvl + 1
FROM customer_area_node child
JOIN c_with_level parent ON parent.id = child.parent_id
),
maxlvl AS (
SELECT max(lvl) maxlvl FROM c_with_level
),
c_tree AS (
SELECT c_with_level.*, jsonb '[]' children
FROM c_with_level, maxlvl
WHERE lvl = maxlvl
UNION
(
SELECT (branch_parent).*, jsonb_agg(branch_child)
FROM (
SELECT branch_parent, branch_child
FROM c_with_level branch_parent
JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.id
) branch
GROUP BY branch.branch_parent
UNION
SELECT c.*, jsonb '[]' children
FROM c_with_level c
WHERE NOT EXISTS (SELECT 1 FROM c_with_level hypothetical_child WHERE hypothetical_child.parent_id = c.id)
)
)
SELECT jsonb_pretty(row_to_json(c_tree)::jsonb)
FROM c_tree
WHERE lvl = 0;
Tested on http://rextester.com/SMM38494 ;)
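Once a query like the one above returns a nested JSON document, client code can load it directly into nested dictionaries and walk it, which also removes the need for a path column. A minimal Python sketch, assuming each node exposes name and children keys as produced by the query; the sample document is hand-written for illustration (with most columns left out) and the helper name iter_paths is hypothetical.

```python
import json


def iter_paths(node, prefix=""):
    """Yield the slash-separated path of a node and of all its descendants."""
    path = f"{prefix}/{node['name']}" if prefix else node["name"]
    yield path
    for child in node.get("children") or []:
        yield from iter_paths(child, path)


# Hand-written stand-in for one json_tree value returned by the query.
json_tree = """
{"id": 1, "parent_id": null, "name": "building1", "lvl": 0, "children": [
  {"id": 2, "parent_id": 1, "name": "floor1", "lvl": 1, "children": [
    {"id": 4, "parent_id": 2, "name": "room1", "lvl": 2, "children": []},
    {"id": 5, "parent_id": 2, "name": "room2", "lvl": 2, "children": []}
  ]},
  {"id": 3, "parent_id": 1, "name": "floor2", "lvl": 1, "children": []}
]}
"""

tree = json.loads(json_tree)
for path in iter_paths(tree):
    print(path)
```

The traversal is depth-first, so parents are always listed before their children.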
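【Comments】:
I'm afraid that, without special handling of the levels, you will end up with many separate "branches" in your output, because there are many leaves at different levels. Your sample data has only 0 or 1 children for every node, which is why it does not show up.
Thanks for your observation! I have been working on it, and I found that your trick for handling rogue leaf nodes is not really effective either: in your example, if the node "1.2.5.8" does not exist, the branch "1.2" is never caught, so "1.2.4" and "1.2.5" are absent from the final result. This is because you only catch rogue leaf nodes when they share a common ancestor with a "max level" leaf node. I found a solution and will edit my answer in a minute ;)
For anyone interested, I modified David's second attempt to return nested JSON objects. In Python, I wanted to deserialize the JSON into nested dictionaries - why do in code what can be done in SQL ;) I also wanted similar access to the data structure in JavaScript. Thanks @DavidGuillot!
This approach is nice, but it does not cover the case where there are multiple root items and one tree is deeper than the other. It generates duplicates with different children at the root level: sqlfiddle.com/#!17/022f80/8
【Answer 3】:
I developed pozs's answer further to get recursive leaves with subtrees. So this answer really returns the complete tree.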
CREATE OR REPLACE FUNCTION pg_temp.getTree(bigint)
RETURNS TABLE(
id bigint,
customer_id integer,
parent_id bigint,
name text,
description text,
children json
)
AS $$
WITH RECURSIVE relations AS (
SELECT
can.id,
can.customer_id,
can.parent_id,
can.name,
can.description,
0 AS depth
FROM customer_area_node can
WHERE can.id = $1
UNION ALL
SELECT
can.id,
can.customer_id,
can.parent_id,
can.name,
can.description,
relations.depth + 1
FROM customer_area_node can
JOIN relations ON can.parent_id = relations.id AND can.id != can.parent_id
),
maxdepth AS (
SELECT max(depth) maxdepth FROM relations
),
rootTree as (
SELECT r.* FROM
relations r, maxdepth
WHERE depth = maxdepth
UNION ALL
SELECT r.* FROM
relations r, rootTree
WHERE r.id = rootTree.parent_id AND rootTree.id != rootTree.parent_id
),
mainTree AS (
SELECT
c.id,
c.customer_id,
c.parent_id,
c.name,
c.description,
c.depth,
json_build_array() children
FROM relations c, maxdepth
WHERE c.depth = maxdepth
UNION ALL
SELECT
(relations).*,
array_to_json(
array_agg(mainTree)
||
array(
SELECT t
FROM (
SELECT
l.*,
json_build_array() children
FROM relations l, maxdepth
WHERE l.parent_id = (relations).id
AND l.depth < maxdepth
AND l.id NOT IN (
SELECT id FROM rootTree
)
) r
JOIN pg_temp.getTree(r.id) t
ON r.id = t.id
))
children
FROM (
SELECT relations, mainTree
FROM relations
JOIN mainTree
ON (
mainTree.parent_id = relations.id
AND mainTree.parent_id != mainTree.id
)
) v
GROUP BY v.relations
)
SELECT
id,
customer_id,
parent_id,
name,
description,
children
FROM mainTree WHERE id = $1
$$
LANGUAGE SQL;
SELECT *
FROM
customer_area_node can
JOIN pg_temp.getTree(can.id) t ON t.id = can.id
WHERE can.parent_id IS NULL;