How to Handle Hierarchical Data in SQL
Posted by jiangonemm
Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers how to handle hierarchical data in SQL, and we hope you find it useful.
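I have recently been working on our company's authentication system and have looked at open-source projects such as Apache Shiro and Spring Security, but I have not yet decided whether to build my own or use one of them. My main concern right now is how to handle hierarchical data, because I plan to give users a hierarchy of groups, and the user-group relationship is n:m. This article does not solve that problem for me, but as a foundational article it is very solid, so I translated it.
Link to the original article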
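Introduction
Most users have, at one time or another, dealt with hierarchical data in a SQL database, and have no doubt learned that managing hierarchical data is not what a relational database is designed for. The tables of a relational database are not hierarchical (as XML is); they are simply flat lists. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table.
For our purposes, hierarchical data is a collection of data in which each item has a single parent and zero or more children (with the exception of the root item, which has no parent). Hierarchical data can be found in a variety of database applications, including forum and mailing list threads, business organization charts, content management categories, and product categories. We will use the following product category hierarchy from a fictional electronics store:
These categories form a hierarchy in much the same way as the other examples cited above. In this article we will examine two models for dealing with hierarchical data in SQL, starting with the traditional adjacency list model.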
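The Adjacency List Model
Typically, the example categories shown above will be stored in a table like the following (I'm including the full CREATE and INSERT statements so you can follow along):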
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);
SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
10 rows in set (0.00 sec)
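In the adjacency list model, each item in the table contains a pointer to its parent. The topmost element, in this case ELECTRONICS, has a NULL value for its parent. The adjacency list model has the advantage of being quite simple: it is easy to see that FLASH is a child of MP3 PLAYERS, which is a child of PORTABLE ELECTRONICS, which is a child of ELECTRONICS. While the adjacency list model can be handled fairly easily in client-side code, working with it in pure SQL can be more problematic.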
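To illustrate why the client side finds this model easy, here is a minimal Python sketch that reads all rows once and walks the parent pointers in memory. The rows are hard-coded from the INSERT statement above in place of a real database fetch, and the function name `indented_tree` is purely illustrative.

```python
from collections import defaultdict

# (category_id, name, parent) rows, copied from the INSERT statement above.
ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def indented_tree(rows, indent="  "):
    """Return one line per node in depth-first order, indented by depth."""
    names = {}
    children = defaultdict(list)  # parent id -> child ids; roots live under None
    for cat_id, name, parent in rows:
        names[cat_id] = name
        children[parent].append(cat_id)

    lines = []

    def visit(cat_id, depth):
        lines.append(indent * depth + names[cat_id])
        for child in sorted(children[cat_id]):  # siblings in category_id order
            visit(child, depth + 1)

    for root in sorted(children[None]):
        visit(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(indented_tree(ROWS)))
```

The recursion happens in application memory, so the depth of the tree never has to be known in advance; this is exactly what a single SQL statement over this model cannot do without one join per level.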
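Retrieving a Full Tree
The first common task when dealing with hierarchical data is displaying the entire tree, usually with some form of indentation. The most common way of doing this in pure SQL is with a self-join: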
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
+-------------+----------------------+--------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS | TUBE | NULL |
| ELECTRONICS | TELEVISIONS | LCD | NULL |
| ELECTRONICS | TELEVISIONS | PLASMA | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)
Finding All the Leaf Nodes
We can find all the leaf nodes in our tree (those with no children) with a LEFT JOIN query:
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
Retrieving a Single Path
The self-join also lets us see the full path through our hierarchy:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
+-------------+----------------------+-------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+-------------+-------+
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
+-------------+----------------------+-------------+-------+
1 row in set (0.01 sec)
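The main limitation of this approach is that you need one self-join for every level in the hierarchy, and performance naturally degrades with each level added, as the joins grow more complex.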
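To make that cost concrete, the sketch below generates the self-join query for an arbitrary number of levels and runs it against an in-memory SQLite database (standing in for MySQL). The helper names `self_join_sql` and `run` are hypothetical. The query text grows by one LEFT JOIN per level, and a query built for too few levels silently drops the deeper part of each path.

```python
import sqlite3

# (category_id, name, parent) rows, copied from the INSERT statement above.
ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def self_join_sql(levels):
    """Build the adjacency-list self-join query for a fixed number of levels."""
    columns = ", ".join(f"t{i}.name AS lev{i}" for i in range(1, levels + 1))
    joins = "\n".join(
        f"LEFT JOIN category AS t{i} ON t{i}.parent = t{i - 1}.category_id"
        for i in range(2, levels + 1)
    )
    return (
        f"SELECT {columns}\nFROM category AS t1\n{joins}\n"
        "WHERE t1.name = 'ELECTRONICS'"
    )


def run(levels):
    """Load the sample rows into an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category (category_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, parent INTEGER)"
    )
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(self_join_sql(levels)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(self_join_sql(4))
    for row in run(4):
        print(row)
```

With four levels the FLASH path appears; with three levels the query still succeeds but FLASH is simply missing, which is why you must know the depth of the hierarchy before writing the query.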
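Limitations of the Adjacency List Model
Working with the adjacency list model in pure SQL can be difficult at best. Before we can see the full path of a category, we have to know the level at which it resides. In addition, special care must be taken when deleting nodes, because of the potential for orphaning an entire sub-tree in the process (delete the PORTABLE ELECTRONICS category and all of its children are orphaned). These limitations can be addressed through client-side code or stored procedures. With a procedural language we can start at the bottom of the tree and iterate upwards to return the full tree or a single path. We can also use procedural code to delete nodes without orphaning entire sub-trees, by promoting one child item and re-pointing the remaining children to the new parent.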
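The Nested Set Model
What I would like to focus on in this article is a different approach, commonly referred to as the Nested Set Model. In the nested set model, we can look at our hierarchy in a new way: not as nodes and lines, but as nested containers. Drawn this way, our electronics categories look like this:
Notice how our hierarchy is still maintained, as parent categories envelop their children. We represent this form of hierarchy in a table by giving each node a left value and a right value: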
CREATE TABLE nested_category (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
);
INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4),
(4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13),
(9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);
SELECT * FROM nested_category ORDER BY category_id;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 20 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
+-------------+----------------------+-----+-----+
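We use lft and rgt because LEFT and RIGHT are reserved words in MySQL; see http://dev.mysql.com/doc/mysql/en/reserved-words.html for the full list of reserved words.
So how do we determine the left and right values? We start numbering at the leftmost side of the outermost node and continue to the right:
This design can be applied to a typical tree as well:
When working with a tree, we work from left to right, one layer at a time, descending into each node's children before assigning a right-hand value and moving on to the right. This approach is called the modified preorder tree traversal algorithm.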
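To see where the numbers in nested_category come from, here is a minimal Python sketch of the modified preorder tree traversal that converts the adjacency list from the first section into lft/rgt values. It assumes siblings are visited in category_id order, which happens to reproduce the values in the table above; the function name `number_nested_sets` is illustrative.

```python
from collections import defaultdict

# (category_id, name, parent) rows from the adjacency list example.
ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def number_nested_sets(rows):
    """Assign (lft, rgt) to every node with a modified preorder traversal."""
    names = {}
    children = defaultdict(list)
    for cat_id, name, parent in rows:
        names[cat_id] = name
        children[parent].append(cat_id)

    result = {}
    counter = 1

    def visit(cat_id):
        nonlocal counter
        lft = counter  # left value is assigned on the way down
        counter += 1
        for child in sorted(children[cat_id]):
            visit(child)
        result[names[cat_id]] = (lft, counter)  # right value after all children
        counter += 1

    for root in sorted(children[None]):
        visit(root)
    return result


if __name__ == "__main__":
    for name, (lft, rgt) in number_nested_sets(ROWS).items():
        print(name, lft, rgt)
```

A traversal like this is a convenient one-off way to migrate an existing adjacency list table to the nested set model.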
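Retrieving a Full Tree
We can retrieve the full tree with a self-join that links parents with nodes, on the basis that a node's lft value always falls between its parent's lft and rgt values: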
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+----------------------+
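Unlike our previous examples with the adjacency list model, this query works regardless of the depth of the tree. We do not need to consider the node's rgt value in our BETWEEN clause, because the rgt value always falls within the same parent as the lft value.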
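Finding All the Leaf Nodes
Finding all leaf nodes in the nested set model is even simpler than the LEFT JOIN method used with the adjacency list model. If you look at the nested_category table, you may notice that the lft and rgt values of leaf nodes are consecutive numbers. To find the leaf nodes, we look for nodes where rgt = lft + 1: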
SELECT name
FROM nested_category
WHERE rgt = lft + 1;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
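The leaf rule is a special case of a more general property, assuming the numbering has no gaps (as produced by the traversal described earlier): every descendant uses up two numbers between a node's own lft and rgt, so (rgt - lft - 1) / 2 is the number of descendants, and a leaf is exactly the case where that count is zero. A small Python check against the values in nested_category, with illustrative helper names:

```python
# (name, lft, rgt) values copied from the nested_category INSERT above.
NESTED = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4),
    ("LCD", 5, 6), ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19),
    ("MP3 PLAYERS", 11, 14), ("FLASH", 12, 13), ("CD PLAYERS", 15, 16),
    ("2 WAY RADIOS", 17, 18),
]


def is_leaf(lft, rgt):
    """A leaf's left and right values are consecutive."""
    return rgt == lft + 1


def descendant_count(lft, rgt):
    """Each descendant occupies two numbers strictly between lft and rgt."""
    return (rgt - lft - 1) // 2


leaves = [name for name, lft, rgt in NESTED if is_leaf(lft, rgt)]

if __name__ == "__main__":
    print(leaves)
    for name, lft, rgt in NESTED:
        print(name, descendant_count(lft, rgt))
```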
Retrieving a Single Path
With the nested set model, we can retrieve a single path without multiple self-joins:
SELECT parent.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'FLASH'
ORDER BY parent.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
+----------------------+
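The full-tree and single-path queries use only portable SQL, so they can be tried outside MySQL. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in and wraps the two queries in illustrative functions `subtree` and `path`; both rely on the same interval test, read in opposite directions.

```python
import sqlite3

# (category_id, name, lft, rgt) rows from the nested_category INSERT above.
NESTED = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory nested_category table and commit the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", NESTED)
    conn.commit()
    return conn


def subtree(conn, root):
    """Nodes whose lft lies inside the root's (lft, rgt) interval."""
    sql = """
        SELECT node.name
        FROM nested_category AS node, nested_category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
          AND parent.name = ?
        ORDER BY node.lft"""
    return [r[0] for r in conn.execute(sql, (root,))]


def path(conn, name):
    """Ancestors whose interval contains the node, root first."""
    sql = """
        SELECT parent.name
        FROM nested_category AS node, nested_category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
          AND node.name = ?
        ORDER BY parent.lft"""
    return [r[0] for r in conn.execute(sql, (name,))]
```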
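Finding the Depth of the Nodes
We have already looked at how to show the entire tree, but what if we also want to show the depth of each node, to better identify how each node fits in the hierarchy? This can be done by adding a COUNT function and a GROUP BY clause to our existing query for showing the entire tree: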
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| ELECTRONICS | 0 |
| TELEVISIONS | 1 |
| TUBE | 2 |
| LCD | 2 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 1 |
| MP3 PLAYERS | 2 |
| FLASH | 3 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 2 |
+----------------------+-------+
We can use the depth value to indent our category names with the CONCAT and REPEAT string functions:
SELECT CONCAT( REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
+-----------------------+
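Of course, in a client-side application you are more likely to use the depth value directly to display your hierarchy. Web developers can loop through the tree, adding <li></li> and <ul></ul> tags as the depth increases and decreases.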
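As one possible client-side rendering, this Python sketch turns (name, depth) rows, in lft order as returned by the depth query, into nested <ul>/<li> markup. It assumes the depth never jumps by more than one level on the way down, which preorder output guarantees; `render_ul` is a hypothetical helper name.

```python
from html import escape

# (name, depth) rows in lft order, as returned by the depth query above.
ROWS = [
    ("ELECTRONICS", 0), ("TELEVISIONS", 1), ("TUBE", 2), ("LCD", 2),
    ("PLASMA", 2), ("PORTABLE ELECTRONICS", 1), ("MP3 PLAYERS", 2),
    ("FLASH", 3), ("CD PLAYERS", 2), ("2 WAY RADIOS", 2),
]


def render_ul(rows):
    """Turn preorder (name, depth) rows into nested <ul>/<li> markup."""
    if not rows:
        return ""
    out = []
    current = -1  # depth of the previous row; -1 means no list is open yet
    for name, depth in rows:
        if depth > current:
            out.append("<ul>" * (depth - current))  # going down: open a list
        else:
            out.append("</li>")  # close the previous item
            out.append("</ul></li>" * (current - depth))  # climb back up
        out.append("<li>" + escape(name))
        current = depth
    # Close the last item, every list still open, and the outermost list.
    out.append("</li>")
    out.append("</ul></li>" * current)
    out.append("</ul>")
    return "".join(out)


if __name__ == "__main__":
    print(render_ul(ROWS))
```

Opening a <ul> when the depth increases and closing one when it decreases is all the logic needed, because the rows arrive in the same order a depth-first walk would produce.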
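Depth of a Sub-Tree
When we need depth information for a sub-tree, we cannot restrict either the node or the parent table in our self-join, because doing so corrupts the results. Instead, we add a third self-join, along with a sub-query that determines the depth of the node that will be the new starting point for our sub-tree: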
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name, node.lft
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name, node.lft, sub_tree.depth
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| FLASH | 2 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
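This query can be used with any node name, including the root node. The depth values are always relative to the named node.
Finding the Immediate Subordinates of a Node
Imagine you are showing a category of electronics products on a retailer's web site. When a user clicks on a category, you want to show the products in that category and list its immediate sub-categories, but not the entire tree of categories beneath it. For this, we need to show the node and its immediate sub-nodes, but nothing further down the tree. For example, when showing the PORTABLE ELECTRONICS category, we want to show MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH.
This can be accomplished by adding a HAVING clause to our previous query: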
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name, node.lft
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name, node.lft, sub_tree.depth
HAVING depth <= 1
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
If you do not wish to show the parent node, change HAVING depth <= 1 to HAVING depth = 1.
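The sub_parent join and the HAVING clause can be mirrored in client code to check the logic. This Python sketch works on the (name, lft, rgt) rows in memory; it is a quadratic-time illustration rather than a replacement for the query, and the function names `relative_depths` and `immediate_children` are hypothetical.

```python
# (name, lft, rgt) values copied from the nested_category INSERT above.
NESTED = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4),
    ("LCD", 5, 6), ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19),
    ("MP3 PLAYERS", 11, 14), ("FLASH", 12, 13), ("CD PLAYERS", 15, 16),
    ("2 WAY RADIOS", 17, 18),
]


def relative_depths(rows, root):
    """Depth of every node in root's sub-tree, relative to root (root = 0)."""
    bounds = {name: (lft, rgt) for name, lft, rgt in rows}
    r_lft, r_rgt = bounds[root]
    result = []
    for name, lft, _rgt in sorted(rows, key=lambda r: r[1]):
        if r_lft <= lft <= r_rgt:  # node is inside the sub-tree
            # Count enclosing intervals (including the node itself) that also
            # start inside the sub-tree, like the sub_parent join in the SQL.
            enclosing = sum(
                1 for p_lft, p_rgt in bounds.values()
                if p_lft <= lft <= p_rgt and r_lft <= p_lft <= r_rgt
            )
            result.append((name, enclosing - 1))
    return result


def immediate_children(rows, root):
    """Client-side equivalent of adding HAVING depth = 1."""
    return [name for name, depth in relative_depths(rows, root) if depth == 1]


if __name__ == "__main__":
    print(immediate_children(NESTED, "PORTABLE ELECTRONICS"))
```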
Aggregate Functions in a Nested Set
Let's add a table of products that we can use to demonstrate aggregate functions:
CREATE TABLE product
(
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(40),
category_id INT NOT NULL
);
INSERT INTO product(name, category_id) VALUES('20" TV',3),('36" TV',3),
('Super-LCD 42"',4),('Ultra-Plasma 62"',5),('Value Plasma 38"',5),
('Power-MP3 5gb',7),('Super-Player 1gb',8),('Porta CD',9),('CD To go!',9),
('Family Talk 360',10);
SELECT * FROM product;
+------------+-------------------+-------------+
| product_id | name | category_id |
+------------+-------------------+-------------+
| 1 | 20" TV | 3 |
| 2 | 36" TV | 3 |
| 3 | Super-LCD 42" | 4 |
| 4 | Ultra-Plasma 62" | 5 |
| 5 | Value Plasma 38" | 5 |
| 6 | Power-MP3 5gb | 7 |
| 7 | Super-Player 1gb | 8 |
| 8 | Porta CD | 9 |
| 9 | CD To go! | 9 |
| 10 | Family Talk 360 | 10 |
+------------+-------------------+-------------+
Now let's produce a query that retrieves our category tree along with a product count for each category:
SELECT parent.name, COUNT(product.name)
FROM nested_category AS node ,
nested_category AS parent,
product
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.category_id = product.category_id
GROUP BY parent.name, parent.lft
ORDER BY parent.lft;
+----------------------+---------------------+
| name | COUNT(product.name) |
+----------------------+---------------------+
| ELECTRONICS | 10 |
| TELEVISIONS | 5 |
| TUBE | 2 |
| LCD | 1 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 5 |
| MP3 PLAYERS | 2 |
| FLASH | 1 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 1 |
+----------------------+---------------------+
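This is our typical whole-tree query with COUNT and GROUP BY added, along with a reference to the product table and a join between the node and product tables in the WHERE clause. As you can see, each category has a count, and the counts of subcategories are reflected in their parent categories.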
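The aggregate query can be verified with a small Python sketch that assumes SQLite as a stand-in for MySQL. It groups on parent.name, parent.lft and orders by parent.lft, so each output row's position comes from the category it describes rather than from an arbitrary node row in the group; the function name `product_counts` is illustrative.

```python
import sqlite3

# (category_id, name, lft, rgt) rows from the nested_category INSERT above.
NESTED = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]
# (name, category_id) rows from the product INSERT above.
PRODUCTS = [
    ('20" TV', 3), ('36" TV', 3), ('Super-LCD 42"', 4),
    ('Ultra-Plasma 62"', 5), ('Value Plasma 38"', 5), ("Power-MP3 5gb", 7),
    ("Super-Player 1gb", 8), ("Porta CD", 9), ("CD To go!", 9),
    ("Family Talk 360", 10),
]

COUNT_SQL = """
    SELECT parent.name, COUNT(product.name)
    FROM nested_category AS node, nested_category AS parent, product
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
      AND node.category_id = product.category_id
    GROUP BY parent.name, parent.lft
    ORDER BY parent.lft
"""


def product_counts():
    """Return (category, product count including sub-categories) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
        "name TEXT, category_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", NESTED)
    conn.executemany(
        "INSERT INTO product (name, category_id) VALUES (?, ?)", PRODUCTS
    )
    rows = conn.execute(COUNT_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, count in product_counts():
        print(name, count)
```

Because the join to product is an inner join, a category with no products anywhere in its sub-tree would not appear at all; every category in this sample data has at least one.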
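Adding New Nodes
Now that we have learned how to query our tree, let's look at how to update it by adding a new node. Let's look at our nested set diagram again:
If we wanted to add a new node between the TELEVISIONS and PORTABLE ELECTRONICS nodes, the new node would have lft and rgt values of 10 and 11, and every lft and rgt value greater than 9 (the rgt of TELEVISIONS) would be increased by two, which shifts the nodes to its right and widens ELECTRONICS to enclose the new node. We then add the new node with the appropriate lft and rgt values. In MySQL 5 this can be done with a stored procedure, but I will assume that readers are using 4.1, the latest version at the time of writing, and will instead isolate my queries with a LOCK TABLES statement: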
LOCK TABLE nested_category WRITE;
SELECT @myRight := rgt FROM nested_category
WHERE name = 'TELEVISIONS';
UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myRight;
INSERT INTO nested_category(name, lft, rgt) VALUES('GAME CONSOLES', @myRight + 1, @myRight + 2);
UNLOCK TABLES;
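For readers who want to try the insertion outside MySQL, here is a sketch of the same steps in Python with SQLite. SQLite has neither LOCK TABLES nor MySQL user variables, so this version keeps the value in a local variable and uses BEGIN IMMEDIATE to take the write lock for the duration of the transaction; it is an adaptation under those assumptions, not a drop-in replacement. The helper names `insert_right_of` and `bounds` are hypothetical.

```python
import sqlite3

# (category_id, name, lft, rgt) rows from the nested_category INSERT above.
NESTED = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory nested_category table and commit the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", NESTED)
    conn.commit()
    return conn


def insert_right_of(conn, sibling, new_name):
    """Add new_name as the next sibling to the right of `sibling`."""
    with conn:  # commit on success, roll back if anything raises
        # Take SQLite's write lock up front, in place of LOCK TABLES.
        conn.execute("BEGIN IMMEDIATE")
        (my_right,) = conn.execute(
            "SELECT rgt FROM nested_category WHERE name = ?", (sibling,)
        ).fetchone()
        # Open a gap of two numbers immediately after the sibling's rgt.
        conn.execute(
            "UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > ?", (my_right,)
        )
        conn.execute(
            "UPDATE nested_category SET lft = lft + 2 WHERE lft > ?", (my_right,)
        )
        conn.execute(
            "INSERT INTO nested_category (name, lft, rgt) VALUES (?, ?, ?)",
            (new_name, my_right + 1, my_right + 2),
        )


def bounds(conn, name):
    """Return the (lft, rgt) pair of a node."""
    return conn.execute(
        "SELECT lft, rgt FROM nested_category WHERE name = ?", (name,)
    ).fetchone()
```

After `insert_right_of(conn, "TELEVISIONS", "GAME CONSOLES")`, the new node occupies 10 and 11, PORTABLE ELECTRONICS moves to 12 and 21, and ELECTRONICS grows to 1 and 22.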
We can then