How to Handle Hierarchical Data in SQL

Posted by jiangonemm

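I have recently been working on my company's authentication system and have looked at open-source projects such as Apache Shiro and Spring Security, though I have not yet decided whether to build it myself or use one of them. My current concern is how to handle hierarchical data, because I plan to give users hierarchical groups in an n:m relationship. This article does not solve that problem for me, but as a foundational piece it is very solid, so I translated it myself.
Link to the original article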

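Introduction

Most users have dealt with hierarchical data in a SQL database at one time or another, and have learned that managing hierarchical data is not what a relational database is designed for. The tables of a relational database are not hierarchical (like XML); they are simply flat lists. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table.

For our purposes, hierarchical data is a collection of data in which each item has a single parent and zero or more children (the root item is the exception and has no parent). Hierarchical data can be found in a variety of database applications, including forum and mailing list threads, business organization charts, content management categories, and product categories. We will use a product category hierarchy from a fictional electronics store.

These categories form a hierarchy in much the same way as the other examples cited above. In this article we will examine two models for dealing with hierarchical data in SQL, starting with the traditional adjacency list model.

The Adjacency List Model

Typically, the example categories would be stored in a table like the following (I have included the full CREATE and INSERT statements so you can follow along):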

CREATE TABLE category(
        category_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(20) NOT NULL,
        parent INT DEFAULT NULL
);

INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
        (4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
        (9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);

SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name                 | parent |
+-------------+----------------------+--------+
|           1 | ELECTRONICS          |   NULL |
|           2 | TELEVISIONS          |      1 |
|           3 | TUBE                 |      2 |
|           4 | LCD                  |      2 |
|           5 | PLASMA               |      2 |
|           6 | PORTABLE ELECTRONICS |      1 |
|           7 | MP3 PLAYERS          |      6 |
|           8 | FLASH                |      7 |
|           9 | CD PLAYERS           |      6 |
|          10 | 2 WAY RADIOS         |      6 |
+-------------+----------------------+--------+
10 rows in set (0.00 sec)

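In the adjacency list model, each item in the table contains a pointer to its parent. The topmost element, in this case ELECTRONICS, has a NULL value for its parent. The adjacency list model has the advantage of being simple: it is easy to see that FLASH is a child of MP3 PLAYERS. While the adjacency list model can be handled fairly easily in client-side code, working with it in pure SQL can be more problematic.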
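As a rough illustration of the client-side handling mentioned above, here is a minimal Python sketch. Python is my own choice for illustration, and the function name indented_tree and the in-memory ROWS list are assumptions, not part of the original article. It groups the rows by parent and walks the tree depth-first.

```python
from collections import defaultdict

# (category_id, name, parent) rows, copied from the category table above.
ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def indented_tree(rows):
    """Return the tree as a list of lines, one space of indent per level."""
    names = {cid: name for cid, name, _ in rows}
    # Map each parent id to the ids of its children, in row order.
    children = defaultdict(list)
    for cid, _, parent in rows:
        children[parent].append(cid)

    lines = []

    def walk(cid, depth):
        lines.append(" " * depth + names[cid])
        for child in children[cid]:
            walk(child, depth + 1)

    # Root items are the ones whose parent is NULL (None here).
    for root in children[None]:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(indented_tree(ROWS)))
```

The recursion handles any depth, which is exactly what the fixed number of self-joins in pure SQL cannot do.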

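Retrieving a Full Tree

The first common task when dealing with hierarchical data is displaying the entire tree, usually with some form of indentation. The most common way of doing this in pure SQL is a self-join: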

SELECT t1.name AS lev1, t2.name AS lev2, t3.name AS lev3, t4.name AS lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)

Finding All the Leaf Nodes

We can find all the leaf nodes in our tree (those with no children) by using a LEFT JOIN query:

SELECT t1.name FROM
category AS t1 LEFT JOIN category AS t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;

+--------------+
| name         |
+--------------+
| TUBE         |
| LCD          |
| PLASMA       |
| FLASH        |
| CD PLAYERS   |
| 2 WAY RADIOS |
+--------------+

Retrieving a Single Path

The self-join also allows us to see the full path through our hierarchy:

SELECT t1.name AS lev1, t2.name AS lev2, t3.name AS lev3, t4.name AS lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';

+-------------+----------------------+-------------+-------+
| lev1        | lev2                 | lev3        | lev4  |
+-------------+----------------------+-------------+-------+
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
+-------------+----------------------+-------------+-------+
1 row in set (0.01 sec)

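The main limitation of this approach is that you need one self-join for every level in the hierarchy, so performance naturally degrades with each level added as the joins grow in complexity.

Limitations of the Adjacency List Model

Working with the adjacency list model in pure SQL is difficult at best. Before we can see the full path of a category, we have to know the level at which it resides. In addition, special care must be taken when deleting nodes, because an entire sub-tree can be orphaned in the process (delete the PORTABLE ELECTRONICS category and all of its children become orphans). These limitations can be addressed with client-side code or stored procedures. With a procedural language we can start at the bottom of the tree and iterate upwards to return the full tree or a single path. We can also delete nodes procedurally without orphaning a sub-tree, by promoting one child element and re-pointing the remaining children to the new parent.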
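The two procedural ideas above can be sketched in client-side Python roughly as follows. This is only an illustration under my own assumptions: the function names are hypothetical, the table is held in a dictionary rather than a database, and "promote one child" is taken to mean the child with the lowest category_id, which the article does not specify.

```python
def load_categories():
    """Return {category_id: (name, parent_id)} mirroring the category table."""
    rows = [
        (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
        (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
        (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
        (10, "2 WAY RADIOS", 6),
    ]
    return {cid: (name, parent) for cid, name, parent in rows}


def path_to_root(cats, cid):
    """Follow parent pointers upwards, then return the path root-first."""
    path = []
    while cid is not None:
        name, parent = cats[cid]
        path.append(name)
        cid = parent
    return path[::-1]


def delete_promoting_child(cats, cid):
    """Delete a node without orphaning its sub-tree.

    Assumption: the child with the lowest id takes the deleted node's place,
    and the remaining children are re-pointed to that promoted child.
    """
    _, parent = cats[cid]
    kids = sorted(k for k, (_, p) in cats.items() if p == cid)
    if kids:
        promoted = kids[0]
        cats[promoted] = (cats[promoted][0], parent)
        for k in kids[1:]:
            cats[k] = (cats[k][0], promoted)
    del cats[cid]


if __name__ == "__main__":
    cats = load_categories()
    print(" > ".join(path_to_root(cats, 8)))
    delete_promoting_child(cats, 6)  # remove PORTABLE ELECTRONICS
    print(" > ".join(path_to_root(cats, 9)))
```

With a real database, each dictionary update would be an UPDATE on the parent column, ideally inside one transaction.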

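The Nested Set Model

What I would like to focus on in this article is a different approach, commonly referred to as the Nested Set Model. In the Nested Set Model we look at our hierarchy in a new way: not as nodes and lines, but as nested containers. Picture our electronics categories drawn that way, with each category as a container.

Notice how our hierarchy is still maintained, as parent categories envelop their children. We represent this form of hierarchy in a table by giving each node a left value and a right value: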

CREATE TABLE nested_category (
        category_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(20) NOT NULL,
        lft INT NOT NULL,
        rgt INT NOT NULL
);

INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4),
 (4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13),
 (9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);

SELECT * FROM nested_category ORDER BY category_id;

+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  20 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  19 |
|           7 | MP3 PLAYERS          |  11 |  14 |
|           8 | FLASH                |  12 |  13 |
|           9 | CD PLAYERS           |  15 |  16 |
|          10 | 2 WAY RADIOS         |  17 |  18 |
+-------------+----------------------+-----+-----+

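We use lft and rgt because LEFT and RIGHT are reserved words in MySQL; see http://dev.mysql.com/doc/mysql/en/reserved-words.html for the full list of reserved words.

So how do we determine the left and right values? We start numbering at the far left of the outermost container and continue to the right.

This design can also be represented as an ordinary tree.

When working with a tree, we go from left to right, one layer at a time, descending into each node's children before assigning the node's right value and moving on to the right. This approach is called the modified preorder tree traversal algorithm.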
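To make the numbering rule concrete, here is a small Python sketch that derives the left and right values from the adjacency list rows used earlier. The function name number_nested_sets is my own, and children are assumed to be visited in category_id order; the article does not prescribe either.

```python
from collections import defaultdict

# (category_id, name, parent) rows from the adjacency list example.
ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def number_nested_sets(rows):
    """Return {category_id: (lft, rgt)} via modified preorder traversal."""
    children = defaultdict(list)
    for cid, _, parent in rows:
        children[parent].append(cid)

    bounds = {}
    counter = 0

    def visit(cid):
        nonlocal counter
        counter += 1
        lft = counter              # left value: assigned on the way down
        for child in children[cid]:
            visit(child)           # number the whole sub-tree first
        counter += 1
        bounds[cid] = (lft, counter)  # right value: assigned on the way up

    for root in children[None]:
        visit(root)
    return bounds


if __name__ == "__main__":
    names = {cid: name for cid, name, _ in ROWS}
    for cid, (lft, rgt) in sorted(number_nested_sets(ROWS).items()):
        print(cid, names[cid], lft, rgt)
```

For this data the sketch reproduces the values in the nested_category table, for example ELECTRONICS (1, 20) and FLASH (12, 13).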

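Retrieving a Full Tree

We can retrieve the full tree with a self-join that links parents to nodes, on the basis that a node's lft value always falls between its parent's lft and rgt values: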

SELECT node.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;

+----------------------+
| name                 |
+----------------------+
| ELECTRONICS          |
| TELEVISIONS          |
| TUBE                 |
| LCD                  |
| PLASMA               |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS          |
| FLASH                |
| CD PLAYERS           |
| 2 WAY RADIOS         |
+----------------------+

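Unlike our previous examples, this query works regardless of the depth of the tree. We do not need to consider the node's rgt value in the BETWEEN clause, because the rgt value always falls within the same parent as the lft value.

Finding All the Leaf Nodes

Finding all the leaf nodes in the nested set model is even simpler than the LEFT JOIN method used in the adjacency list model. If you look at the nested_category table, you may notice that the lft and rgt values of leaf nodes are consecutive numbers. To find the leaf nodes, we only need to look for nodes where rgt = lft + 1: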

SELECT name
FROM nested_category
WHERE rgt = lft + 1;

+--------------+
| name         |
+--------------+
| TUBE         |
| LCD          |
| PLASMA       |
| FLASH        |
| CD PLAYERS   |
| 2 WAY RADIOS |
+--------------+

Retrieving a Single Path

With the nested set model, we can retrieve a single path without multiple self-joins:

SELECT parent.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.name = 'FLASH'
ORDER BY parent.lft;

+----------------------+
| name                 |
+----------------------+
| ELECTRONICS          |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS          |
| FLASH                |
+----------------------+

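Finding the Depth of the Nodes

We have already seen how to show the entire tree, but what if we also want to show the depth of each node, to better identify how each node fits into the hierarchy? This can be done by adding a COUNT function and a GROUP BY clause to our existing query for showing the entire tree: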

SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;

+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| ELECTRONICS          |     0 |
| TELEVISIONS          |     1 |
| TUBE                 |     2 |
| LCD                  |     2 |
| PLASMA               |     2 |
| PORTABLE ELECTRONICS |     1 |
| MP3 PLAYERS          |     2 |
| FLASH                |     3 |
| CD PLAYERS           |     2 |
| 2 WAY RADIOS         |     2 |
+----------------------+-------+

We can use the depth value to indent our category names with the CONCAT and REPEAT string functions:

SELECT CONCAT( REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;

+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
+-----------------------+

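Of course, in a client-side application you are more likely to use the depth value directly to display your hierarchy. Web developers can loop through the tree, adding <li></li> and <ul></ul> tags as the depth value increases and decreases.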
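One way the loop described above might look is sketched below in Python. The function name render_list is hypothetical, the rows are the (name, depth) pairs from the depth query, and the sketch assumes the rows arrive in lft order starting at depth 0, so the depth never grows by more than one between consecutive rows.

```python
from html import escape

# (name, depth) rows in lft order, as returned by the depth query above.
ROWS = [
    ("ELECTRONICS", 0), ("TELEVISIONS", 1), ("TUBE", 2), ("LCD", 2),
    ("PLASMA", 2), ("PORTABLE ELECTRONICS", 1), ("MP3 PLAYERS", 2),
    ("FLASH", 3), ("CD PLAYERS", 2), ("2 WAY RADIOS", 2),
]


def render_list(rows):
    """Turn (name, depth) rows into nested <ul>/<li> markup."""
    parts = []
    prev = -1  # depth of the previous row; -1 means nothing is open yet
    for name, depth in rows:
        if depth > prev:
            # One level deeper: open a new list inside the current item.
            parts.append("<ul>")
        else:
            # Same level or shallower: close the previous item, then close
            # one list and its enclosing item for every level we climb.
            parts.append("</li>")
            parts.append("</ul></li>" * (prev - depth))
        parts.append("<li>" + escape(name))
        prev = depth
    if prev >= 0:
        # Close the last item and every list that is still open.
        parts.append("</li>" + "</ul></li>" * prev + "</ul>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_list(ROWS))
```

Each item is left open until the next row shows whether it has children, which is why the closing tags are emitted one row late.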

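Depth of a Sub-Tree

When we need depth information for a sub-tree, we cannot restrict either the node or the parent table in our self-join, because that would corrupt the results. Instead, we add a third self-join, along with a sub-query that determines the depth that will serve as the new starting point for our sub-tree: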

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node.name, (COUNT(parent.name) - 1) AS depth
                FROM nested_category AS node,
                        nested_category AS parent
                WHERE node.lft BETWEEN parent.lft AND parent.rgt
                        AND node.name = 'PORTABLE ELECTRONICS'
                GROUP BY node.name
                ORDER BY node.lft
        )AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name
ORDER BY node.lft;

+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS |     0 |
| MP3 PLAYERS          |     1 |
| FLASH                |     2 |
| CD PLAYERS           |     1 |
| 2 WAY RADIOS         |     1 |
+----------------------+-------+

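This query can be used with any node name, including the root node. The depth values are always relative to the named node.

Finding the Immediate Subordinates of a Node

Imagine you are showing a category of electronics products on a website. When a user clicks on a category, you want to show the products of that category along with its immediate sub-categories, but not the entire tree of categories beneath it. For this, we need to show the node and its immediate children, but nothing further down. For example, when showing the PORTABLE ELECTRONICS category, we want to show MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH.

This can be done by adding a HAVING clause to our previous query: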

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node.name, (COUNT(parent.name) - 1) AS depth
                FROM nested_category AS node,
                        nested_category AS parent
                WHERE node.lft BETWEEN parent.lft AND parent.rgt
                        AND node.name = 'PORTABLE ELECTRONICS'
                GROUP BY node.name
                ORDER BY node.lft
        )AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth <= 1
ORDER BY node.lft;

+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS |     0 |
| MP3 PLAYERS          |     1 |
| CD PLAYERS           |     1 |
| 2 WAY RADIOS         |     1 |
+----------------------+-------+

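If you do not want to show the parent node, change HAVING depth <= 1 to HAVING depth = 1.

Aggregate Functions in a Nested Set

Let's add a table of products that we can use to demonstrate aggregate functions: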

CREATE TABLE product
(
        product_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(40),
        category_id INT NOT NULL
);

INSERT INTO product(name, category_id) VALUES('20" TV',3),('36" TV',3),
('Super-LCD 42"',4),('Ultra-Plasma 62"',5),('Value Plasma 38"',5),
('Power-MP3 5gb',7),('Super-Player 1gb',8),('Porta CD',9),('CD To go!',9),
('Family Talk 360',10);

SELECT * FROM product;

+------------+-------------------+-------------+
| product_id | name              | category_id |
+------------+-------------------+-------------+
|          1 | 20" TV            |           3 |
|          2 | 36" TV            |           3 |
|          3 | Super-LCD 42"     |           4 |
|          4 | Ultra-Plasma 62"  |           5 |
|          5 | Value Plasma 38"  |           5 |
|          6 | Power-MP3 5gb     |           7 |
|          7 | Super-Player 1gb  |           8 |
|          8 | Porta CD          |           9 |
|          9 | CD To go!         |           9 |
|         10 | Family Talk 360   |          10 |
+------------+-------------------+-------------+

Now let's write a query that retrieves our category tree along with a product count for each category:

SELECT parent.name, COUNT(product.name)
FROM nested_category AS node ,
        nested_category AS parent,
        product
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.category_id = product.category_id
GROUP BY parent.name
ORDER BY parent.lft;

+----------------------+---------------------+
| name                 | COUNT(product.name) |
+----------------------+---------------------+
| ELECTRONICS          |                  10 |
| TELEVISIONS          |                   5 |
| TUBE                 |                   2 |
| LCD                  |                   1 |
| PLASMA               |                   2 |
| PORTABLE ELECTRONICS |                   5 |
| MP3 PLAYERS          |                   2 |
| FLASH                |                   1 |
| CD PLAYERS           |                   2 |
| 2 WAY RADIOS         |                   1 |
+----------------------+---------------------+

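This is our typical whole-tree query with a COUNT and a GROUP BY added, along with a reference to the product table and a join between the node and product tables in the WHERE clause. As you can see, there is a count for each category, and the counts of the sub-categories are reflected in the parent categories.

Adding New Nodes

Now that we have learned how to query our tree, let's look at how to update it by adding a new node. Consider our nested set diagram again.

If we want to add a new node between the TELEVISIONS and PORTABLE ELECTRONICS nodes, the new node will have lft and rgt values of 10 and 11, and all nodes to its right will have their lft and rgt values increased by 2. We then add the new node with the appropriate lft and rgt values. In MySQL 5 this can be done with a stored procedure, but I will assume that readers are using 4.1, which was the latest version at the time of writing, so I will isolate my queries with a LOCK TABLES statement instead: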

LOCK TABLE nested_category WRITE;

SELECT @myRight := rgt FROM nested_category
WHERE name = 'TELEVISIONS';

UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myRight;

INSERT INTO nested_category(name, lft, rgt) VALUES('GAME CONSOLES', @myRight + 1, @myRight + 2);

UNLOCK TABLES;
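The same renumbering can be tried out without a MySQL server. The sketch below is my own illustration, not part of the original article: it uses Python's standard sqlite3 module with an in-memory database, a transaction in place of LOCK TABLES, and bound parameters in place of the @myRight user variable. The function names make_db and add_sibling_after are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory copy of the nested_category table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category ("
        " category_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO nested_category VALUES (?, ?, ?, ?)",
        [
            (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9),
            (3, "TUBE", 3, 4), (4, "LCD", 5, 6), (5, "PLASMA", 7, 8),
            (6, "PORTABLE ELECTRONICS", 10, 19), (7, "MP3 PLAYERS", 11, 14),
            (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
            (10, "2 WAY RADIOS", 17, 18),
        ],
    )
    conn.commit()
    return conn


def add_sibling_after(conn, sibling_name, new_name):
    """Insert new_name immediately to the right of sibling_name."""
    # "with conn" commits the writes together, or rolls them back on error.
    with conn:
        (my_right,) = conn.execute(
            "SELECT rgt FROM nested_category WHERE name = ?", (sibling_name,)
        ).fetchone()
        # Open a gap of two numbers to the right of the sibling.
        conn.execute(
            "UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > ?", (my_right,)
        )
        conn.execute(
            "UPDATE nested_category SET lft = lft + 2 WHERE lft > ?", (my_right,)
        )
        # The new leaf occupies the gap.
        conn.execute(
            "INSERT INTO nested_category(name, lft, rgt) VALUES (?, ?, ?)",
            (new_name, my_right + 1, my_right + 2),
        )


if __name__ == "__main__":
    conn = make_db()
    add_sibling_after(conn, "TELEVISIONS", "GAME CONSOLES")
    for row in conn.execute(
        "SELECT name, lft, rgt FROM nested_category ORDER BY lft"
    ):
        print(row)
```

After the call, GAME CONSOLES holds (10, 11), PORTABLE ELECTRONICS has moved to (12, 21), and ELECTRONICS now spans (1, 22).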

Then we can
