Trees and hierarchies can be found everywhere. In our computers’ filesystem, in online shops’ product hierarchies, in the organizational structure of a company, in blog categories, in family trees, the structure of an html page or in geography tables managing the relation between continents, countries cities.
In this article I will describe managing those type of data structure in MySQL using the Adjacency List Model.
There are various ways in managing those types of data structures in MySQL. The most common ones are
- Adjacency List Model
- Nested Sets Model
- Path Enumeration Model
Depending on the basic operations of a tree, that are reading, inserting, deleting, moving nodes or subtrees all models have their pros and cons in performance.
The Adjacency List Model is a very popular choice by developers and database administrators because it is pretty straight-forward and simple. It’s main advantage is the performance in modifying the table. In reading it needs an recursive approach to be effective, which is available in MySQL since version 8 and MariaDB since version 10.2.2 through the Common Table Expressions (CTE)
Example and Terminology
Have a look at that extraction of a linux file system:
To be more familiar with the modeling we need some terminlogy:
- The top node marked as / is the
root
node - bin, etc, home, tmp, usr and var are
children
of the root node, theirparent
- Nodes that do not have children are called
leaves
- The
path
of a node is the way up, following the parents until it reaches the root node, for example the path of project1 would be project1 > www > var -> / - The
level
of a node is the length of its path to the root, root itself has level 0, its children level 1 and so on. The level of project1 is 3. - The
siblings
of a node are all nodes that have the sameparent
.
Structure
To create the adjaceny model in SQL we will set following table structure with the columns id
, parent_id
and title
.
CREATE TABLE tree_adj ( id int(10) unsigned NOT NULL AUTO_INCREMENT, parent_id int(10) unsigned DEFAULT NULL, title varchar(255) NOT NULL, PRIMARY KEY (id), FOREIGN KEY (parent_id) REFERENCES tree_adj (id) ON DELETE CASCADE ON UPDATE CASCADE );
Each row corresponds to one node in the tree. The parent_id
is an internal foreign key and acts like a pointer to its parent, defined by id
. With the contraints we make sure that deleting a node would delete all its children and grandchildren.
Inserting Data:
INSERT INTO `tree_adj` (`id`, `parent_id`, `title`) VALUES (1, NULL, ''), (2, 1, 'bin'), (3, 1, 'etc'), (4, 1, 'home'), (5, 1, 'tmp'), (6, 1, 'usr'), (7, 1, 'var'), (8, 3, 'apache2'), (9, 3, 'apt'), (10, 3, 'mysql'), (11, 8, 'conf-available'), (12, 8, 'mods-available'), (13, 8, 'sites-available'), (14, 4, 'roland'), (15, 4, 'albert'), (16, 6, 'bin'), (17, 6, 'src'), (18, 6, 'share'), (19, 7, 'backups'), (20, 7, 'cache'), (21, 7, 'www'), (22, 21, 'project1'), (23, 22, 'app'), (24, 22, 'log'), (25, 22, 'vendor'), (26, 22, 'public'), (27, 25, 'bin'), (28, 25, 'composer');
Finding Root Node
By definition a tree can only have one root node, otherwise it would be called a forrest. Our structure allows both trees and forrests. In general we can find the root node(s) like this
SELECT id, parent_id, title FROM tree_adj WHERE parent_id IS NULL
In our example we only have one root node we can also address directly by its id
SELECT id, parent_id, title FROM tree_adj WHERE id = 1 -- the root node id
Finding immediate Children of a Node
/* children for node */ SELECT id, parent, title FROM tree_adj WHERE parent_id = 1 -- the tree node ORDER BY title;
Finding Leave Nodes
As we have learned, leave nodes are the nodes that have no children, so the query could look like that
/* leave nodes */ SELECT t.id, t.parent_id, t.title FROM tree_adj t WHERE NOT EXISTS (SELECT 1 FROM tree_adj where parent_id = t.id);
Retrieve whole Tree or Subtree
With the introduction of Common Table Expressions (CTE) reading the complete tree has become easy since you can apply recursion. Before it was difficult since you would need some a priori data about the maximal depth of the tree and use several left joins on the same tree table.
WITH RECURSIVE tree_path (id, parent_id, title, path) AS ( SELECT id, parent_id, title as title, CONCAT(title, '/') as path FROM tree_adj WHERE id = 1 -- the tree node UNION ALL SELECT t.id, t.parent_id, t.title, CONCAT(tp.path, t.title, '/') FROM tree_path AS tp JOIN tree_adj AS t ON tp.id = t.parent_id ) SELECT * FROM tree_path ORDER BY path;
Retrieve Path for a Node
WITH RECURSIVE tree_path (id, parent_id, title, path) AS ( SELECT id, parent_id, title, CONCAT(title, '/') as path FROM tree_adj WHERE id = 27 UNION ALL SELECT t.id, t.parent_id, t.title, CONCAT(t.title, '/', tp.path) as path FROM tree_path AS tp JOIN tree_adj AS t ON tp.parent_id = t.id ) SELECT * FROM tree_path;
Calculating the Depth for all Nodes
WITH RECURSIVE tree_path (id, parent_id, title, lvl) AS ( SELECT id, parent_id, title, 0 lvl FROM tree_adj WHERE parent_id IS NULL UNION ALL SELECT t.id, t.parent_id, t.title, tp.lvl + 1 FROM tree_path AS tp JOIN tree_adj AS t ON tp.id = t.parent_id ) SELECT * FROM tree_path ORDER BY lvl;
Deleting Nodes
a) Delete Node and all its Descendants
To delete a node and all his descendants, just remove the node, the complete subtree underneath will automatically be removed by the DELETE CASCADE
of the foreign key constraint. To completely delete the folder /var/www/project1 we can run:
DELETE FROM tree_adj WHERE id = 22;
b) Only allow deleting Leave Nodes
Sometimes you do not want to delete all descendants when deleting a node. One possible strategy could be that we do not allow deleting nodes that have children.
This we can achieve by changing the foreign key constraint from DELETE CASCADE
to DELETE RESTRICT
.
In case your contraint has the name “tree_adj_ibfk_1” the query is
ALTER TABLE `tree_adj` DROP FOREIGN KEY `tree_adj_ibfk_1`; ALTER TABLE `tree_adj` ADD CONSTRAINT `tree_adj_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `tree_adj` (`id`) ON UPDATE RESTRICT ON DELETE RESTRICT;
c) Delete Node and promote its Descendants
Before we delete a node, we need to know where its immediate children should go and assign them a new parent. So we need two operations to be executed:
- Update the
parent_id
of all its immediate children with theid
of the new parent - Then, delete the node.
Let’s delete again the directory /var/www/project1/ (id
= 22) but this time we will move all its children to /tmp/, so new parent_id
shall be 5.
We put all operations in one transaction:
BEGIN; UPDATE tree_adj SET parent_id = 5 -- new parent WHERE parent_id = 22; DELETE FROM tree_adj WHERE id = 22; -- node to delete COMMIT;
Moving Nodes or Subtrees
Moving nodes or subtrees in the adjacency list is pretty straight-forward. It does not depend if node is root of a subtree or it is a leaf node.
Just need to update the parent_id
of the top node of the subtree.
Lets move “/var/www/project1/” to “/home/albert/“.
UPDATE tree_adj SET parent_id = 15 WHERE id = 22
Get Siblings for a Node
Sometime we need to know the siblings of a node, that means all nodes that have the same parent as the node itself. If we inlucde the node itself in the result depends on the requirement of the application. We include the node in the list and add a separate column active (1 means it’s the node itself, 0 else)
Let’s do that for the “/tmp/” folder:
SELECT t2.id, t2.parent_id, t2.title, IF(t1.id = t2.id,1, 0) as `active` FROM tree_adj t1 INNER JOIN tree_adj t2 ON t2.parent_id = t1.parent_id WHERE t1.id = 5 ORDER BY t2.title
Summary
We have learned about managing trees in MySQL using the Adjacency List Model.
This model uses a single table with self referencing foreign key.
It is fast and easy when modifying the structure.
There have been problems in reading the whole tree or subtrees, but with the Recursive Common Table Expressions introduced in later versions of MySQL/MariaDB this has been solved
It is one of the most popular models to represent trees