Managing Trees in MySQL Using the Adjacency List Model

Trees and hierarchies can be found everywhere. In our computers’ filesystem, in online shops’ product hierarchies, in the organizational structure of a company, in blog categories, in family trees, the structure of an html page or in geography tables managing the relation between continents, countries cities.

In this article I will describe managing those type of data structure in MySQL using the Adjacency List Model.

There are various ways in managing those types of data structures in MySQL. The most common ones are

  • Adjacency List Model
  • Nested Sets Model
  • Path Enumeration Model

Depending on the basic operations of a tree, that are reading, inserting, deleting, moving nodes or subtrees all models have their pros and cons in performance.

The Adjacency List Model is a very popular choice by developers and database administrators because it is pretty straight-forward and simple. It’s main advantage is the performance in modifying the table. In reading it needs an recursive approach to be effective, which is available in MySQL since version 8 and MariaDB since version 10.2.2 through the Common Table Expressions (CTE)

Example and Terminology

Have a look at that extraction of a linux file system:

To be more familiar with the modeling we need some terminlogy:

  • The top node marked as / is the root node
  • bin, etc, home, tmp, usr and var are children of the root node, their parent
  • Nodes that do not have children are called leaves
  • The path of a node is the way up, following the parents until it reaches the root node, for example the path of project1 would be project1 > www > var -> /
  • The level of a node is the length of its path to the root, root itself has level 0, its children level 1 and so on. The level of project1 is 3.
  • The siblings of a node are all nodes that have the same parent.

Structure

To create the adjaceny model in SQL we will set following table structure with the columns id, parent_id and title.

CREATE TABLE tree_adj (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  parent_id int(10) unsigned DEFAULT NULL,
  title varchar(255) NOT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (parent_id) REFERENCES tree_adj (id) 
    ON DELETE CASCADE ON UPDATE CASCADE
);

Each row corresponds to one node in the tree. The parent_id is an internal foreign key and acts like a pointer to its parent, defined by id. With the contraints we make sure that deleting a node would delete all its children and grandchildren.

Inserting Data:

INSERT INTO `tree_adj` (`id`, `parent_id`, `title`) VALUES
	(1, NULL, ''),
	(2, 1, 'bin'),
	(3, 1, 'etc'),
	(4, 1, 'home'),
	(5, 1, 'tmp'),
	(6, 1, 'usr'),
	(7, 1, 'var'),
	(8, 3, 'apache2'),
	(9, 3, 'apt'),
	(10, 3, 'mysql'),
	(11, 8, 'conf-available'),
	(12, 8, 'mods-available'),
	(13, 8, 'sites-available'),
	(14, 4, 'roland'),
	(15, 4, 'albert'),
	(16, 6, 'bin'),
	(17, 6, 'src'),
	(18, 6, 'share'),
	(19, 7, 'backups'),
	(20, 7, 'cache'),
	(21, 7, 'www'),
	(22, 21, 'project1'),
	(23, 22, 'app'),
	(24, 22, 'log'),
	(25, 22, 'vendor'),
	(26, 22, 'public'),
	(27, 25, 'bin'),
	(28, 25, 'composer');

Finding Root Node

By definition a tree can only have one root node, otherwise it would be called a forrest. Our structure allows both trees and forrests. In general we can find the root node(s) like this

SELECT id, parent_id, title 
FROM tree_adj
WHERE parent_id IS NULL

In our example we only have one root node we can also address directly by its id

SELECT id, parent_id, title 
FROM tree_adj
WHERE id = 1 -- the root node id

Finding immediate Children of a Node

/* children for node */
SELECT id, parent, title
FROM tree_adj
WHERE parent_id = 1 -- the tree node
ORDER BY title;

Finding Leave Nodes

As we have learned, leave nodes are the nodes that have no children, so the query could look like that

/* leave nodes */
SELECT t.id, t.parent_id, t.title
FROM tree_adj t
WHERE NOT EXISTS (SELECT 1 FROM tree_adj where parent_id = t.id);

Retrieve whole Tree or Subtree

With the introduction of Common Table Expressions (CTE) reading the complete tree has become easy since you can apply recursion. Before it was difficult since you would need some a priori data about the maximal depth of the tree and use several left joins on the same tree table.

WITH RECURSIVE tree_path (id, parent_id, title, path) AS
(
  SELECT id, parent_id, title as title, CONCAT(title, '/') as path
    FROM tree_adj
    WHERE id = 1 -- the tree node
  UNION ALL
  SELECT t.id, t.parent_id, t.title, CONCAT(tp.path, t.title, '/')
    FROM tree_path AS tp JOIN tree_adj AS t
      ON tp.id = t.parent_id
)
SELECT * FROM tree_path
ORDER BY path; 

Retrieve Path for a Node

WITH RECURSIVE tree_path (id, parent_id, title, path) AS
(
  SELECT id, parent_id, title, CONCAT(title, '/') as path
    FROM tree_adj
    WHERE id = 27
  UNION ALL
  SELECT t.id, t.parent_id, t.title, CONCAT(t.title, '/', tp.path) as path
    FROM tree_path AS tp JOIN tree_adj AS t
      ON tp.parent_id = t.id
)
SELECT * FROM tree_path;

Calculating the Depth for all Nodes

WITH RECURSIVE tree_path (id, parent_id, title, lvl) AS
(
  SELECT id, parent_id, title, 0 lvl
    FROM tree_adj
    WHERE parent_id IS NULL
  UNION ALL
  SELECT t.id, t.parent_id, t.title, tp.lvl + 1
    FROM tree_path AS tp JOIN tree_adj AS t
      ON tp.id = t.parent_id
)
SELECT * FROM tree_path
ORDER BY lvl;

Deleting Nodes

a) Delete Node and all its Descendants

To delete a node and all his descendants, just remove the node, the complete subtree underneath will automatically be removed by the DELETE CASCADE of the foreign key constraint. To completely delete the folder /var/www/project1 we can run:

DELETE FROM tree_adj 
WHERE id = 22;

b) Only allow deleting Leave Nodes

Sometimes you do not want to delete all descendants when deleting a node. One possible strategy could be that we do not allow deleting nodes that have children.

This we can achieve by changing the foreign key constraint from DELETE CASCADE to DELETE RESTRICT.

In case your contraint has the name “tree_adj_ibfk_1” the query is

ALTER TABLE `tree_adj`
	DROP FOREIGN KEY `tree_adj_ibfk_1`;
ALTER TABLE `tree_adj`
	ADD CONSTRAINT `tree_adj_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `tree_adj` (`id`) ON UPDATE RESTRICT ON DELETE RESTRICT;

c) Delete Node and promote its Descendants

Before we delete a node, we need to know where its immediate children should go and assign them a new parent. So we need two operations to be executed:

  1. Update the parent_id of all its immediate children with the id of the new parent
  2. Then, delete the node.

Let’s delete again the directory /var/www/project1/ (id = 22) but this time we will move all its children to /tmp/, so new parent_id shall be 5.

We put all operations in one transaction:

BEGIN;

UPDATE tree_adj
SET parent_id = 5 -- new parent
WHERE parent_id = 22;

DELETE FROM tree_adj
WHERE id = 22; -- node to delete

COMMIT;

Moving Nodes or Subtrees

Moving nodes or subtrees in the adjacency list is pretty straight-forward. It does not depend if node is root of a subtree or it is a leaf node.
Just need to update the parent_id of the top node of the subtree.

Lets move “/var/www/project1/” to “/home/albert/“.

UPDATE tree_adj
SET parent_id = 15
WHERE id = 22

Get Siblings for a Node

Sometime we need to know the siblings of a node, that means all nodes that have the same parent as the node itself. If we inlucde the node itself in the result depends on the requirement of the application. We include the node in the list and add a separate column active (1 means it’s the node itself, 0 else)

Let’s do that for the “/tmp/” folder:

SELECT t2.id, t2.parent_id, t2.title, IF(t1.id = t2.id,1, 0) as `active` 
FROM tree_adj t1
INNER JOIN tree_adj t2 ON t2.parent_id = t1.parent_id
WHERE t1.id = 5
ORDER BY t2.title

Summary

We have learned about managing trees in MySQL using the Adjacency List Model.
This model uses a single table with self referencing foreign key.
It is fast and easy when modifying the structure.
There have been problems in reading the whole tree or subtrees, but with the Recursive Common Table Expressions introduced in later versions of MySQL/MariaDB this has been solved
It is one of the most popular models to represent trees