0

I am using PostgreSQL for modeling Tree data structures using adjacency list approach. And i would like to be able to store extra data on each leaf indicating to which group this tree belong:

CREATE TABLE groups (
  group_id integer PRIMARY KEY
);

CREATE TABLE leafs (
  leaf_id   integer PRIMARY KEY,
  parent_id integer REFERENCES leafs ON DELETE CASCADE ON UPDATE CASCADE,

  group_id integer REFERENCES groups ON DELETE CASCADE ON UPDATE CASCADE NOT NULL
);

I also would like to ensure that each leaf could only be connected to the same group. It looks like this could be done by either creating a TRIGGER or CHECK constraint. I have two questions:

  1. What is the most efficient/right way to handle this particular case, TRIGGER or CHECK constraint? (and what is rule of thumbs for choosing between these two)

  2. Is there is a better way to enforce consistency of this model (or perhaps alternative ways to model such groups of trees).

Thanks,

Code for TRIGGER version:

CREATE OR REPLACE FUNCTION after_leaf_update()
  RETURNS trigger AS
$$
BEGIN
IF (NEW.parent_id IS NOT NULL) AND (SELECT group_id FROM leafs WHERE leaf_id=NEW.parent_id) <> NEW.group_id THEN
    RAISE EXCEPTION 'group_id of node/leaf does not match!!!';
END IF;
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';


CREATE TRIGGER leafs_consistency_check
  BEFORE INSERT OR UPDATE
  ON leafs
  FOR EACH ROW
  EXECUTE PROCEDURE after_leaf_update();

Code for CHECK constraint version:

CREATE OR REPLACE FUNCTION leafs_consistency_check_constraint(prn_id integer, grp_id integer) RETURNS BOOL AS
$$
BEGIN
    IF (prn_id IS NOT NULL) AND (SELECT group_id FROM leafs WHERE leaf_id=prn_id) <> grp_id THEN
    RAISE EXCEPTION 'CHECK: group_id of node/leaf does not match!!!';
END IF;
RETURN TRUE;

END;
$$ LANGUAGE plpgsql;

ALTER TABLE leafs ADD constraint group_id_constraint check ( leafs_consistency_check_constraint(parent_id, group_id) );
Yatima
  • 119
  • 7
  • Did you check the [PostgreSQL ltree module](https://www.postgresql.org/docs/current/static/ltree.html)? Is it applicable? Can you do something like `CREATE TABLE test (path ltree, YOUR_LABEL_HERE text NOT NULL)` like in the `F.21.4. Example`? Enable the module for a database by using `CREATE EXTENSION ltree`. – flutter Jun 13 '17 at 23:47
  • Did you check the answer to this [SO question](https://stackoverflow.com/questions/14543173/postgresql-designing-a-tree-hierarchy-with-mixed-node-types-inheritance-does)? Does it help? – flutter Jun 13 '17 at 23:55
  • @flutter - thank you for the link! I was not aware of ltree module - i will check it out! – Yatima Jun 14 '17 at 21:07
  • @flutter re using inheritance: i have checked it out - does not seems to help in my case since `leafs` is all of the same type... – Yatima Jun 14 '17 at 21:41

1 Answers1

1

Use the trigger.

Generally speaking, Postgres devs don't recommend putting functions which run queries in CHECK constraints. Among other things, under some circumstances the CHECK constraint might get checked more than once for a query, and that's a pretty expensive check. More importantly, there's a lot of careful engineering around row visibility and similar considerations for Triggers which simply doesn't exist for Constraints.

I'll also recommend that you consider an alternate approach: instead of throwing an error, what if you use a BEFORE trigger, ignore the user input, and automatically give the new node the same group_id as its parent?

Or even better, since the group_ids are affiliated with the parents and not with the children, how about putting group_ids in a separate table which is linked just to the parent nodes?

FuzzyChef
  • 3,879
  • 2
  • 17
  • 17
  • thank you for the answer and for explaining rational behind it! Re 'alternative approach': yes, i did consider this but in case of this app attempts to insert node with wrong `group_id` is clear indication of something went horrible wrong (or attack on the system) so i think error is a more appropriate way. – Yatima Jun 14 '17 at 21:09
  • Re group_ids in a separate table: yes, i did consider this. I do agree that this way there is no data duplication and therefor no room for error. However in such schema following operations became considerably more expensive: [a] identifying all leafs in tree, [b] deleting the tree. Later is particularly troublesome since it is possible to form a `circular tree` deleting of which does not seems to be straightforward. I understand that [a] could be solved by using recursive queries so my only concern is deal with [b]. Any advice/thoughts on this? Thanks, – Yatima Jun 14 '17 at 21:16