Maintaining cached data with triggers in PostgreSQL

(This was written for a corporate blog for GBdirect, which in the end I decided not to deploy. But this might be useful for other things, so I'll publish it here on my personal blog instead.)

I'm still writing the software to implement this blog, and I found myself descending into a nest of nasty SQL. The problem is maintaining information about which sections have articles in. I want to have each article in a single ‘primary’ section, and also optionally in some ‘secondary’ sections (so that I can have articles automatically linked to the training courses we offer in relevant subject areas, for example). The thing that makes this tricky is that I don't want a section to appear on the website until someone's written at least one article to appear in it.

When I tried to write SQL to get a list of sections which have articles in them (either as primary or secondary section) it ended up with a whole load of GROUP-BYs and UNIONs, and wasn't coming out with the right results. There's probably a relatively simple way to do what I want with SQL, but I can't see it yet. So I decided to keep a count of the number of articles in each section. That makes it simple to ignore sections that haven't been used yet. All I need is to make sure that whenever an article's primary or secondary sections are updated, the count gets updated accordingly. I was planning to do that by hand, until a friend suggested I try using triggers.

Triggers in PostgreSQL

A trigger is just some code that gets run whenever a new record is inserted into a table, or when an existing record is updated or deleted. Postgres allows triggers to be written in various languages, but it looks like the simplest way to go about this is to write them in PL/pgSQL, which is basically just SQL statements with the addition of conditional structures and loops. Triggers can be used to test that new records are valid (if the normal constraints features aren't powerful enough, which they usually are) or to update other bits of information that track them. The nice thing about it is that once the triggers are set up, PostgreSQL keeps everything in check for you, so there are no tricky bugs where you forgot to update a redundant value in an unusual situation.

For some reason you have to tell Postgres that you want to use a particular language to write functions before you do it, using the createlang program. This has to be done for each database separately. The program comes with Postgres, but might not be in your $PATH by default, so for example on Debian (unstable) I used this to enable PL/pgSQL for the blog database:

/usr/lib/postgresql/bin/createlang plpgsql gbdirect_blog

That done, you can declare functions. Here are my two functions for updating my section table, one for when an article is added to a section, and one for when it's removed from it:

create function trig_add_to_section () returns trigger AS '
    begin
        update section
        set num_articles = num_articles + 1
        where id = NEW.section_id;
        return NEW;
    end;
' language plpgsql;

create function trig_del_from_section () returns trigger AS '
    begin
        update section
        set num_articles = num_articles - 1
        where id = OLD.section_id;
        return OLD;
    end;
' language plpgsql;

Trigger functions don't take any arguments, and have to return a special trigger value. The OLD and NEW values are automatically set up with copies of the record before and/or after the update. So for example, an UPDATE operation provides both values with copies of the record as it appeared before the update and as it will be afterwards. A DELETE operation would only provide OLD, because there will be no record after the operation. INSERT operations only provide NEW.

To plug these in as triggers, so that they get run automatically whenever the tables are changed, I did this for the article table, which has a section_id column to indicate the primary section:

create trigger trig_article_insert after insert on article
    for each row execute procedure trig_add_to_section();
create trigger trig_article_delete after delete on article
    for each row execute procedure trig_del_from_section();
create trigger trig_article_update_del after update on article
    for each row execute procedure trig_del_from_section();
create trigger trig_article_update_add after update on article
    for each row execute procedure trig_add_to_section();

So if a new article is inserted, the count for it's section gets incremented by one, and so on. An UPDATE operation triggers calls to both functions, first decrementing the count for the old section and then incrementing it for the new one.

The AFTER keyword means that the triggers are run after the update has occurred. You can also use BEFORE, but I'm not sure what difference it makes. This works for me.

Trigger functions can be polymorphic

There are actually two tables I need to watch to keep my article counts up to date: the article table, which records an article's primary section, and the article_section table, which might list additional secondary sections. Fortunately, the column name I'm interested in is called section_id in both cases, and the updates to the section table are the same, so I was able to use the same two trigger functions.

Here's the SQL to set up triggers for the second table, which you'll see is the same as above apart from the table name:

create trigger trig_article_insert after insert on article_section
    for each row execute procedure trig_add_to_section();
create trigger trig_article_delete after delete on article_section
    for each row execute procedure trig_del_from_section();
create trigger trig_article_update_del after update on article_section
    for each row execute procedure trig_del_from_section();
create trigger trig_article_update_add after update on article_section
    for each row execute procedure trig_add_to_section();

If you look at the trig_add_to_section and trig_del_from_section functions above, you'll see that they don't mention the table name at all. As long as the trigger is being fired on a table with a section_id column, they should just work.

I was expecting there to be a problem with this. According to the Postgres documentation, the first time some SQL is run in a PL/pgSQL function it gets it's query plan compiled and cached, so if you call it again using a different table it will break. This didn't seem to cause any problems for me. I'm guessing it's only a problem with SELECT queries, which I didn't need. If you find this a problem then it looks like the solution is to use the PL/pgSQL EXECUTE command, giving it the query as a string. That should force Postgres to recompile the query plan each time it is executed. It can also be used to dynamically choose the table and column names.

< Suggesting a filename for downloaded files | A first attempt at using the CLISP foreign function interface >

Miniblog

(nuggets of inanity)

Tuesday Apr 24th 2007, 16:54 »
Just took the annual web design survey that AListApart do. I don't realy consider myself to be a web designer, but I have been doing a lot of HTML and CSS lately.
Monday Apr 23rd 2007, 18:23 »
Strange, there appears to be a bare-knuckle boxing match going on in the field outside my flat. Wish they wouldn't make so much noise about it.
Thursday Mar 1st 2007, 18:47 »
“In its written form, Hebrew has no vowels, making it the ideal language for texting.”
—Said in jest on some Radio 4 programme just now.

Archive: 2007 · 2006 · 2005 · 2004
Feed