Weblog
MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table
This is the second post here detailing my ongoing adventures with PostgreSQL. This time I had a requirement to add a 'timestamp' column to a table. The point of this being to allow us to track the 'last modified' time of a row, without requiring that the application code manage the timestamp itself.
There's a lot of reasons why you might wish to do this. In this case it was to simplify syncing the data into a data warehouse. More specifically, to allow the DBA to easily identify rows which have changed since the last import.
Having done this a couple of times in MySQL, I assumed that the process would be straightforward. I should know better by now!
MySQL
MySQL provides a TIMESTAMP column type, for exactly this purpose. It's formatted as a standard DATETIME, but can optionally be configured to automatically set itself to the current time when the row is inserted (on by default) and/or updated (off by default). It's a breeze to add a column of this type to a table, like so:
ALTER TABLE mytable
ADD lastmodified TIMESTAMP
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP;Notice the crucial ON UPDATE clause, which tells MySQL to update the timestamp when the row is modified.
All rows in the lastmodified column are initially set to the 'zero' date, i.e. 0000-00-00 00:00:00. If we want to backfill the column with another value we can do that in a further step. This example sets each row's timestamp to the current date and time:
UPDATE mytable SET lastmodified=CURRENT_TIMESTAMP
Pretty straightforward, all in all.
PostgreSQL
Let's dive right in and view the Postgres solution in all its glory:
ALTER TABLE mytable
ADD lastmodified TIMESTAMP;
ALTER TABLE mytable
ALTER COLUMN lastmodified
SET DEFAULT CURRENT_TIMESTAMP;
UPDATE mytable SET lastmodified=CURRENT_TIMESTAMP;
CREATE OR REPLACE FUNCTION update_lastmodified_column()
RETURNS TRIGGER AS '
BEGIN
NEW.lastmodified = NOW();
RETURN NEW;
END;
' language 'plpgsql';
CREATE TRIGGER update_lastmodified_modtime BEFORE UPDATE
ON mytable FOR EACH ROW EXECUTE PROCEDURE
update_lastmodified_column();
That's a fair deal more complex. The big obstacle here is that Postgres doesn't have the equivalent of MySQL's TIMESTAMP type. It does a have type named TIMESTAMP, but this is analogous to MySQL's DATETIME. We'll have to reverse engineer the behaviour we want.
We'll step though the process and have a closer look at what's involved. We'll start by creating the column and specifying that inserted rows should automatically default to the current date and time. As we've seen before, PostgreSQL won't allow us to do that in one step.
ALTER TABLE mytable
ADD lastmodified TIMESTAMP;
ALTER TABLE mytable
ALTER COLUMN lastmodified
SET DEFAULT CURRENT_TIMESTAMP;
Again, we may wish to backfill the column:
UPDATE mytable SET lastmodified=CURRENT_TIMESTAMP;
As an aside, I like that Postgres has some convenient shorthand values for commonly used dates. Alongside CURRENT_TIMESTAMP, there's such options as "yesterday", "epoch" and the cosmologically questionable "-infinity". They're documented here.
So far we've created the column, backfilled it with sane values, and any newly inserted rows will be timestamped appropriately. But what about updates to rows? This is where the fun really begins.
Lacking MySQL's ON UPDATE clause, which we noted earlier, we need to create a TRIGGER. A trigger is a kind of "event handler" which is fired off (well, triggered) in response to some action, in this case UPDATEs. To make things even more interesting, Postgres deviates from the SQL99 standard by not allowing us to run SQL directly within the trigger: any functionality we require must be defined in a stored proc:
CREATE OR REPLACE FUNCTION update_lastmodified_column()
RETURNS TRIGGER AS '
BEGIN
NEW.lastmodified = NOW();
RETURN NEW;
END;
' language 'plpgsql';
NEW is a special keyword which refers to the new version of the row. Similarly, OLD is available to us, should we need to access the previous values within the row.
Finally, we have to attach that proc to the relevant table by use of the trigger:
CREATE TRIGGER update_lastmodified_modtime BEFORE UPDATE ON mytable FOR EACH ROW EXECUTE PROCEDURE update_lastmodified_column();
MySQL's single command has succesfully been implemented as a dozen or so lines of Postgres code. I'll give round two to MySQL, but I'll give points to PostgreSQL for making my day a little more interesting than it might otherwise have been!
MySQL versus PostgreSQL: Adding an Auto-Increment Column to a Table
The bulk of my database experience (almost eight years now) has been with the popular open-source MySQL database management system. MySQL has progressed significantly over the years, and has grown into a remarkable product. It finally has all the must-have features such as views, stored procs and referential integrity, coupled with the blistering performance for which MySQL has always been known. In short, it rocks.
But I digress. I've recently been having to get to grips with PostgreSQL (an old version of course - 7.1 or so - just to make life really interesting). It's largely intuitive, but there are quirks around most corners. This is my favourite so far: I recently needed to add an auto-incrementing integer "id" column to a table.
MySQL
This sort of thing will be second nature to MySQL developers:
ALTER TABLE mytable
ADD myid INT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE;
One SQL command - not bad.
PostgreSQL
It turned out not to be so easy with our Postgres installation. For a start, there are no auto_increment columns. So, as with several other major RDBMSs, the solution is to create a "sequence", which is kind of like a pseudo-table which acts as a counter:
CREATE SEQUENCE mytable_myid_seq;
Next, we have to add our new column to the table, and specify that for each new row it should take its value from the sequence, using the NEXTVAL() function. For reasons best known to the Postgres guys, you can't do this in one step:
ALTER TABLE mytable
ADD myid INT UNIQUE;
And then:
ALTER TABLE mytable
ALTER COLUMN myid
SET DEFAULT NEXTVAL('mytable_myid_seq');
We're getting there. We now have an auto increment column. The problem is that Postgres won't backfill this with values automatically: all pre-existing rows are currently null for this column. Let's change that:
UPDATE mytable
SET
myid = NEXTVAL('mytable_myid_seq');
Job done. Well, some time later, the job will be done. That final step is one of the slowest things you can possibly ask Postgres to do. For a mid-sized table (around 5,000,000 rows, with a handful of small numeric and text columns), that took about 2.5 hours on powerful hardware - so you'll want to leave this for a quiet time. Fortunately Postgres treats the UPDATE as an atomic transaction: nothing is committed until the command completes, so it will be difficult for you to leave the data in an inconsistent state.
Are Relational Databases Obsolete?
I found this hilarious.
Michael Stonebraker, co-creator of the Ingres and Postgres database management systems, recently made a blog posting suggesting that column-oriented database engines (example: Vertica) might someday triumph over the traditional row-oriented engines (example: every other database engine you know) in some situations, perhaps most notably the data-warehousing field. It's an interesting post, and well worth reading.
Somebody else thought so. Computer World reported the story under the title:
Relational database pioneer says technology is obsolete.
Slightly sensationalist, but whatever. And so naturally the story turned up on Slashdot as:
Are Relational Databases Obsolete?
Did anybody read TFA? I doubt it.
The real shame is that, I suspect this is how most 'real' journalism happens too.