May 142010
 

The Problem

PostgreSQL despite all of its functionality seems to be missing an “unpivot” or “uncrosstab” operation which can be a real pain if you are stuck with a table you need to normalize. Lets say you have a de-normalized table like so…

clientID    clientName           contact1    contact2    contact3    contact4
-------------------------------------------------------------------------------
1           ABC Corp             1           34          2           NULL
2           DEF Foundation       6           2           8           9
3           GHI Inc.             5           9           NULL        NULL
4           XYZ Industries       24          NULL        6           NULL

…and you needed to normalize it so it looks like this…

clientID    ContactNumber    ContactID
----------------------------------------
1           contact1         1
1           contact2         34
1           contact3         2
2           contact1         6
2           contact2         2
2           contact3         8
2           contact4         9
3           contact1         5
3           contact2         9
4           contact1         24
4           contact3         6

You’d be pretty much be out of luck.

A (less than perfect) solution

The solution I’m proposing here is far from optimized or generic but it should be enough to get you over a lack of an unpivot function. Consider it a building block for you to go out and make a better one (for our purposes the un-pivoting performance is NOT an issue, it may not be the same in your use case).

How does it work?

Quite simply actually, PostgreSQL allows us to query a list of columns for any given table. If we cross join a list of columns names against the rows of the original table we end up with the original table rows repeated for every column name. The only thing missing is how do we then inject the row/column value into the resulting cross join?

The answer is a little bit of black magic and a horrible function that takes a name of a table, row ID and column name and returns the actual cell value. This function is then put into the aforementioned cross join and you are 90% done.

At this point you have something looking very similar to the second table except that its full of NULL values, at this point its a simple WHERE clause to get rid of the NULL values.

The code itself…

Well firstly we need to create the horrible black magic function to fetch us any given cell, I’ve opted to hardcode the return value as all of my pivot columns are double precision. The function just constructs an SQL statment, executes it and returns the result.

CREATE OR REPLACE FUNCTION get_column_value(text, text, text, text)
    RETURNS double precision AS
    $BODY$
    DECLARE
        val double precision := null;
        rRec  RECORD;
    BEGIN
        FOR rRec IN EXECUTE('SELECT ' || quote_ident($1) || ' AS val FROM ' || quote_ident($2) || ' WHERE ' || quote_ident($3) || ' = ' || quote_literal($4)) LOOP
            val = rRec.val;
        END LOOP;
        RETURN val;
    END;
$BODY$
LANGUAGE 'plpgsql';

Next we’ll create a nice little query to pull out a list of column names for a given table (I’ve hardcoded the table name and columns to fetch as string literals).

SELECT a.attnum, a.attname AS column_name
FROM pg_class c, pg_attribute a, pg_type t
WHERE c.relname = 'YourTableNameHere'
    AND a.attnum > 0
    AND a.attrelid = c.oid
    AND a.atttypid = t.oid
    AND (a.attname = 'YourColumnName1'
        OR a.attname = 'YourColumnName2'
        OR a.attname = 'YourColumnName3')

Then we’ll cross join the results of the column query with the actual columns you want from the table

SELECT my_table."YourIDColumnHere", my_columns.column_name, get_column_value(my_columns.column_name, 'YourTableNameHere', 'YourIDColumnHere', my_table."YourIDColumnHere") AS value
FROM "YourTableNameHere" my_table
CROSS JOIN
    (SELECT a.attnum, a.attname AS column_name
     FROM pg_class c, pg_attribute a, pg_type t
     WHERE c.relname = 'YourTableNameHere'
         AND a.attnum > 0
         AND a.attrelid = c.oid
         AND a.atttypid = t.oid
         AND (a.attname = 'YourColumnName1'
             OR a.attname = 'YourColumnName2'
             OR a.attname = 'YourCoumnName3')) my_columns

Then we’ll pull it all together and add a nice little filter to get rid of the NULL entries.

SELECT *
FROM
    (SELECT my_table."YourIDColumnHere", my_columns.column_name, get_column_value(my_columns.column_name, 'YourTableNameHere', 'YourIDColumnHere', my_table."YourIDColumnHere") AS value
     FROM "YourTableNameHere" my_table
     CROSS JOIN
         (SELECT a.attnum, a.attname AS column_name
          FROM pg_class c, pg_attribute a, pg_type t
          WHERE c.relname = 'YourTableNameHere'
              AND a.attnum > 0
              AND a.attrelid = c.oid
              AND a.atttypid = t.oid
              AND (a.attname = 'YourColumnName1'
                  OR a.attname = 'YourColumnName2'
                  OR a.attname = 'YourCoumnName3')) my_columns) joined_table
WHERE value IS NOT NULL

Getting it working for you…

At this point you should be aware of how we did it, all that is left for you to do is actually modify it for your needs (I’m afraid it’s not quite solving the generic case at the moment).

  1. Replace all instances of “YourColumnNameX”, “YourTableNameHere” etc
  2. Modify the get_column_value function to actually match the type of the column you are un-pivoting. (This example only supports a single type for all the columns)

Happy Normalizing!