August 2008 - Posts

Transforming Denormalized Tabular Data to Hierarchical XML Using SQL Server (2005)

A friend of mine asked me today if there is any easy way to transform a table / query result that is quite tabular like the one in the table below to a hierarchical XML format just by using T-SQL query.

Id Product Activity
1 Product A Activity A
2 Product A Activity B
3 Product B Activity C
4 Product C Activity D

Fig 1. ProductMapping Table

 

<products>
    <product>
        <desc>Product A</desc>
        <activities>
            <activity>Activity A</activity>
            <activity>Activity B</activity>
        <activities>
    </product>
    <product>
        <desc>Product B</desc>
        <activities>
            <activity>Activity C</activity>
        <activities>
    </product>
    <product>
        <desc>Product C</desc>
        <activities>
            <activity>Activity D</activity>
        <activities>
    </product>
</products>

Fig 2. Expected XML Result

 

Although I dabble in XML, XPATH, XSLT and the like from .NET / JavaScript; I've never really touched SQL Server XML features that much so I decided to play around a little with it.

The only thing I remember about T-SQL and XML is that they have this FOR XML keyword that will allow you to transform table data to XML ... somehow.

After a bit of researching and trial and error, I came up with the following T-SQL Query:

SELECT 
    DISTINCT product AS "desc",
    (
        SELECT 
            activity AS "*"
        FROM 
            ProductMapping m2
        WHERE 
            m2.product = m1.product
        FOR XML PATH('activity'), ROOT('activities')
    ) 
FROM
    ProductMapping m1 
FOR XML PATH('product'), ROOT('products')

Fig 3. First T-SQL attempt for transforming tabular data to XML

 

This gets me pretty close to what I want, but for some reasons the subquery result is HTML encoded.

<products>
    <product>
        <desc>Product A</desc>
        &lt;activities&gt;
            &lt;activity&gt;Activity A&lt;/activity&gt;
            &lt;activity&gt;Activity B&lt;/activity&gt;
        &lt;/activities&gt;
    </product>
    <product>
        <desc>Product B</desc>
        &lt;activities&gt;
            &lt;activity&gt;Activity C&lt;/activity&gt;
        &lt;/activities&gt;
    </product>
    <product>
        <desc>Product C</desc>
        &lt;activities&gt;
            &lt;activity&gt;Activity D&lt;/activity&gt;
        &lt;/activities&gt;
    </product>
</products>

Fig 4. Bad XML!!

 

After researching a little bit more and could not find a satisfactory and clean answer to do this, I finally just gave up and wrapped the above T-SQL in another SELECT statement with a bunch of replace functions surrounding it to transform all the &lt; and &gt; to actual < and >.

SELECT 
    replace(
    replace(
    (
        SELECT
            DISTINCT product AS "desc",
            (
                SELECT
                    activity AS "*"
                FROM
                    ProductMapping m2
                WHERE 
                    m2.product = m1.product
                FOR XML PATH('activity'), ROOT('activities')) 
        FROM
            ProductMapping m1 
        FOR XML PATH('product'), ROOT('products')
    ),
    N'&lt;', N'<'),
    N'&gt;', N'>') AS [Products XML]

Fig 5. This T-SQL gets me what I want but it's not really elegant.

 

Anyone know a better T-SQL only way to do this?

Share this post: | | | |
Posted by Jimmy Chandra | 2 comment(s)
Filed under: ,