A friend of mine asked me today if there is any easy way to transform a table / query result that is quite tabular like the one in the table below to a hierarchical XML format just by using T-SQL query.
| Id | Product | Activity |
| 1 | Product A | Activity A |
| 2 | Product A | Activity B |
| 3 | Product B | Activity C |
| 4 | Product C | Activity D |
Fig 1. ProductMapping Table
<products>
<product>
<desc>Product A</desc>
<activities>
<activity>Activity A</activity>
<activity>Activity B</activity>
<activities>
</product>
<product>
<desc>Product B</desc>
<activities>
<activity>Activity C</activity>
<activities>
</product>
<product>
<desc>Product C</desc>
<activities>
<activity>Activity D</activity>
<activities>
</product>
</products>
Fig 2. Expected XML Result
Although I dabble in XML, XPATH, XSLT and the like from .NET / JavaScript; I've never really touched SQL Server XML features that much so I decided to play around a little with it.
The only thing I remember about T-SQL and XML is that they have this FOR XML keyword that will allow you to transform table data to XML ... somehow.
After a bit of researching and trial and error, I came up with the following T-SQL Query:
SELECT
DISTINCT product AS "desc",
(
SELECT
activity AS "*"
FROM
ProductMapping m2
WHERE
m2.product = m1.product
FOR XML PATH('activity'), ROOT('activities') )
FROM
ProductMapping m1
FOR XML PATH('product'), ROOT('products')
Fig 3. First T-SQL attempt for transforming tabular data to XML
This gets me pretty close to what I want, but for some reasons the subquery result is HTML encoded.
<products>
<product>
<desc>Product A</desc>
<activities>
<activity>Activity A</activity>
<activity>Activity B</activity>
</activities>
</product>
<product>
<desc>Product B</desc>
<activities>
<activity>Activity C</activity>
</activities>
</product>
<product>
<desc>Product C</desc>
<activities>
<activity>Activity D</activity>
</activities>
</product>
</products>
Fig 4. Bad XML!!
After researching a little bit more and could not find a satisfactory and clean answer to do this, I finally just gave up and wrapped the above T-SQL in another SELECT statement with a bunch of replace functions surrounding it to transform all the < and > to actual < and >.
SELECT
replace(
replace(
(
SELECT
DISTINCT product AS "desc",
(
SELECT
activity AS "*"
FROM
ProductMapping m2
WHERE
m2.product = m1.product
FOR XML PATH('activity'), ROOT('activities')) FROM
ProductMapping m1
FOR XML PATH('product'), ROOT('products') ),
N'<', N'<'),
N'>', N'>') AS [Products XML]
Fig 5. This T-SQL gets me what I want but it's not really elegant.
Anyone know a better T-SQL only way to do this?