SQL Server 2012 XQuery Supplementary Characters Support

From SQL Server 2012, XQuery functions are surrogate-aware. SQL Server Unicode data types, including NCHAR, NVARCHAR, and XML, encode text in UTF-16 format. SQL Server allocates for each character a unique codepoint, a value in the range 0x0000 to 0x10FFFF. Most of the characters fit into a 16-bit word. Characters with codepoint values larger than 0xFFFF require two consecutive 16-bit words, i.e. two bytes. These characters are called supplementary characters, and the two consecutive 16-bit words are called surrogate pairs.

The standard W3C recommendation for XQuery functions and operators requires them to count a surrogate pair as a single character. In SQL Server versions prior to 2012, XQuery string functions did not recognize surrogate pairs as a single character. For example, string length calculations returned incorrect results.

The XML data type in SQL Server only allows well-formed surrogate pairs. However, it is possible to pass invalid or partial surrogate pairs to XQuery functions as string values. Below is an example of providing surrogate pairs to XML data type variable through string values. The value of the CustomerName element is "𠆾𠇀𠇃12". First three characters of the name are surrogate pairs. The length of this value is 5. However, in versions SQL Server before SQL 2012, you would get length 8.

DECLARE @x AS NVARCHAR(MAX), @y AS XML;

SET @x=

N'<C11003>

<CustomerKey>11003</CustomerKey>

<CustomerName>&#131518;&#131520;&#131523;12</CustomerName>

<SalesAmount>2294.9900</SalesAmount>

</C11003>';

SET @y = CAST(@x AS XML);

SELECT @y.query('

for $i in /C11003

return

<NumberOfCharacters>

         { string-length($i/CustomerName[1]) }

</NumberOfCharacters>

');

If you execute this code in SQL Server versions 2005, 2008 or 2008 R2, you get the following result:

<NumberOfCharacters>8</NumberOfCharacters>

The same code executed in SQL Server 2012 returns:

<NumberOfCharacters>5</NumberOfCharacters>

This changed behavior affects the following XQuery functions, operators and clauses:

  • fn:string-length
  • fn:substring
  • fn:contains (only when the compatibility level is 110 or higher)
  • fn:concat (only when the compatibility level is 110 or higher)
  • Comparison operators including +, <, >, <=, >=, eq, lt, gt, le, and ge (only when the compatibility level is 110 or higher)
  • order by clause (only when the compatibility level is 110 or higher).
Avtor: Anonymous, objavljeno na portalu SloDug.si (Arhiv)

Leave a comment

Please note that we won't show your email to others, or use it for sending unwanted emails. We will only use it to render your Gravatar image and to validate you as a real person.