Please review application/shf+xml
Linus Walleij
triad at df.lth.se
Wed Oct 29 18:02:23 CET 2003
On Thu, 30 Oct 2003, MURATA Makoto wrote:
> > * We define that shf+xml will use UTF-8 and UTF-16 only, for reasons of
> > simplicity.
>
> Which UTF-16? Unfortunately, there are three charsets for UTF-16.
> They are "utf-16le", "utf-16be" and "utf-16" (see RFC 2781).
The XML specification says:

  Entities encoded in UTF-16 must begin with the Byte Order Mark described
  by Annex F of [ISO/IEC 10646], Annex H of [ISO/IEC 10646-2000], section
  2.4 of [Unicode], and section 2.7 of [Unicode3] (the ZERO WIDTH NO-BREAK
  SPACE character, #xFEFF). This is an encoding signature, not part of
  either the markup or the character data of the XML document. XML
  processors must be able to use this character to differentiate between
  UTF-8 and UTF-16 encoded documents.

As easy as it gets :-)
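
To illustrate the rule quoted above, here is a minimal Python sketch of how a reader might tell UTF-8 from UTF-16 by the Byte Order Mark. The function name sniff_encoding is hypothetical, and the sketch assumes the input can only be UTF-8 or UTF-16, as proposed for shf+xml. It does not look at the XML encoding declaration or handle any other charset.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its leading bytes.

    Assumption: only UTF-8 and UTF-16 are possible, so a UTF-16 BOM
    (U+FEFF in either byte order) settles the question on its own.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF8):      # bytes EF BB BF (optional in UTF-8)
        return "utf-8-sig"
    # No BOM: under the stated assumption this must be UTF-8.
    return "utf-8"
```

For example, sniff_encoding(b"\xfe\xff\x00<") reports big-endian UTF-16, while a document starting with plain "<?xml" falls through to UTF-8.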
> Since this XML format describes hexadecimal data, almost every character
> is US-ASCII. I wonder why we have to double the file size by representing
> a US-ASCII character with 16 bits. 1MB in UTF-8 becomes 2MB in UTF-16.
That's a good point. OK, it's UTF-8 only then.
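
The size argument can be checked with a short Python sketch. The payload below is made up for illustration (the element name "data" is an assumption, not taken from the SHF format); the point is only that pure US-ASCII text takes one byte per character in UTF-8 and two in UTF-16.

```python
# Hypothetical hex-dump-like payload: every character is US-ASCII.
payload = "<data>" + "0123456789ABCDEF" * 1024 + "</data>"

utf8_size = len(payload.encode("utf-8"))       # 1 byte per ASCII character
utf16_size = len(payload.encode("utf-16-le"))  # 2 bytes per character, no BOM

print(utf8_size, utf16_size)
```

With a BOM the UTF-16 form would be two bytes larger still.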
Linus