U-labels, NFC, and symmetry
Peter Saint-Andre
stpeter at stpeter.im
Thu Apr 7 23:59:05 CEST 2011
RFC 5890 states:
o A "U-label" is an IDNA-valid string of Unicode characters, in
Normalization Form C (NFC) and including at least one non-ASCII
character, expressed in a standard Unicode Encoding Form (such as
UTF-8). It is also subject to the constraints about permitted
characters that are specified in Section 4.2 of the Protocol
document and the rules in the Sections 2 and 3 of the Tables
document, the Bidi constraints in that document if it contains any
character from scripts that are written right to left, and the
symmetry constraint described immediately below. Conversions
between U-labels and A-labels are performed according to the
"Punycode" specification [RFC3492], adding or removing the ACE
prefix as needed.
To be valid, U-labels and A-labels must obey an important symmetry
constraint. While that constraint may be tested in any of several
ways, an A-label A1 must be capable of being produced by conversion
from a U-label U1, and that U-label U1 must be capable of being
produced by conversion from A-label A1. Among other things, this
implies that both U-labels and A-labels must be strings in Unicode
NFC [Unicode-UAX15] normalized form. These strings MUST contain only
characters specified elsewhere in this document series, and only in
the contexts indicated as appropriate.
I'm updating the i18n handling in XMPP, and the XMPP community would
like to use NFD on the wire for various reasons. Ideally we would like
to do so without requiring a trip through NFC. However, it appears that
we can do this only by using a term other than U-label, since that is
tied to NFC. Indeed, it seems that a string in Unicode NFD normalized
form is not an IDN label at all. This strikes me as unfortunate (I
thought that normalization was handled only in RFC 5895 along with other
such mapping issues), but that is probably because I do not understand how
the symmetry requirement expressed in RFC 5890 necessitates the use of NFC.
Would any of the i18n experts on this list care to enlighten me on the
latter point?
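
To make the question concrete, here is a minimal sketch in Python, using only the standard library's "punycode" codec and unicodedata module. It is not an IDNA implementation: it performs none of the RFC 5890/5892 validity checks, and the helper names (to_a_label, to_unicode, is_nfc) and the sample label are assumptions made up for illustration. It only shows that the bare Punycode conversion round-trips for both the NFC and the NFD form of a label, while the two forms yield different ACE strings.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(label: str) -> str:
    """Punycode-encode a Unicode label and prepend the ACE prefix.

    Hypothetical helper: no IDNA validity checks are performed.
    """
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode(ace: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder."""
    if not ace.startswith(ACE_PREFIX):
        raise ValueError("missing ACE prefix")
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_nfc(label: str) -> bool:
    """True if the label is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", label) == label


# Sample label (an assumption for illustration): "cafe" with an acute accent.
nfc_label = unicodedata.normalize("NFC", "caf\u00e9")  # precomposed U+00E9
nfd_label = unicodedata.normalize("NFD", nfc_label)    # "e" + combining U+0301

for name, label in (("NFC", nfc_label), ("NFD", nfd_label)):
    ace = to_a_label(label)
    round_trips = to_unicode(ace) == label
    print(name, ace, "round-trips:", round_trips, "is NFC:", is_nfc(label))
```

In this sketch both forms survive the conversion there and back, so the Punycode step alone does not seem to force NFC; what differs is that the NFC and NFD spellings of the same label map to two distinct ACE strings, and only the NFC one meets the U-label definition quoted above.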
In the meantime, I shall pursue a way to specify XMPP domainparts
independently of the term U-label.
Peter
--
Peter Saint-Andre
https://stpeter.im/