U-labels, NFC, and symmetry

Peter Saint-Andre stpeter at stpeter.im
Thu Apr 7 23:59:05 CEST 2011


RFC 5890 states:

   o  A "U-label" is an IDNA-valid string of Unicode characters, in
      Normalization Form C (NFC) and including at least one non-ASCII
      character, expressed in a standard Unicode Encoding Form (such as
      UTF-8).  It is also subject to the constraints about permitted
      characters that are specified in Section 4.2 of the Protocol
      document and the rules in the Sections 2 and 3 of the Tables
      document, the Bidi constraints in that document if it contains any
      character from scripts that are written right to left, and the
      symmetry constraint described immediately below.  Conversions
      between U-labels and A-labels are performed according to the
      "Punycode" specification [RFC3492], adding or removing the ACE
      prefix as needed.

   To be valid, U-labels and A-labels must obey an important symmetry
   constraint.  While that constraint may be tested in any of several
   ways, an A-label A1 must be capable of being produced by conversion
   from a U-label U1, and that U-label U1 must be capable of being
   produced by conversion from A-label A1.  Among other things, this
   implies that both U-labels and A-labels must be strings in Unicode
   NFC [Unicode-UAX15] normalized form.  These strings MUST contain only
   characters specified elsewhere in this document series, and only in
   the contexts indicated as appropriate.

I'm updating the i18n handling in XMPP, and the XMPP community would
like to use NFD on the wire for various reasons. Ideally we would like
to do so without requiring a trip through NFC. However, it appears that
we can do this only by using a term other than U-label, since that is
tied to NFC. Indeed, it seems that a string in Unicode NFD normalized
form is not an IDN label at all. This strikes me as unfortunate (I
thought that normalization was handled only in RFC 5895, along with other
such mapping issues), but that is probably because I do not understand how
the symmetry requirement expressed in RFC 5890 necessitates the use of NFC.
Would any of the i18n experts on this list care to enlighten me on the
latter point?
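
To make the question concrete, here is a minimal Python sketch of the
round trip as I read it. It is an illustration only: the helper names
to_a_label and to_u_form are hypothetical, the sample label is my own,
and the code performs only the Punycode step plus the ACE prefix, with
none of the IDNA validity checks from the Protocol or Tables documents.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(label: str) -> str:
    """Punycode-encode a label and prepend the ACE prefix.

    Assumption: no IDNA validity checking is done here at all.
    """
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_form(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder.

    Assumption: the input really starts with the ACE prefix.
    """
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# The same text in two normalization forms (sample label is hypothetical).
nfc = unicodedata.normalize("NFC", "caf\u00e9")
nfd = unicodedata.normalize("NFD", "caf\u00e9")

a_nfc = to_a_label(nfc)
a_nfd = to_a_label(nfd)

# Punycode alone round-trips either form unchanged.
print("NFC round trip ok:", to_u_form(a_nfc) == nfc)
print("NFD round trip ok:", to_u_form(a_nfd) == nfd)

# Canonically equivalent inputs nevertheless give distinct ACE strings.
print("ACE strings differ:", a_nfc != a_nfd)
```

If this sketch is right, the Punycode conversion by itself is
indifferent to normalization, so the NFC requirement would seem to come
from the definition of a valid U-label rather than from the encoding
step. What the sketch does show is that two canonically equivalent
strings yield two different ACE strings, which may be part of the
motivation for fixing a single normalization form.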

In the meantime, I shall pursue a way to specify XMPP domainparts
independently of the term U-label.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/


