U-labels, NFC, and symmetry

Vint Cerf vint at google.com
Fri Apr 8 11:50:22 CEST 2011

Peter, what is it about NFC that makes it unsuitable for use with XMPP?


On Thu, Apr 7, 2011 at 5:59 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
> RFC 5890 states:
>   o  A "U-label" is an IDNA-valid string of Unicode characters, in
>      Normalization Form C (NFC) and including at least one non-ASCII
>      character, expressed in a standard Unicode Encoding Form (such as
>      UTF-8).  It is also subject to the constraints about permitted
>      characters that are specified in Section 4.2 of the Protocol
>      document and the rules in the Sections 2 and 3 of the Tables
>      document, the Bidi constraints in that document if it contains any
>      character from scripts that are written right to left, and the
>      symmetry constraint described immediately below.  Conversions
>      between U-labels and A-labels are performed according to the
>      "Punycode" specification [RFC3492], adding or removing the ACE
>      prefix as needed.
>
>   To be valid, U-labels and A-labels must obey an important symmetry
>   constraint.  While that constraint may be tested in any of several
>   ways, an A-label A1 must be capable of being produced by conversion
>   from a U-label U1, and that U-label U1 must be capable of being
>   produced by conversion from A-label A1.  Among other things, this
>   implies that both U-labels and A-labels must be strings in Unicode
>   NFC [Unicode-UAX15] normalized form.  These strings MUST contain only
>   characters specified elsewhere in this document series, and only in
>   the contexts indicated as appropriate.
>
> I'm updating the i18n handling in XMPP, and the XMPP community would
> like to use NFD on the wire for various reasons. Ideally we would like
> to do so without requiring a trip through NFC. However, it appears that
> we can do this only by using a term other than U-label, since that is
> tied to NFC. Indeed, it seems that a string in Unicode NFD normalized
> form is not an IDN label at all. This strikes me as unfortunate (I
> thought that normalization was handled only in RFC 5895 along with other
> such mapping issues), but that is probably because I do not understand how the
> symmetry requirement expressed in RFC 5890 necessitates the use of NFC.
> Would any of the i18n experts on this list care to enlighten me on the
> latter point?
>
> In the meantime, I shall pursue a way to specify XMPP domainparts
> independently of the term U-label.
> Peter
> --
> Peter Saint-Andre
> https://stpeter.im/
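
For readers following the thread, here is a minimal sketch of the U-label/A-label round trip quoted above from RFC 5890, using only Python's standard library. It is an illustration under stated assumptions, not an IDNA2008 implementation: the helper names (to_a_label, to_u_label, is_nfc) are hypothetical, the example label "café" is made up, and none of the permitted-character, contextual, or Bidi rules are checked. It only shows that the NFC and NFD spellings of the same text are different code point sequences, so they convert to different A-labels, which is one way to see why the definition pins down a single normalization form.

```python
import unicodedata

ACE_PREFIX = "xn--"  # ACE prefix added or removed around the Punycode form


def to_a_label(u_label: str) -> str:
    """Hypothetical helper: Punycode-encode (RFC 3492) and add the ACE prefix.

    No IDNA validity checks are performed here.
    """
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Hypothetical helper: strip the ACE prefix and Punycode-decode."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_nfc(label: str) -> bool:
    """True if the label is already in Normalization Form C."""
    return unicodedata.normalize("NFC", label) == label


# The same user-visible text in the two normalization forms.
nfc_label = unicodedata.normalize("NFC", "caf\u00e9")  # precomposed e-acute
nfd_label = unicodedata.normalize("NFD", "caf\u00e9")  # "e" + combining acute

a_from_nfc = to_a_label(nfc_label)
a_from_nfd = to_a_label(nfd_label)

# Punycode itself round-trips either form without loss ...
assert to_u_label(a_from_nfc) == nfc_label
assert to_u_label(a_from_nfd) == nfd_label

# ... but the two forms yield different A-labels, and only one is NFC.
print(a_from_nfc, is_nfc(nfc_label))
print(a_from_nfd, is_nfc(nfd_label))
```

In this sketch the Punycode step is lossless for both forms; what separates them is the NFC check. An NFD string on the wire, as proposed for XMPP, would therefore need a different term than U-label, or a normalization step before any comparison with an A-label.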