U-labels, NFC, and symmetry
vint at google.com
Fri Apr 8 11:50:22 CEST 2011
Peter, what is it about NFC that makes it unsuitable for use with XMPP?
On Thu, Apr 7, 2011 at 5:59 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
> RFC 5890 states:
> o A "U-label" is an IDNA-valid string of Unicode characters, in
> Normalization Form C (NFC) and including at least one non-ASCII
> character, expressed in a standard Unicode Encoding Form (such as
> UTF-8). It is also subject to the constraints about permitted
> characters that are specified in Section 4.2 of the Protocol
> document and the rules in the Sections 2 and 3 of the Tables
> document, the Bidi constraints in that document if it contains any
> character from scripts that are written right to left, and the
> symmetry constraint described immediately below. Conversions
> between U-labels and A-labels are performed according to the
> "Punycode" specification [RFC3492], adding or removing the ACE
> prefix as needed.
> To be valid, U-labels and A-labels must obey an important symmetry
> constraint. While that constraint may be tested in any of several
> ways, an A-label A1 must be capable of being produced by conversion
> from a U-label U1, and that U-label U1 must be capable of being
> produced by conversion from A-label A1. Among other things, this
> implies that both U-labels and A-labels must be strings in Unicode
> NFC [Unicode-UAX15] normalized form. These strings MUST contain only
> characters specified elsewhere in this document series, and only in
> the contexts indicated as appropriate.
> I'm updating the i18n handling in XMPP, and the XMPP community would
> like to use NFD on the wire for various reasons. Ideally we would like
> to do so without requiring a trip through NFC. However, it appears that
> we can do this only by using a term other than U-label, since that is
> tied to NFC. Indeed, it seems that a string in Unicode NFD normalized
> form is not an IDN label at all. This strikes me as unfortunate (I
> thought that normalization was handled only in RFC 5895 along with other
> such mapping issues), but probably because I do not understand how the
> symmetry requirement expressed in RFC 5890 necessitates the use of NFC.
> Would any of the i18n experts on this list care to enlighten me on the
> latter point?
> In the meantime, I shall pursue a way to specify XMPP domainparts
> independently of the term U-label.
> Peter Saint-Andre
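
For readers following the symmetry question, here is a minimal sketch (not taken from either message) of why allowing more than one normalization form would break the one-to-one pairing of U-labels and A-labels. It assumes Python's built-in "punycode" codec and unicodedata module; the helper names to_ace and from_ace are hypothetical, and no IDNA validity checks (permitted characters, Bidi, and so on) are performed. Punycode itself does no normalization, so the NFC and NFD spellings of the same text yield different ACE strings.

```python
import unicodedata

ACE_PREFIX = "xn--"

def to_ace(label: str) -> str:
    """Punycode-encode a label and add the ACE prefix (hypothetical helper;
    no IDNA validity checks are performed)."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(ace: str) -> str:
    """Strip the ACE prefix and Punycode-decode (hypothetical helper)."""
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")

# The same user-visible text in the two normalization forms.
nfc = unicodedata.normalize("NFC", "caf\u00e9")   # 4 code points, precomposed e-acute
nfd = unicodedata.normalize("NFD", "caf\u00e9")   # 5 code points, e + combining acute

ace_nfc = to_ace(nfc)
ace_nfd = to_ace(nfd)

# Each string round-trips through Punycode on its own...
assert from_ace(ace_nfc) == nfc
assert from_ace(ace_nfd) == nfd

# ...but the two canonically equivalent strings give different ACE forms.
assert ace_nfc != ace_nfd

# If a U-label must be NFC, decoding ace_nfd and normalizing the result
# does not lead back to ace_nfd, so ace_nfd fails the symmetry test.
u1 = unicodedata.normalize("NFC", from_ace(ace_nfd))
assert to_ace(u1) == ace_nfc
assert to_ace(u1) != ace_nfd
```

Under this reading, fixing a single normalization form is what keeps each valid A-label paired with exactly one U-label. The sketch shows only that some single form must be chosen; the choice of NFC rather than NFD comes from RFC 5890, not from anything in the code.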