U-labels, NFC, and symmetry

John C Klensin klensin at jck.com
Fri Apr 15 18:15:26 CEST 2011



--On Friday, April 15, 2011 09:52 -0600 Peter Saint-Andre
<stpeter at stpeter.im> wrote:

>> If what you're saying is that you want a definition of
>> D-compatible U-label, I am not sure whether that is practical.
> 
> I (well, the XMPP folks) *might* want a D-compatible
> domaineything. I think we've already determined that such a
> thing would not be a U-label.

The conformance test for a D-compatible domaineything would be:

(1) Verify that it is in NFD form (otherwise, it isn't
D-compatible)
(2) Convert it to NFC
(3) Apply the operations of RFC 5891, using the rules specified
in RFC 5892 (or tables derived from them), to verify that the
result of (2) is a U-label.
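
As a rough illustration only, the three steps could be sketched
in Python as below.  The name is_d_compatible and the
is_u_label callback are hypothetical: the callback stands in
for whatever RFC 5891/5892 validation an implementation already
has, which this sketch does not attempt to reproduce.

```python
import unicodedata
from typing import Callable


def is_d_compatible(label: str, is_u_label: Callable[[str], bool]) -> bool:
    """Sketch of the three-step test; is_u_label is a caller-supplied
    (hypothetical) RFC 5891/5892 U-label check."""
    # (1) The input must already be in NFD form.
    if unicodedata.normalize("NFD", label) != label:
        return False
    # (2) Convert it to NFC.
    nfc = unicodedata.normalize("NFC", label)
    # (3) Let the existing IDNA2008 machinery judge the NFC string.
    return is_u_label(nfc)
```

Note that the U-label check only ever sees the NFC string; the
NFD input is never passed to it directly.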

There isn't any practical way to apply 5891/5892 to an
NFD-conformant string without first converting it to NFC.  The
rules and algorithms just aren't constructed that way.

And, of course, converting an A-label to a U-label yields an
NFC-conformant string that you would then have to convert to an
NFD string for your purposes.  
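
A minimal sketch of that extra conversion, assuming Python's
built-in "punycode" codec is an acceptable stand-in for the
A-label to U-label step (it only decodes and performs none of
the RFC 5891 validation).  The function name a_label_to_nfd is
made up for illustration.

```python
import unicodedata


def a_label_to_nfd(a_label: str) -> str:
    """Decode an A-label and return the result in NFD form.

    Assumption: plain Punycode decoding stands in for the full
    A-label to U-label conversion; no IDNA2008 checks are done.
    """
    if a_label[:4].lower() != "xn--":
        raise ValueError("not an A-label: missing 'xn--' prefix")
    # Decoding yields the NFC-conformant string for a valid A-label.
    u_label = a_label[4:].encode("ascii").decode("punycode")
    # The additional step a D-compatible consumer would need.
    return unicodedata.normalize("NFD", u_label)
```

For example, a_label_to_nfd("xn--caf-dma") yields "cafe"
followed by U+0301 COMBINING ACUTE ACCENT rather than the
precomposed U+00E9.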

That is not impossible.  Given your earlier analysis, it would
be a small marginal cost for a relatively rare case.  Keeping
things straight (remembering what is in which form) requires
either that implementers be very careful --much more careful
than if everything were in the same form-- or a lot of testing
to be sure they have gotten it right.  Possibly XMPP
implementers are significantly more careful on average than
what we usually see with applications.

But, modulo a potential issue with characters newly added to
Unicode, I still don't see the case for NFD: it certainly
doesn't make string comparisons any easier.
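
To illustrate the comparison point with a small sketch (the
helper name equal_in_form is hypothetical): a code point
comparison works once both strings are in the same form, and it
works equally well whether that form is NFC or NFD.

```python
import unicodedata


def equal_in_form(a: str, b: str, form: str) -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"      # precomposed e with acute accent
decomposed = "e\u0301"   # e followed by combining acute accent

# Mixed forms compare unequal code point by code point.
mixed_equal = composed == decomposed
# Either normalization form makes the comparison succeed.
nfc_equal = equal_in_form(composed, decomposed, "NFC")
nfd_equal = equal_in_form(composed, decomposed, "NFD")
```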

     john



