Lookup & NFC

Vint Cerf vint at google.com
Fri Mar 28 07:25:40 CET 2008

With regard to the idna200x spec, would it be important, say in the rationale document, to make clear that the spec applies to strings presented for registration and/or lookup, and that those strings are assumed to be presented already meeting the allowed-character criteria? This seems related to Mark Davis' point about preprocessing. Alternatively, an advisory (if not normative) document might offer advice about the need to examine, and possibly alter, strings intended to be registered or looked up in the DNS so that they meet specific criteria. We know the strings may be obtained in a variety of ways, but they must ultimately conform to certain criteria to be accepted into the DNS for registration or to be looked up. The spec distinguishes between "protocol valid for registration" and "protocol valid for lookup", and that distinction needs to be made clear as well. V

----- Original Message -----
From: idna-update-bounces at alvestrand.no <idna-update-bounces at alvestrand.no>
To: John C Klensin <klensin at jck.com>; Shawn Steele <Shawn.Steele at microsoft.com>; idna-update at alvestrand.no <idna-update at alvestrand.no>
Sent: Thu Mar 27 22:22:01 2008
Subject: RE: Lookup & NFC

At 03:10 08/03/28, John C Klensin wrote:

>The more important answer is that the intent of the spec is "if
>you need this mapping, it is your job to apply it before you
>invoke IDNA".  Taking NFC as an example, let's assume we have
>two operating systems,
>       * One of them gets strings into NFC form as soon as they
>       are typed and verifies that (and corrects them if
>       necessary) any time they are loaded or otherwise
>       examined.
>       * The other lets users type strings and carries them
>       around in whatever form they are typed, presumably
>       unnormalized.

This is in essence correct, but it implies that things mainly
depend on how the user types them. This is very much NOT the
case. Whether the user types some accents with modifier keys
(in some cases called dead keys), some shift combination, or
a predefined key for that accented character is independent
of whether these characters enter the system as precomposed
or decomposed. Microsoft and most Unix/Linux systems use
precomposed characters, so that's what an application gets
from the keyboard driver and related machinery. The Mac
uses decomposed characters, so that's what you get there.
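To make the precomposed/decomposed distinction concrete, here is a small Python sketch using only the standard-library `unicodedata` module (the example character is an illustrative choice, not something from the thread). The same visible "é" can be one code point or two; the two spellings compare unequal, and NFC/NFD convert between them.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT, two code points

def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(precomposed))   # ['U+00E9']
print(code_points(decomposed))    # ['U+0065', 'U+0301']
print(precomposed == decomposed)  # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```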

Also, as far as I know (the Mac may be an exception), the
data is not usually normalized or checked for normalization.
In general, that is not necessary, because the keyboard driver
already takes care of this. But if the user, e.g., enters
some non-normalized characters from a character picker or similar,
then these enter the datastream as they are, unchecked.
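A minimal Python sketch of the situation described above, assuming a hypothetical string where the first "é" came precomposed from the keyboard driver and the second was inserted decomposed from a character picker. The result is in neither NFC nor NFD, which an application can detect with `unicodedata.is_normalized` (Python 3.8+) before the string goes any further.

```python
import unicodedata

# Hypothetical input assembled from two sources:
from_keyboard = "caf\u00e9"   # precomposed "é" from the keyboard driver
from_picker = "e\u0301"       # decomposed "é" from a character picker
label = from_keyboard + from_picker   # displays as "caféé"

print(unicodedata.is_normalized("NFC", label))  # False: contains "e" + U+0301
print(unicodedata.is_normalized("NFD", label))  # False: contains U+00E9
# NFC composes the picked pair, so the code point count drops by one.
print(len(label), len(unicodedata.normalize("NFC", label)))  # 6 5
```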

Regards,    Martin.

>For the first, it would clearly be silly for the internals of an
>IDNA implementation to spend energy converting to NFC (although
>as Mark has pointed out, I think on this list, the check that
>the string is in NFC form is sufficiently simple and quick that
>one might make it in the name of robustness).  For the second,
>the spec requires that the application get the string into NFC
>form before looking it up, but one would assume that would be
>fairly natural.  As you implicitly point out, lookups of any
>form would generally be expected to fail unless normalized
>strings were compared to normalized strings.
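A rough Python sketch of the division of labour described above: the application gets the string into NFC before lookup, and the lookup itself only does exact matching. A plain dict stands in for the DNS here, and the names `registered`, `lookup` and `prepare` are hypothetical; a real IDNA lookup would also convert the label to its ACE form, which this sketch omits. The `is_normalized` check (Python 3.8+) reflects the point that verifying NFC is cheap enough to do for robustness, so strings already in NFC are passed through untouched.

```python
import unicodedata
from typing import Optional

# Hypothetical registry: names were stored in NFC at registration time.
registered = {unicodedata.normalize("NFC", "caf\u00e9.example"): "192.0.2.1"}

def lookup(name: str) -> Optional[str]:
    """Exact-match lookup; performs no normalization itself."""
    return registered.get(name)

def prepare(name: str) -> str:
    """Application-side step: bring the string into NFC before lookup."""
    if unicodedata.is_normalized("NFC", name):
        return name  # already NFC, nothing to do
    return unicodedata.normalize("NFC", name)

# The same name as it might arrive from a system that produces decomposed text.
typed_on_decomposing_system = "cafe\u0301.example"
print(lookup(typed_on_decomposing_system))           # None: no match
print(lookup(prepare(typed_on_decomposing_system)))  # 192.0.2.1
```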

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     

Idna-update mailing list
Idna-update at alvestrand.no
