Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Erik van der Poel erikv at google.com
Sat Dec 23 18:15:30 CET 2006


John,

While it is clear that we are not going to agree about putting general
script mixing rules in the IDNA200x RFCs, I'd like to point out that
there are other ways to write such rules or recommendations (other
than the way you mention below).

Somewhat more importantly, I have a suggestion below regarding the use
of the word "protocol" in our email discussions and, especially, in
the IDNA RFCs.

On 12/22/06, John C Klensin <klensin at jck.com> wrote:
> As one example, trying to apply a "no mixed script" rule, at the
> protocol level, across all of the labels of an FQDN is simply
> infeasible, even if it were wise.  To do so would require rather
> fundamental changes in the way DNS name-resolution works,
> including script recognition in the resolvers and during DNAME
> synthesis, which was exactly the problem IDNA is intended to
> avoid.

While the DNS infrastructure is one area where script mixing rules
might conceivably be applied, there are other pieces of software where
one could possibly apply some rules. For example, in your own Internet
Draft for IDNA200x, you mention "label rejection" in the context of
"user input", presumably with the help of a user agent. See section
2.2.3 in:

http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt
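
To make the user-agent case concrete, here is a rough sketch of the kind
of check a user agent could run on a label before lookup. It is only an
illustration, not something taken from the draft. Python's standard
library does not expose the Unicode Script property, so the sketch
approximates a character's script by the first word of its Unicode
character name, which is a crude heuristic. The function names
scripts_in_label and is_mixed_script are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Return a rough set of 'scripts' used in a single label.

    Assumption: the first word of the Unicode character name (LATIN,
    CYRILLIC, GREEK, ...) stands in for the real Script property.
    """
    scripts = set()
    for ch in label:
        # Digits and the hyphen are shared by many scripts; skip them.
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label):
    """True if the label appears to draw on more than one script."""
    return len(scripts_in_label(label)) > 1
```

A rule this blunt would also flag legitimate mixtures such as kanji
written together with hiragana, so a real user agent would need a more
careful policy than "more than one script means reject".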

Even section 1.3 of the old IDNA2003 specification (RFC 3490) says that
the ToASCII operation can fail and that the application must deal with
such failures in some way (though it stops short of using words like
"label rejection"):

ftp://ftp.rfc-editor.org/in-notes/rfc3490.txt
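
As an illustration of an application dealing with such a failure, here
is a minimal sketch using the "idna" codec from Python's standard
library, which implements the IDNA2003 conversion. The wrapper name
to_ascii_or_reject is hypothetical, and what to do with a rejected label
(warn the user, refuse the lookup, and so on) is left to the application.

```python
def to_ascii_or_reject(label):
    """Try the IDNA2003 ToASCII conversion on one label.

    Returns (ace_bytes, None) on success, or (None, reason) if the
    conversion fails and the label should be rejected.
    """
    try:
        # The standard "idna" codec applies nameprep and Punycode.
        return label.encode("idna"), None
    except UnicodeError as exc:
        # ToASCII failed, for example because the result is too long.
        return None, str(exc)


# A label that converts cleanly.
ace, reason = to_ascii_or_reject("b\u00fccher")

# A label whose encoded form exceeds the 63-octet limit.
bad, bad_reason = to_ascii_or_reject("\u00fc" * 70)
```

In the first call ace holds the ASCII-compatible form and reason is
None. In the second call bad is None and bad_reason carries the codec's
error message.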

On the registration side, there are a number of ways to apply rules
that might be considered to belong in the general area of "script
mixing rules". I am referring here to the blocking and bundling that
you and the JET folks have documented. Of course, many of the
techniques mentioned there are fine-grained, on a
character-by-character basis, rather than script-by-script.

Now, my suggestion regarding the word "protocol". IDNA2003 admits, in
the same section 1.3 of RFC 3490 above, that it is not a client-server
or peer-to-peer protocol. Also, you use the words "the protocol level"
in your email above, but then go on to describe only one of the many
Internet protocols, namely DNS. So if by "protocol" you were referring
to an Internet protocol in the traditional IETF sense, then it was wrong
to use the word "the".

On the other hand, if by "the protocol level" you meant the IDNA200x
RFCs, then it was wrong to suggest that the DNS infrastructure is the
only place where one might conceivably apply "mixed script" rules of
any sort.

So, to clear up this type of confusion, may I suggest that we use the
word "protocol" in the traditional IETF sense of "wire protocol", both
in our email and in the new RFCs? Perhaps we could use different words
like "rules" and "recommendations" when we are referring to the
IDNA200x RFCs.

I realize that the word "protocol" is used in a different way in the
English language in general, but are you aware of any RFCs that use
the word "protocol" in the same way as IDNA2003?

Again, just a suggestion.

Happy Holidays!

Erik

