The lookalike problem(s)

Vint Cerf vint at
Mon Nov 27 01:21:39 CET 2006


Please stop for a moment and think about the problem the engineers have.
They are trying to determine whether a relatively simply described algorithm
would produce a suitable subset of the Unicode characters for use in IDNs.
This is simply an exercise. If it doesn't work, for a variety of reasons, we
will be back to considering every character, one at a time, still trying to
group them so as to determine which subsets can be used freely within a given
label in a domain name. So far, the exercise seems to me to point in that
direction, but this was worth trying.

Moreover, it is vital that you appreciate the difference between the set of
expressions that it is reasonable to support for IDNs and the production of
general language. It is NOT the same thing. In fact, it is absolutely clear
that we cannot support general language in IDNs, for many of the character
sets under consideration. The problem of confusables contributes
significantly to this limitation. If you continue to view IDN space as a
space for general discourse, you will come to completely unsuitable
conclusions about the pragmatic solution for choice of characters to permit
in IDNs. 


Vinton G Cerf
Chief Internet Evangelist
Regus Suite 384
13800 Coppermine Road
Herndon, VA 20171
+1 703 234-1823
+1 703-234-5822 (f)
vint at

-----Original Message-----
From: idna-update-bounces at
[mailto:idna-update-bounces at] On Behalf Of Michael Everson
Sent: Sunday, November 26, 2006 6:10 PM
To: idna-update at
Subject: Re: The lookalike problem(s)

At 17:18 -0500 2006-11-26, John C Klensin wrote:

>IMO, unless we find a mechanism that no one has been able to think of 
>yet, the only "script mixing" rule that belongs in the protocol itself 
>is one involving mixed LtoR and RtoL substrings.
>Personally, I wish we could get rid of that one too, just to be 
>completely consistent about what belongs in the lookup protocol and 
>what is properly a restriction at registration time, but I don't see 
>any way to do that without introducing far more confusion about 
>ordering and rendering than the principle could possibly be worth.
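To make concrete the one protocol-level rule described in the quoted paragraph, here is a minimal sketch. It assumes a label is rejected simply when it contains both strong left-to-right characters (Unicode bidi class L) and strong right-to-left characters (bidi class R or AL). The function name mixes_ltr_and_rtl is hypothetical. This is a simplification for illustration, not the rule any IDNA draft actually specifies.

```python
import unicodedata

# Simplifying assumption: only "strong" bidi classes are considered.
# L = left-to-right; R = right-to-left (e.g. Hebrew); AL = Arabic letter.
LTR_CLASSES = {"L"}
RTL_CLASSES = {"R", "AL"}


def mixes_ltr_and_rtl(label: str) -> bool:
    """Return True if the label contains both strong LTR and strong RTL characters.

    Hypothetical helper: neutral and weak characters (digits, hyphens) are ignored.
    """
    has_ltr = False
    has_rtl = False
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in LTR_CLASSES:
            has_ltr = True
        elif bidi in RTL_CLASSES:
            has_rtl = True
    return has_ltr and has_rtl


print(mixes_ltr_and_rtl("example"))                   # Latin only
print(mixes_ltr_and_rtl("\u05e9\u05dc\u05d5\u05dd"))  # Hebrew only
print(mixes_ltr_and_rtl("abc\u05e9"))                 # Latin mixed with Hebrew
```

A check like this says nothing about, say, Latin mixed with Cyrillic. Both are left-to-right, which is why a directionality rule alone does not address the lookalike problem discussed in this thread.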

I don't understand any of this. I thought we had been making progress and
reaching agreements. Now all I see is this entire endeavour heading for disaster,
because whatever progress and agreements were made are now being mooted for
jettisoning. I cannot fathom why. John, you wrote me privately and said that
people were considering this an engineering exercise. You wrote a long and
considered post to me, and I owe you a considered response, which I haven't
made because I was travelling this weekend. But when I saw what you've
written above... well, I'm just gobsmacked.

Language is not engineering. One cannot algorithm one's way to neatness in
this effort. Language isn't tidy.

John? If we do not ban script mixing, how do you propose that IDN will work?
I mean, how?

Maybe I'm wrong. Maybe going back to first principles makes sense and will
help. But I do not understand how rescinding a ban on script mixing can
possibly help make IDN a reality.

Michael Everson *
Idna-update mailing list
Idna-update at

More information about the Idna-update mailing list