Unicode versions (Re: Criteria for exceptional characters)

Kenneth Whistler kenw at sybase.com
Mon Dec 18 21:04:02 CET 2006


Michael Everson wrote:

> At 09:45 -0800 2006-12-18, Mark Davis wrote:
> >I don't think it is necessary. If mixtures of scripts are not 
> >displayed (eg the user-agent flags them as discussed before), then 
> >they are not a problem. If mixtures of scripts *are* allowed, then 
> >there are so many other problems (eg with Cyrillic) that these pale 
> >in comparison.
> 
> I do not understand why (apart from Japanese and Korean) the 
> possibility of mixing scripts is still being discussed as though it 
> were necessary. DON'T mix Syllabics with other scripts, and be done.

The problem is that you are mixing *levels* here.

At the protocol level, we need to define an updated IDNA that
will handle Unicode 5.0 (and future updates to Unicode)
gracefully. And it needs to use some updated version of StringPrep
to convert Unicode input strings into a lookup form safe for DNS.

Trying to enforce no script mixing in *StringPrep* makes that
processing more complicated, and is overly restrictive at
the protocol level.
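
As a rough illustration of that protocol-level step (not something
proposed in this thread), the sketch below uses the Nameprep
implementation in Python's standard library, i.e. the IDNA2003
StringPrep profile from RFC 3491. It assumes that profile is a fair
stand-in for "some updated version of StringPrep", and the sample
labels are made up. The only point is that the mapping step
case-folds and normalizes without looking at scripts at all.

```python
from encodings.idna import nameprep  # IDNA2003 Nameprep (RFC 3491)

# Hypothetical labels, chosen only for illustration.
plain = "EXAMPLE"
mixed = "p\u0430ypal"  # U+0430 CYRILLIC SMALL LETTER A among Latin letters

# Nameprep maps and normalizes; it has no notion of "script".
prepared_plain = nameprep(plain)  # case-folded to lowercase
prepared_mixed = nameprep(mixed)  # the mixed-script label passes through

# The lookup form handed to DNS is the ASCII-compatible encoding.
ace_mixed = mixed.encode("idna")

print(prepared_plain)
print(prepared_mixed == mixed)
print(ace_mixed)
```

Any script restriction would have to be bolted on as an extra check
around this step, which is the added complexity referred to above.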

Registries, on the other hand, may be well-advised, depending
on which registry they are and what they need to cover, to
not only refuse to register mixed-script domain names, but
even to ban wholesale the use of most scripts other than
those relevant to their registry. That is an easy step to
eliminate huge amounts of confusion and possibility for misuse.
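
A minimal sketch of what such a registry-level rule might look like
follows. Everything in it is hypothetical: the names rough_script,
registry_accepts and ALLOWED are invented for the example, and the
script test is a heuristic based on the first word of each
character's Unicode name, not the real Unicode Script property
(which Python's standard library does not expose).

```python
import unicodedata

def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its name.

    Assumption: a heuristic only, not the Unicode Script property.
    """
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "COMMON"  # digits and hyphen are shared by all scripts
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def registry_accepts(label: str, allowed: set[str]) -> bool:
    """Hypothetical rule: exactly one script per label, taken from `allowed`."""
    scripts = {rough_script(ch) for ch in label} - {"COMMON"}
    return len(scripts) == 1 and scripts <= allowed

# Made-up policy: this registry covers Latin and Canadian Syllabics only.
ALLOWED = {"LATIN", "CANADIAN"}

print(registry_accepts("example-1", ALLOWED))                             # Latin only
print(registry_accepts("p\u0430ypal", ALLOWED))                           # Latin + Cyrillic
print(registry_accepts("\u1403\u14c4\u1483", ALLOWED))                    # Syllabics only
print(registry_accepts("\u043f\u0440\u0438\u043c\u0435\u0440", ALLOWED))  # Cyrillic only
```

The check runs at registration time, so the protocol-level
processing stays unaware of scripts and each registry can pick
its own allowed set.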

Then at the user agent level you have yet another level which
can respond intelligently to mixed-script identifiers, as
discussed in UTR #36.

--Ken


