The Two Lookups Approach (was Re: Parsing the issues and finding a middle ground -- another attempt)

Sat Mar 14 16:31:23 CET 2009

On Sat, Mar 14, 2009 at 1:01 AM, John C Klensin <klensin at jck.com> wrote:
> The question below seems to have never been answered... sorry.

Thank you for answering this important question.

> --On Saturday, March 07, 2009 15:52 +0900 Martin Duerst
> <duerst at it.aoyama.ac.jp> wrote:
>> At 01:41 09/03/07, John C Klensin wrote:
>>> It is worth stressing that the occurrence of this sort of
>>> problem does not depend on IDNA2008.  Paul's IDNAv2 proposal
>>> would cause it equally well, as would anything else that
>>> provides a change from Unicode 3.2 to Unicode 5.1 and, more
>>> generally, most or all future changes to Unicode that add new
>>> characters to existing scripts to improve the way in which
>>> those scripts can be expressed.
>>
>> To be precise, only characters that interact with others
>> in the script would be problematic, not completely independent
>> characters. Or am I missing something?
>
> I think that is correct.  But "completely independent
> characters" may be a slightly fuzzy concept in practice.  Two
> examples...
>
> (i) Adding the Chillu to Malayalam, as Unicode 5.1 does,
> involves completely independent characters, in the sense that
> nothing was there before and they are not treated as
> compositions of characters that were in Unicode earlier.   In
> one way, that makes them "independent" of what has come before,
> but they profoundly change the way the script is coded, so they
> do interact and set up a problematic situation.

Yes, the Chillu issue is an important example because it would require
triple lookup rather than the double lookup that we have been
discussing until now. That is, after the user has entered some sequence
of characters, the implementation would have to try three forms in turn:
the sequence without ZW characters, the sequence with ZW characters, and
the sequence with Chillu.
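
To make the triple lookup concrete, here is a rough sketch in Python.
It is only an illustration under several assumptions: the names
lookup_candidates and first_match are made up for this message, the
resolve argument stands in for whatever DNS lookup an implementation
really uses, and the Chillu table lists only the five sequence-to-atomic
pairs I am sure of. It works on U-labels and does no IDNA validation.

```python
ZWNJ, ZWJ, VIRAMA = "\u200c", "\u200d", "\u0d4d"

# Pre-5.1 spelling (consonant + virama + ZWJ) -> atomic Chillu added in
# Unicode 5.1. Illustrative table only; not claimed to be complete.
CHILLU = {
    "\u0d23" + VIRAMA + ZWJ: "\u0d7a",  # CHILLU NN
    "\u0d28" + VIRAMA + ZWJ: "\u0d7b",  # CHILLU N
    "\u0d30" + VIRAMA + ZWJ: "\u0d7c",  # CHILLU RR
    "\u0d32" + VIRAMA + ZWJ: "\u0d7d",  # CHILLU L
    "\u0d33" + VIRAMA + ZWJ: "\u0d7e",  # CHILLU LL
}


def lookup_candidates(typed):
    """Return the forms to try, in order: no ZW, as typed, with Chillu."""
    without_zw = typed.replace(ZWNJ, "").replace(ZWJ, "")
    with_chillu = typed
    for sequence, atomic in CHILLU.items():
        with_chillu = with_chillu.replace(sequence, atomic)
    # dict.fromkeys drops duplicates while keeping the order above, so a
    # label with no ZW characters costs only a single lookup.
    return list(dict.fromkeys([without_zw, typed, with_chillu]))


def first_match(typed, resolve):
    """Try each candidate with the caller-supplied resolve() function.

    resolve is a hypothetical callable that returns None on failure.
    """
    for candidate in lookup_candidates(typed):
        answer = resolve(candidate)
        if answer is not None:
            return candidate, answer
    return None
```

A label typed with consonant + virama + ZWJ yields three candidates,
while a plain ASCII label yields one, which is where the extra cost of
the transition shows up.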

One possible answer for this script (Malayalam) is to suggest that
since there are very few Malayalam domain names in use today, we
should just jump straight to the Chillu, and not attempt any
complicated transition involving double or triple lookup.

The big question then is whether we can make a similar jump for Eszett
and Final Sigma. I'd be interested to hear from the German and Greek
registries, now that we have figured out how to display those
characters (via CNAME/DNAME).

(And then there's the question of jumping straight to ZW* for Farsi, Urdu, etc.)

It might be a good idea to come up with an open source implementation
of a utility that generates all (or many) of the possible A-labels
from a single label with one or more ZW characters. This could then be
downloaded not only by top-level and high-level domain name
registries, but also by lower-level zone administrators that wish to
participate in any "jump".
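
As a starting point, such a utility might look something like the
Python sketch below. The function names are my own invention, and it
makes a big simplifying assumption: it applies raw Punycode with an
"xn--" prefix and skips all IDNA mapping and validity checks, so it
shows only the enumeration step, not a conforming implementation. It
produces one variant for every keep-or-drop choice of each ZW character.

```python
from itertools import product

ZW_CHARS = {"\u200c", "\u200d"}  # ZWNJ, ZWJ


def to_a_label(u_label):
    """Raw Punycode plus the xn-- prefix; no IDNA validation at all."""
    if u_label.isascii():
        return u_label.lower()
    return "xn--" + u_label.encode("punycode").decode("ascii")


def zw_variants(u_label):
    """Yield each distinct label made by keeping or dropping ZW chars."""
    positions = [i for i, ch in enumerate(u_label) if ch in ZW_CHARS]
    seen = set()
    for keep in product((True, False), repeat=len(positions)):
        dropped = {p for p, k in zip(positions, keep) if not k}
        variant = "".join(
            ch for i, ch in enumerate(u_label) if i not in dropped
        )
        if variant not in seen:  # adjacent identical ZW chars can collide
            seen.add(variant)
            yield variant


def a_label_variants(u_label):
    """Return the A-label for every ZW variant, fully-kept form first."""
    return [to_a_label(v) for v in zw_variants(u_label)]
```

A label with n ZW characters gives at most 2**n variants, so a registry
would probably want to cap n before bundling or delegating all of them.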

It sure would be nice to avoid double and triple lookups...

Erik