Tables: BackwardCompatible Maintenance

Fri Dec 12 17:31:10 CET 2008

On Fri, Dec 12, 2008 at 1:00 AM, Martin Duerst <duerst at it.aoyama.ac.jp> wrote:
> At 02:58 08/12/11, Erik van der Poel wrote:
>>BackwardCompatible semi-automatic - implementors are encouraged not to
>>implement new versions of Unicode in IDNA until the IETF (or some
>>group of experts) has declared the chosen action (i.e. addition to
>>BackwardCompatible or addition to Exceptions).
>
> This is EXACTLY what we want to AVOID. The main goal of IDNA200x,
> in my opinion, is to make additions to Unicode available to IDNs
> without a waiting period and unnecessary human involvement.

We could start looking at the differences when new Unicode alpha or
beta versions are published. Then there shouldn't be much of a waiting
period. We will pray that new characters are given the right
properties, just as I pray now that the IDNA2008 Tables document's
rules are fine-grained and correct enough to last a few years, given
that we don't have input from communities in far-flung corners of the
globe.
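
To give a rough idea of what "looking at the differences" could mean
in practice, here is a minimal sketch in Python. It is only an
illustration, not the procedure in the Tables document: it compares
the General_Category of every code point in the frozen Unicode 3.2.0
database that Python ships (unicodedata.ucd_3_2_0) against whatever
Unicode version the running Python build carries. The function name
compare_versions is made up for this example, and a real check would
compare the derived IDNA property rather than the raw category.

```python
import sys
import unicodedata

OLD = unicodedata.ucd_3_2_0   # frozen Unicode 3.2.0 database
NEW = unicodedata             # database version of this Python build


def compare_versions(start=0, stop=sys.maxunicode + 1):
    """Return (changed, added) for code points in [start, stop).

    changed: (code point, old category, new category) for code points
             that were already assigned in OLD but whose category differs.
    added:   code points unassigned ("Cn") in OLD but assigned in NEW.
    """
    changed = []
    added = []
    for cp in range(start, stop):
        ch = chr(cp)
        before, after = OLD.category(ch), NEW.category(ch)
        if before == after:
            continue
        if before == "Cn":
            added.append(cp)
        else:
            changed.append((cp, before, after))
    return changed, added


if __name__ == "__main__":
    changed, added = compare_versions()
    print("Unicode", OLD.unidata_version, "->", NEW.unidata_version)
    print("newly assigned code points:", len(added))
    print("assigned code points whose category changed:", len(changed))
    for cp, before, after in changed[:10]:
        print(f"  U+{cp:04X}: {before} -> {after}")
```

The "changed" list is the part that matters for BackwardCompatible:
those are the already-assigned characters whose status could flip
under rule-based derivation and would need a human decision.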

>>I'm sorry, but this is not very persuasive. There is really nothing
>>you can say to convince me that eszett was not botched. Either the
>>Unicode case-mappings were recommended too forcefully, or the IETF
>>accepted them too easily, or the German registry was not present, or
>>the German registry changed their mind in 2008, or some combo of the
>>above. Either way, at the end of the day, it's botched.
>
> I was involved then. Basically, the way it went was:
> IETF: we need something telling us how to deal with case mapping,
>    especially these special cases.
> Unicode: we have SpecialCasing.txt
> IETF: Okay, we'll take that
> Some individuals (including me): But for character foo, is this really
>    the right thing to do? The main purpose of SpecialCasing.txt is
>    for things like search; identification may have different needs.
> IETF: If we start discussing this in detail, we'll never finish
>    the spec. SpecialCasing.txt is the only thing we have, so let's
>    go with it.

Thanks for the history. I see. So we have a combination of reasons:
the IETF adopted one part of Unicode too readily, the German registry
was not present (or did not understand the ramifications), and the
plug-in and browser implementors went ahead and added pre-processing
to their products even though IDNA2003 explicitly warns about
IDNA-unaware domain name slots.
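
For anyone who wants to see the eszett behaviour concretely, the
following small sketch uses Python's built-in "idna" codec, which (as
far as I know) implements the IDNA2003 ToASCII and ToUnicode
operations. The label is just an example; nothing here is specific to
any registry's policy.

```python
# Python's built-in "idna" codec applies Nameprep (IDNA2003) before
# Punycode encoding, so the sharp s is mapped away during encoding.
label = "stra\u00dfe"             # "straße"

encoded = label.encode("idna")    # IDNA2003 ToASCII
decoded = encoded.decode("idna")  # IDNA2003 ToUnicode

print(encoded)   # b'strasse'
print(decoded)   # strasse

# The round trip does not give back the original label: the mapping
# is lossy, so "straße" and "strasse" are the same name on the wire.
assert decoded != label
assert "strasse".encode("idna") == encoded
```

Because the result is plain ASCII with no "xn--" prefix, nothing on
the wire records that the user ever typed a sharp s.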

It's interesting, because we are likely making some of the same
mistakes in IDNA2008, but to a much smaller degree. We are adopting
huge swaths of Unicode via relatively simple rules. We are doing this
because it takes too long to consider each character, one by one.

This time, the German, Greek and Arabic representatives are present
(and we have discussed sharp-s, final sigma and ZW*J), but we don't
have representatives from very many other parts of the world. So we
just have to pray that we got it right.

Finally, the implementors. We shall see how it goes. I think we need
much more discussion and coordination to ensure good long-term
results.

Erik