Charter changes and a possible new direction

Wed Jan 14 04:47:00 CET 2009

On 13 jan 2009, at 22.39, Paul Hoffman wrote:

> At 10:15 PM +0100 1/13/09, Patrik Fältström wrote:
>> The change from just using a table to using a series of rules
>> is a big change that makes things very different.
>
> Both proposals have a series of rules. The IDNA2008 proposal has a  
> series of rules to create the table; that set of rules must be run  
> *every time* Unicode is updated.

Yes, but they can be run by anyone, as the rules are the standard,
not the result of running the rules.

> The IDNAv2 proposal has the exact same series of rules that people  
> have gotten used to for IDNAv1.

No. You yourself claim that the rules in IDNA2008 are "too
cumbersome", so you are contradicting yourself here.

>> That some of the rules are tables (you mention exceptions
>> and backward compatible lists) does not change the fact that this
>> algorithmic approach is independent of the Unicode version.
>
> Disagree. The rules are only independent of Unicode versions if the  
> Unicode Consortium does not make any additions or changes that would  
> affect the table.

The difference between IDNA2008 and IDNAv2 is that for IDNA2008, the
only changes by the Unicode Consortium that would force a new RFC are,
_for_now_, changes that require entries in the backward compatibility
rule. For IDNAv2, a new RFC is needed for _any_ change to the Unicode
tables, including the addition of codepoints.
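For illustration only: under IDNA2008 the trigger for a new RFC is a change to a code point that was already assigned, not an addition. A minimal Python sketch of how one might look for such changes compares the Unicode 3.2 database that ships with Python (`unicodedata.ucd_3_2_0`) against the newer bundled one. It assumes a simplified "letter or digit" test as a stand-in for the full derivation rules, and the function names are hypothetical.

```python
import unicodedata

# Simplified stand-in for the IDNA2008 LetterDigits category (an assumption,
# not the full rule set).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_letter_digit(db, ch):
    """True if ch falls in LETTER_DIGITS according to Unicode database db."""
    return db.category(ch) in LETTER_DIGITS


def flipped_since_3_2(limit=0x10000):
    """Code points (below limit) that were assigned in Unicode 3.2 and whose
    letter/digit status differs in the Unicode version bundled with Python."""
    old = unicodedata.ucd_3_2_0
    flipped = []
    for cp in range(limit):
        ch = chr(cp)
        if old.category(ch) == "Cn":
            continue  # not assigned in 3.2: an addition, which the rules absorb
        if is_letter_digit(old, ch) != is_letter_digit(unicodedata, ch):
            flipped.append(cp)
    return flipped


if __name__ == "__main__":
    print(unicodedata.unidata_version, [hex(cp) for cp in flipped_since_3_2()])
```

Any code point this reports is a candidate for review, not automatically a backward compatibility entry; the real decision depends on the full rule set.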

I.e., as the vast majority of changes to Unicode are additions of
codepoints, IDNAv2 (and IDNA2003) explicitly make it impossible to use
these new codepoints without a revision of IDNA. IDNA2008 makes it
possible to "just" run the rules on the new version of Unicode and use
the result.
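The point that the rules, not their output, are the standard can be sketched in a few lines. This is a heavily simplified Python approximation of the category order in the IDNA2008 tables document (as later published in RFC 5892): Exceptions, BackwardCompatible, Unassigned, LDH, JoinControl, Unstable, ignorable code points, LetterDigits. It carries only a handful of the real exception entries, approximates the Unstable and IgnorableProperties tests, and omits the IgnorableBlocks and OldHangulJamo steps. All names are hypothetical.

```python
import unicodedata

# Hypothetical, partial copy of the Exceptions table (the real one is longer).
EXCEPTIONS = {
    0x00DF: "PVALID",      # LATIN SMALL LETTER SHARP S (eszett)
    0x03C2: "PVALID",      # GREEK SMALL LETTER FINAL SIGMA
    0x00B7: "CONTEXTO",    # MIDDLE DOT
    0x0640: "DISALLOWED",  # ARABIC TATWEEL
}
BACKWARD_COMPATIBLE = {}   # stays empty unless Unicode changes an existing code point
LDH = set(range(0x61, 0x7B)) | set(range(0x30, 0x3A)) | {0x2D}
JOIN_CONTROL = {0x200C, 0x200D}
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unstable(ch):
    # Approximates NFKC_Casefold(ch) != ch with NFKC(casefold(NFKC(ch))).
    folded = unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", ch).casefold())
    return folded != ch


def derived_property(ch):
    """Derive a simplified IDNA2008 property for one character."""
    cp = ord(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if unicodedata.category(ch) == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if cp in LDH:
        return "PVALID"
    if cp in JOIN_CONTROL:
        return "CONTEXTJ"
    if is_unstable(ch):
        return "DISALLOWED"
    if is_noncharacter(cp) or ch.isspace():
        return "DISALLOWED"  # rough stand-in for IgnorableProperties
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    print("Unicode", unicodedata.unidata_version)
    for ch in ["a", "A", "\u00df", "\u00b7", "\u200d", "!", "\u0378"]:
        print(f"U+{ord(ch):04X}", derived_property(ch))
```

Running the same code under a newer Python, which bundles a newer Unicode database, derives values for that version without any change to the rules, as long as the backward compatibility table can stay empty.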

>> Provided no drastic changes are made to Unicode in future versions,
>> we will never see any codepoints added to the backward compatible
>> list.
>
> Even small, non-drastic changes could cause the need for changes to  
> the table; these have been discussed in various threads in the past  
> few months.

I do not agree. Give examples.

>> The difference between your proposed approach and IDNA2008 is that
>> for your tables to work, one *has* to update the RFC for every
>> Unicode version.
>
> That's not at all true. Unicode has been updated many times since  
> 2003, and there has been no pressing need to update IDNA for each one.

You are very, very wrong here, Paul.

As for IDNA2003, there has been extreme pressure to update it ever
since it was created, due to the addition of new codepoints.
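The pinning of IDNA2003 to Unicode 3.2 is visible in Python's standard library: the `stringprep` module, which the built-in IDNA2003 codec uses, checks table A.1 against the Unicode 3.2 database. Any character added later is therefore "unassigned" as far as IDNA2003 is concerned and may not appear in a stored (registered) label. A minimal sketch; the helper name is hypothetical.

```python
import stringprep
import unicodedata


def unregistrable_in_idna2003(ch):
    """True if ch is assigned in Python's bundled Unicode version but is in
    stringprep table A.1, i.e. was unassigned in Unicode 3.2."""
    return unicodedata.category(ch) != "Cn" and stringprep.in_table_a1(ch)


if __name__ == "__main__":
    ch = "\u0237"  # LATIN SMALL LETTER DOTLESS J, added after Unicode 3.2
    print(unicodedata.unidata_version,
          unicodedata.ucd_3_2_0.category(ch),  # 'Cn' in the 3.2 database
          unicodedata.category(ch),            # 'Ll' today
          unregistrable_in_idna2003(ch))
```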

>> Something that is not needed in IDNA2008.
>
> Hopefully true.
>
>>> - "The constraints of the original IDN WG still apply to IDNABIS,
>>> namely to avoid disturbing the current use and operation of the
>>> domain name system, and for the DNS to continue to allow any system
>>> to resolve any domain name in a consistent way." If we consider IDNA
>>> to be part of the DNS, then this is no longer true with the current
>>> drafts. In specific, registries that are following the model of
>>> IDNA2003 now must start using registration-binding if they want to
>>> follow IDNA2008 and use European languages such as German or Greek
>>> (and possibly some Arabic languages, depending on the output of the
>>> ASIWG and this WG's adoption of their proposals).
>>
>> If I do not misunderstand you, I see no difference between IDNA2003
>> and IDNA2008 (or your proposal) regarding this binding that must
>> happen at the time of registration. This is due to registry policy and
>> language table issues.
>
> Sorry, then you misunderstand me. Under IDNA2003, registries that
> registered (for example) names in German did not need to keep any
> name bindings. They will under IDNA2008. More significantly, they
> will need to add those bindings back to names that are already
> issued even if those names would make no sense with an eszett/sharp-s.

Then please explain what you mean by "name binding".
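The eszett case behind this exchange can be made concrete with Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490) and therefore maps "ß" to "ss" through Nameprep. The second function is a hypothetical IDNA2008-style encoding that applies no mapping and skips all validation; it assumes its input is already a valid, lowercase U-label.

```python
def idna2003_alabel(label):
    """IDNA2003 ToASCII, as implemented by Python's built-in "idna" codec."""
    return label.encode("idna")


def naive_idna2008_alabel(label):
    """Hypothetical sketch: encode an already-valid, lowercase U-label with
    no mapping step. Real IDNA2008 also validates every code point."""
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")


if __name__ == "__main__":
    for label in ("strasse", "stra\u00dfe"):
        print(label, idna2003_alabel(label), naive_idna2008_alabel(label))
```

Under IDNA2003 both spellings end up as `strasse`; without the mapping, "straße" becomes `xn--strae-oqa`, a separate name that a registry already holding `strasse` would have to decide about.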

>>> - "This work is intended to specify an improved means to produce and
>>> use stable and unambiguous IDN identifiers." IDNA2008 makes the
>>> current IDN identifiers unstable for German or Greek (and possibly
>>> some Arabic languages, depending on the output of the ASIWG and this
>>> WG's adoption of their proposals).
>>
>> I strongly disagree with this conclusion of yours.
>>
>> The statement is, once again: "This work is intended to specify an
>> improved means to produce and use stable and unambiguous IDN
>> identifiers."
>>
>> This is true, as it makes a very big change from IDNA2003, namely
>> to have precise definitions of an A-label and a U-label.
>>
>> The problem you point out has to do with the transition from IDNA2003
>> to IDNA2008 and the fact that mappings were part of IDNA2003, so it
>> is unclear whether, for example, eszett is "ok" (note the quotation
>> marks) or not.
>
> We may agree. The A/U label idea was a good improvement, but one
> that has made the transition from the current protocol to the new
> one unclear. In my mind, that makes some of the current labels
> unstable and ambiguous.

But this is because of a problem in IDNA2003, not because of a problem
with IDNA2008. Your IDNAv2 proposal pushes the solution to this problem
even further into the future. Are you really proposing that we wait
until IDNAv3 before we fix this?
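One reading of the "well-defined" A-label/U-label pair is that an A-label must round-trip exactly through Punycode and must decode to a non-ASCII U-label. A minimal Python sketch of just that structural check, using the standard `punycode` codec; it does not validate the code points of the decoded U-label, and the function name is hypothetical.

```python
def is_alabel(candidate):
    """Structural A-label check: "xn--" prefix, a Punycode payload that
    decodes to non-ASCII text, and an exact round trip to the same bytes."""
    if not candidate.startswith("xn--"):
        return False
    try:
        payload = candidate[4:].encode("ascii")
        ulabel = payload.decode("punycode")
    except UnicodeError:
        return False  # non-ASCII input or malformed Punycode
    if ulabel.isascii():
        return False  # a U-label must contain at least one non-ASCII character
    return ulabel.encode("punycode") == payload


if __name__ == "__main__":
    for s in ("xn--strae-oqa", "xn--abc-", "strasse"):
        print(s, is_alabel(s))
```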

>>> Separate from the charter problems, it is also clear that we cannot
>>> meet our original goals of making the update easy to implement. The
>>> original design was based on the idea (that I supported) that an
>>> inclusion-based system would be easier to implement than the
>>> mapping-based system in IDNA2003. Over time, that goal clearly became
>>> impossible. We now have a protocol that relies on context-sensitive
>>> and position-sensitive regular expressions.
>>
>> For specific codepoints, yes. And you will not get away from that if
>> you update IDNA2003. If you want to move forward with your proposal,
>> you have to add exactly the same position-dependent rules.
>
> Fully disagree: I see nowhere in the draft that says this.

Of course not. The regular expressions exist because discussions in
this very WG have FORCED us to add them, as WG participants WANT the
context-dependent rules.

If we need the rules in IDNA2008, we also need the rules in IDNAv2.
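As an example of such a position-sensitive rule, the tables document's rule for U+00B7 MIDDLE DOT (as later published in RFC 5892, Appendix A.3) allows it only between two "l" characters, as in Catalan "col·legi". A sketch of that single rule as a Python regular expression; the pattern and function names are hypothetical.

```python
import re

# Matches a MIDDLE DOT (U+00B7) that is not both preceded and followed by "l".
MIDDLE_DOT_VIOLATION = re.compile(r"(?<!l)\u00b7|\u00b7(?!l)")


def middle_dot_ok(label):
    """True if every MIDDLE DOT in label sits between two "l" characters."""
    return MIDDLE_DOT_VIOLATION.search(label) is None


if __name__ == "__main__":
    for label in ("col\u00b7legi", "l\u00b7", "\u00b7l"):
        print(label, middle_dot_ok(label))
```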

If we do not need the rules (as you claim), then we could have been  
done more than a year ago.

>> FWIW, I have no problem throwing away the document I have been
>> working on, but having been a document author of IDNA2003, an
>> implementor of IDNA2003 and various stringprep algorithms, and a
>> document author of IDNA2008, I think you underestimate how hard it
>> is to update IDNA2003.
>
> Possibly true; that's one of the reasons I actually wrote a draft  
> instead of just floating the idea.
>
>> If you "just" update IDNA2003, you have to:
>>
>> - Update the future RFC at EVERY update of the Unicode Standard.
>
> Again: why? Why not do it in, say, five year cycles like we are  
> doing now? Where is the demand?

See my response to what Andrew wrote, and my comments above.

>> - Separate the mappings from the actual codepoints that can be used
>> in the DNS, and come up with a terminology for it.
>
> Sorry, now I am misunderstanding you. Please try again (or be more  
> verbose).

I mean the separation of the mapping step from the definition of the
A-label and U-label.
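A minimal sketch of that separation, assuming a hypothetical mapping layer that only lowercases and NFC-normalizes user input, and a hypothetical check that accepts a string as a U-label only if it is already in that form. The real IDNA2008 validity test is per code point and far stricter; the point here is only that mapping happens before, and outside of, the definition of a valid label.

```python
import unicodedata


def map_user_input(text):
    """Hypothetical mapping step, applied by the application before the
    protocol sees the label: lowercase, then normalize to NFC."""
    return unicodedata.normalize("NFC", text.lower())


def looks_like_ulabel(label):
    """Hypothetical, much-simplified U-label check: nothing is mapped here,
    the label must already be in the form the protocol accepts."""
    return (
        len(label) > 0
        and unicodedata.normalize("NFC", label) == label  # already NFC
        and label.lower() == label                        # already lowercase
        and not label.startswith("-")
        and not label.endswith("-")
        and label[2:4] != "--"  # positions 3-4 reserved, e.g. for "xn--"
    )


if __name__ == "__main__":
    raw = "Stra\u00dfe"
    print(looks_like_ulabel(raw), looks_like_ulabel(map_user_input(raw)))
```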

>> - Fix the Bidi issues in IDNA2003 that we knew about but did not get
>> right (or did not address at all).
>
> I cannot tell whether or not you read the draft. It fixes both of  
> the primary problems that Harald and Cary found. What others do you  
> see as needed?

Correct, but it is still one of the things an update has to do. I did
not claim that you had not made these changes. As far as I can see,
this is the only one of the required changes that you have actually
made.
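For comparison, IDNA2003 (RFC 3454, section 6) requires an RTL label to begin and end with an R/AL character, so it cannot end in a digit or a trailing combining mark. The newer label-level rule, later published as RFC 5893, allows both. A minimal Python sketch of its six conditions using `unicodedata.bidirectional`; the function name is hypothetical, and the check is applied per label only.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_rule_ok(label):
    """Check one label against the six conditions later published as RFC 5893."""
    if not label:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in label]
    first = classes[0]
    if first not in ("L", "R", "AL"):
        return False  # condition 1
    # Last character, ignoring trailing NSMs (conditions 3 and 6).
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if first in ("R", "AL"):
        return (
            set(classes) <= RTL_ALLOWED                    # condition 2
            and last in ("R", "AL", "EN", "AN")            # condition 3
            and not ("EN" in classes and "AN" in classes)  # condition 4
        )
    return set(classes) <= LTR_ALLOWED and last in ("L", "EN")  # conditions 5, 6


if __name__ == "__main__":
    label = "\u05d01"  # HEBREW LETTER ALEF followed by the digit 1
    print(bidi_rule_ok(label))
    try:
        label.encode("idna")  # Python's IDNA2003 codec applies the RFC 3454 rule
    except UnicodeError as exc:
        print("IDNA2003 rejects it:", exc)
```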

>> - Still have the regular expressions that say what codepoints are
>> valid where.
>
> Disagree. Please show where in the draft those are needed.

See the discussions last year on this wg mailing list, for example the
issue with the Indic digits.

According to the consensus in this wg, the standard can *not* allow the
use of codepoints that violate any of the regular expressions you see in
the tables document, for example for the Indic digits, final sigma, etc.

This requirement/consensus is true regardless of what document this wg  
produces.
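Assuming "Indic digits" refers to the rule that Arabic-Indic digits (U+0660..U+0669) and Extended Arabic-Indic digits (U+06F0..U+06F9) must not be mixed in one label (as later published in RFC 5892, Appendix A.8 and A.9), the rule can be sketched in Python as follows. The names are hypothetical.

```python
ARABIC_INDIC = range(0x0660, 0x066A)
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)


def arabic_indic_digits_ok(label):
    """False if the label mixes Arabic-Indic and Extended Arabic-Indic digits."""
    cps = [ord(ch) for ch in label]
    has_ai = any(cp in ARABIC_INDIC for cp in cps)
    has_ext = any(cp in EXTENDED_ARABIC_INDIC for cp in cps)
    return not (has_ai and has_ext)


if __name__ == "__main__":
    for label in ("\u0661\u0662", "\u06f1\u06f2", "\u0661\u06f2"):
        print([hex(ord(ch)) for ch in label], arabic_indic_digits_ok(label))
```

Unlike the middle dot rule, this one looks at the whole label rather than at neighbouring characters, which is why the rules cannot be reduced to a flat table.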

>> - Still have issues with transition from IDNA2003 to IDNAv2 (as you
>> call it) as there will be incompatibilities.
>
> Where? All issues between a system running the old version and one
> running the new version are already taken care of with the handling
> of unassigned code points.

No. There are other codepoints whose treatment differs between the
current consensus of the wg and IDNA2003: the eszett, the final sigma,
the Indic digits, the graphic characters, etc.
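The final sigma case listed above can be checked with Python's built-in IDNA2003 codec, which maps U+03C2 GREEK SMALL LETTER FINAL SIGMA to U+03C3 through Nameprep. Two spellings that a mapping-free IDNA2008 treats as distinct labels therefore collapse to one A-label. The second helper is a hypothetical mapping-free comparison standing in for IDNA2008.

```python
import unicodedata


def same_under_idna2003(a, b):
    """True if Python's built-in IDNA2003 codec gives both labels one A-label."""
    return a.encode("idna") == b.encode("idna")


def same_without_mapping(a, b):
    """Hypothetical IDNA2008-style comparison: no case or sigma mapping,
    only NFC normalization before comparing code points."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    final, medial = "\u03bb\u03cc\u03b3\u03bf\u03c2", "\u03bb\u03cc\u03b3\u03bf\u03c3"
    print(same_under_idna2003(final, medial), same_without_mapping(final, medial))
```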

>> So I think your document, if it is to be the basis for future work in
>> this wg, is very, very short and, to be frank, naive.
>
> Please show where it is too short.

See the list above. You have addressed only one of a series of things
in the draft. Adding solutions for the rest (for example, how to handle
the fact that you will need regular expressions) is not easy.

> And, I am quite willing to admit that I think if the WG has the  
> choice of "short and naive" and "long and naive", we should pick the  
> former. YMMV.

    Patrik