Charter changes and a possible new direction

Wed Jan 14 06:10:14 CET 2009

First: please calm down and read the proposal. What you are saying is mostly "you cannot do what we have done as a straight update to IDNA2003". We fully agree on that. I am not trying to do what you have done; I am trying to do what I said in the document:

- to allow labels with characters that have been added since Unicode version 3.2 to be used in IDNA.

- to not change the encoding of any label that is legal in IDNAv1.

- to update the bidirectional ("bidi") algorithm used by IDNAv1 to cover more languages such as Dhivehi and Yiddish.

The first and third are the same as two of the goals in IDNA2008; the second is clearly different. I did not try to achieve the other goals of IDNA2008 for the reasons I gave in the first message: doing so seemed to make the result overly complex and to violate my second goal (and therefore violate the charter).

At 4:47 AM +0100 1/14/09, Patrik Fältström wrote:
>On 13 jan 2009, at 22.39, Paul Hoffman wrote:
>
>> At 10:15 PM +0100 1/13/09, Patrik Fältström wrote:
>>> The change from just using a table, to
>>> use a series of rules, is a big change that make things very
>>> different.
>>
>> Both proposals have a series of rules. The IDNA2008 proposal has a 
>> series of rules to create the table; that set of rules must be run 
>> *every time* Unicode is updated.
>
>Yes, but can be run by anyone, as the rules is the standard, not the 
>result of running the rules.

And the rules in my proposal can also be run "by anyone". No difference.

>> The IDNAv2 proposal has the exact same series of rules that people
>> have gotten used to for IDNAv1.
>
>No, you claim yourself that the rules in IDNA2008 is "too cumbersome". 
>So you are contradicting yourself here.

This does not make sense.

First, I never said they were too cumbersome; I said that they were quite complex. They can probably be implemented, but it is likely that people will implement them wrong because of their complexity.

Second, the complexity of the IDNA2008 rules is unrelated to the complexity of the IDNA2003 mappings.

I do not see the contradiction.

>>> That some of the rules are tables (you mention exceptions
>>> and backward compatible lists) does not change the fact this
>>> algorithmic approach is Unicode independent.
>>
>> Disagree. The rules are only independent of Unicode versions if the 
>> Unicode Consortium does not make any additions or changes that would 
>> affect the table.
>
>The difference between IDNA2008 and IDNAv2 is that for IDNA2008, the 
>changes Unicode Consortium has to do to force a new RFC is _for_now_ 
>changes needed to the backward compatibility rule.

Fully agree.

>For IDNAv2, a new 
>RFC is needed for _any_ change of the Unicode tables, including 
>addition of codepoints.

Fully disagree. You have never shown why a new version of Unicode forces a new version of IDNA. As I said earlier, we have had many new versions of Unicode since 2003, and the world has been just fine. A few people wanted a new version of IDNA to handle the new useful characters, but that is far from a "need". We are fixing that now. I assume that the Unicode Consortium will continue to make new versions, some of which will have useful new characters in them. We can do another update to IDNA in a few years.

>I.e. as the vast majority of the changes to Unicode is addition of 
>codepoints, IDNAv2 (and IDNA2003) explicitly make it impossible to use 
>these new codepoints without a revision of IDNA.

Correct. As we have seen, there was not a strong need, but a desire. This is true for almost every protocol that comes from the IETF.

>IDNA2008 make it 
>possible to "just" run the rules on the new version of Unicode and use 
>the result.

Agree, as long as nothing needs to be added to the Exceptions or the BackwardsCompatible categories.

>>> Given no drastic changes
>>> are made to Unicode in future versions, we will never see any
>>> codepoints be added to the backward compatible list.
>>
>> Even small, non-drastic changes could cause the need for changes to 
>> the table; these have been discussed in various threads in the past 
>> few months.
>
>I do not agree. Give examples.

If the Unicode Consortium adds a new JoinControl in the future, a CONTEXTJ rule needs to be created.

If the Unicode Consortium changes the property of a letter that was Unstable to become stable, it would go from DISALLOWED to being PVALID.

And so on.

>>> The difference between your proposed approach and IDNA2008 is that
>>> for your tables to work, one *have* to update the RFC for every Unicode
>>> version.
>>
>> That's not at all true. Unicode has been updated many times since 
>> 2003, and there has been no pressing need to update IDNA for each one.
>
>You are very very wrong here Paul.
>
>According to IDNA2003, there have been extreme pressure to update 
>IDNA2003 since it was created due to the addition of new codepoints.

Which part of IDNA2003 says that?

Where is the pressure? For which codepoints? Why don't we hear from those people directly?

>>>> - "The constraints of the original IDN WG still apply to IDNABIS,
>>>> namely to avoid disturbing the current use and operation of the
>>>> domain name system, and for the DNS to continue to allow any system
>>>> to resolve any domain name in a consistent way." If we consider IDNA
>>>> to be part of the DNS, then this is no longer true with the current
>>>> drafts. In specific, registries that are following the model of
>>>> IDNA2003 now must start using registration-binding if they want to
>>>> follow IDNA2008 and use European languages such as German or Greek
>>>> (and possibly some Arabic languages, depending on the output of the
>>>> ASIWG and this WG's adoption of their proposals).
>>>
>>> If I do not misunderstand you, I see no difference between IDNA2003
>>> and IDNA2008 (or your proposal) regarding this binding that must
>>> happen at time of registration. This due to registry policy and
>>> language table issues.
>>
>> Sorry, then you misunderstand me. Under IDNA2003, registries that 
>> registered (for example) names in German did not need to keep any
>> name bindings. They will under IDNA2008. More significantly, they
>> will need to add those bindings back to names that are already
>> issued even if those names would make no sense with an eszett/sharp-s.
>
>Then, explain what you imply by "name binding" please.

IDNA2008 creates new labels that were not possible in IDNA2003 due to mapping. IDNA2003 allows fussball.com but not fußball.com; IDNA2008 allows the latter. Under IDNA2003, someone who enters fußball.com goes to fussball.com. If a registry is allowed to register fußball.com unbound from fussball.com, people who have been entering that name will go to a different site than the one they have been going to. This leads to two options for the registry:

- Be unstable with respect to IDNA2003

- Bind the two names together so that all registry-level changes to one name are automatically reflected in the other name
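
As a rough illustration (not part of either proposal), the IDNA2003 mapping described above can be observed with Python's built-in "idna" codec, which implements RFC 3490. The helper name to_ascii_idna2003 is made up for this sketch.

```python
def to_ascii_idna2003(name: str) -> str:
    """Run IDNA2003 ToASCII on each label via Python's built-in "idna" codec."""
    return name.encode("idna").decode("ascii")


# Nameprep maps sharp-s to "ss" before the label is encoded, so both
# spellings end up as the same name on the wire.
mapped = to_ascii_idna2003("fußball.com")
plain = to_ascii_idna2003("fussball.com")
print(mapped, plain, mapped == plain)
```

Under IDNA2008 the sharp-s is a valid code point in its own right, so the two spellings would no longer collapse into one name; that is where the binding question comes from.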

We have long talked about a "registry best practices" document. I have assumed (possibly incorrectly) that it would advocate binding names.

>> We maybe agree. The A/U label idea was a good improvement, but one
>> that has caused the transition from the current protocol to the new 
>> one to cause lack of clarity. In my mind, that makes some of the 
>> current labels unstable and ambiguous.
>
>But this is because there is a problem in IDNA2003, not the reverse, 
>that this is due to a problem with IDNA2008. Your proposal with IDNAv2 
>push the solution to this problem even further forward in time. Are 
>you really proposing we should wait until IDNAv3 before we fix this?

No, I am proposing that we do not fix the problem because doing so causes the other problems that we have with the current document set. It is not at all a fatal problem, as is shown by the wide deployment of IDNA2003. It would have been good to fix if we could have done so cleanly; it looks like we could not.

>>>> Separate from the charter problems, it is also clear that we cannot
>>>> meet our original goals of making the update easy to implement. The
>>>> original design was based on the idea (that I supported) that an
>>>> inclusion-based system would be easier to implement than the
>>>> mapping-based system in IDNA2003. Over time, that goal clearly became
>>>> impossible. We now have a protocol that relies on context-sensitive
>>>> and position-sensitive regular expressions.
>>>
>>> For specific codepoints, yes. And you will not get away from that if
>>> you update IDNA2003. If you want to move forward with your proposal,
>>> you have to add exactly the same position dependent rules.
>>
>> Fully disagree: I see nowhere in the draft that says this.
>
>Of course not. The regular expressions exists because discussions in 
>this very WG have FORCED us to add such regular expressions as WG 
>participants WANT the context dependent rules.
>
>If we need the rules in IDNA2008, we also need the rules in IDNAv2.

We got the rules under IDNA2008 because the structure allows them. My proposal for IDNAv2 does not allow them. This is the same decision we made six years ago.

>If we do not need the rules (as you claim), then we could have been 
>done more than a year ago.

Again, you mistake "need" for "want". The IDNA2008 document set opened up the possibility for the rules, and people filled that void. I propose that we do not do that for the reasons we have seen.

>>> - Separate the mappings from the actual codepoints that can be used
>>> in the DNS, and come up with a terminology for it.
>>
>> Sorry, now I am misunderstanding you. Please try again (or be more 
>> verbose).
>
>The separation of the mapping from the A-/U-label.

Got it. It was a good idea, but not needed.

>>> - Fix the Bidi issues that we knew with IDNA2003 that we did not get
>>> right (or at all).
>>
>> I cannot tell whether or not you read the draft. It fixes both of 
>> the primary problems that Harald and Cary found. What others do you 
>> see as needed?
>
>Correct, but you still have to do this in your draft.

It is done, completely, in section 3:

   In section 6, at the end of the fourth paragraph (which currently
   ends with "have bidirectional category "EN"."), the following
   sentence is added: "The Unicode Standard also defines a bidirectional
   category "NSM" for "non-spacing marks"."

   In section 6, the third requirement is changed to read:

   | 3) If a string contains any RandALCat character, a RandALCat
   |   character MUST be the first character of the string, and
   |   either a RandALCat character or NSM character MUST be the
   |   last character of the string.

What else do you think is needed?
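
For illustration only, the revised third requirement quoted above can be sketched in Python. This is an assumption-laden sketch: it uses the Unicode version bundled with the running Python rather than 3.2 or 5.1, it checks only requirement 3 (not the other bidi requirements in Stringprep), and the function names and the allow_trailing_nsm switch are made up for the example.

```python
import unicodedata


def is_randalcat(ch: str) -> bool:
    """True if the character has bidirectional category R or AL."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def satisfies_requirement_3(label: str, allow_trailing_nsm: bool = True) -> bool:
    """Check only the third bidi requirement.

    allow_trailing_nsm=False gives the original wording; True gives the
    revised wording that also accepts a trailing non-spacing mark.
    """
    if not any(is_randalcat(ch) for ch in label):
        return True  # the requirement applies only to strings with RandALCat
    if not is_randalcat(label[0]):
        return False
    last = label[-1]
    if is_randalcat(last):
        return True
    return allow_trailing_nsm and unicodedata.bidirectional(last) == "NSM"


# A Dhivehi word in Thaana script: it ends in a vowel sign, which is a
# non-spacing mark rather than a RandALCat character.
dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
print(satisfies_requirement_3(dhivehi, allow_trailing_nsm=False))
print(satisfies_requirement_3(dhivehi, allow_trailing_nsm=True))
```

The first call applies the original wording and the second applies the revised wording, so the same label can be compared under both.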

>I did not claim 
>you had not made all these changes. This is as I can see it the only 
>change I see required that you actually have done.

All technical changes to Stringprep (other than the bidi one) are additions to the tables. I showed the ones that popped out to me in looking through the differences between Unicode 3.2 and 5.1. There may be other additions; those can be determined by human review of the characters added.
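
As a hedged sketch of how the list of added characters for that review could be produced: Python's stringprep module exposes Stringprep table A.1 (code points unassigned in Unicode 3.2), and unicodedata reflects whatever Unicode version the interpreter ships with. That version is newer than 5.1, so the result is a superset of the additions discussed here. The function name is hypothetical.

```python
import stringprep
import unicodedata


def added_since_unicode_3_2(start: int = 0, stop: int = 0x110000) -> list:
    """Code points unassigned in Unicode 3.2 but assigned in this Python's UCD."""
    added = []
    for cp in range(start, stop):
        ch = chr(cp)
        # Table A.1 lists what was unassigned in 3.2; category "Cn" means
        # the code point is still unassigned in the current database.
        if stringprep.in_table_a1(ch) and unicodedata.category(ch) != "Cn":
            added.append(cp)
    return added


# Restrict the scan to one block to keep the output short enough to review.
latin_ext_additional = added_since_unicode_3_2(0x1E00, 0x1F00)
print([f"U+{cp:04X}" for cp in latin_ext_additional])
```

Each code point in the output would still need a human decision about which Stringprep table, if any, it belongs in.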

Again: I'm not trying to match all the changes that are in IDNA2008, I am trying to update IDNA2003 in a backwards-compatible fashion. I truly believe that will be more likely to pass muster with the IESG than a non-compatible and complex set of changes.

>>> - Still have the regular expressions that say what codepoints are
>>> valid where.
>>
>> Disagree. Please show where in the draft those are needed.
>
>See discussions last year on this wg mailing list. For example the 
>issue with the indic digits.

I have proposed no change from IDNA2003 for Indic digits.

>The standard can according to consensus in this wg *not* allow use of 
>codepoints that violate any of the regular expressions you see in the 
>tables document. For example the indic digits, final sigma etc.
>
>This requirement/consensus is true regardless of what document this wg 
>produces.

The features asked for by the WG were clearly reflections of the early work that was done on IDNA2008. At the time we asked for those features, we did not know where the process would lead us.

>>> - Still have issues with transition from IDNA2003 to IDNAv2 (as you
>>> call it) as there will be incompatibilities.
>>
>> Where? All issues between a system running the old version and one 
>> running the new version are already taken care of with the handling 
>> of unassigned code points.
>
>No. There are other codepoints that are not compatible between what is 
>now the consensus of the wg and IDNA2003. The eszet, the final sigma, 
>the indic digits, the graphics characters etc etc

See above. We disagree here about what was "consensus" and in what light it was made. I fully agree that my proposal has different properties for those characters, ones that make my proposal not change allocated domain names.