Charter changes and a possible new direction

Patrik Fältström patrik at frobbit.se
Wed Jan 14 07:34:31 CET 2009


On 14 jan 2009, at 06.10, Paul Hoffman wrote:

> What you are saying is mostly "you cannot do what we have done as a  
> straight update to IDNA2003".

Not really. What I say is more "you point out that because of reasons
A, B and C, IDNA2008 is complicated". But what I see is that when I
started with IDNA2008, I did not think A, B and C were needed. Now the
wg consensus says that A, B and C are needed, and if those things are
needed for IDNA2008, then they are needed also for IDNAv2. And the
converse should also be true if we are comparing IDNAv2 and IDNA2008:
if IDNAv2 is done without A, B and C, then we should remove A, B and C
also from IDNA2008.

> We fully agree on that. I am not trying to do what you have done; I  
> am trying to do what I said in the document:
>
> - to allow labels with characters that have been added since Unicode  
> version 3.2 to be used in IDNA.
>
> - to not change the encoding of any label that is legal in IDNAv1.
>
> - to update the bidirectional ("bidi") algorithm used by IDNAv1 to  
> cover more languages such as Dhivehi and Yiddish.

I agree with this, apart from me not believing you can do the 2nd and
3rd without getting contradictions. But I am not a BiDi specialist.  
Other people have to look at that.

I also think that we need some "exceptions" that are carefully looked  
at if you *really* want the 2nd to be true. For example for the  
incompatible changes made to Unicode. So producing a document is, as  
you point out in your document, not easy.

> The first and third are the same as two of the goals in IDNA2008;  
> the second is clearly different. I did not try to achieve the other  
> goals of IDNA2008 for the reasons I gave in the first message: doing  
> so seemed to make the result overly complex and to violate my second  
> goal (and therefore violate the charter).

The problem is that there are other requirements that have popped up on
the mailing list that I think you must consider and say what you do
about, for example:

1. The ability to *register* the eszett
2. The requirements to have contextual rules (indic digits for example)

I would *love* to ignore these things in IDNA2008 because it would
make the standard so much easier, but that is not possible as the wg
requires these things. And therefore you have to, before we compare
IDNAv2 and IDNA2008, also add such things to IDNAv2. Otherwise we
compare apples and pears when comparing "which one of the solutions is
the best".

>>> Both proposals have a series of rules. The IDNA2008 proposal has a
>>> series of rules to create the table; that set of rules must be run
>>> *every time* Unicode is updated.
>>
>> Yes, but they can be run by anyone, as the rules are the standard, not
>> the result of running the rules.
>
> And the rules in my proposal can also be run "by anyone". No  
> difference.

Exactly. And that is why standardizing the rules is better than
standardizing the output of the rules. Because then we are
independent of the Unicode version.
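
To illustrate the point about standardizing rules rather than their output, here is a minimal Python sketch. The rule in `classify` is a toy stand-in and not the IDNA2008 rule set; the function names, the category list and the stability test are assumptions made for this example. It only shows that the same rule text yields a table for whichever Unicode version the local `unicodedata` module carries.

```python
import unicodedata

# Toy rule, NOT the IDNA2008 algorithm: the category set and the
# stability test below are simplifications chosen for illustration.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(cp: int) -> str:
    """Derive a status for one code point from Unicode properties only."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cn":
        return "UNASSIGNED"
    # "Stable" here means: unchanged by NFKC plus case folding.
    folded = unicodedata.normalize(
        "NFKC", unicodedata.normalize("NFKC", ch).casefold()
    )
    if folded != ch:
        return "DISALLOWED"
    return "PVALID" if category in ALLOWED_CATEGORIES else "DISALLOWED"


def derive_table(first: int, last: int) -> dict:
    """Run the rule over a code point range; rerun after a Unicode update."""
    return {cp: classify(cp) for cp in range(first, last + 1)}


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    table = derive_table(0x0041, 0x007A)
    print(table[ord("a")], table[ord("A")])
```

Note that this toy rule marks ß (U+00DF) as DISALLOWED, because case folding changes it. Treating it any other way needs an explicit per-code-point entry on top of the rules, which is the role an exceptions list plays.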

>>> The IDNAv2 proposal has the exact same series of rules that people
>>> have gotten used to for IDNAv1.
>>
>> No, you claim yourself that the rules in IDNA2008 are "too
>> cumbersome". So you are contradicting yourself here.
>
> This does not make sense.
>
> First, I never said they were too cumbersome, I said that they were  
> quite complex.

For me, not being a native English speaker, those two things
are the same.

> They are probably able to be implemented, but it is likely that  
> people will implement them wrong because of their complexity.

What's the difference between those and "your rules"?

What rules are problematic?

> Second, the complexity of the IDNA2008 rules is unrelated to the  
> complexity of the IDNA2003 mappings.
>
> I do not see the contradiction.

Then I guessed wrong on what part of "IDNA2008 rules" you thought
were complex.

Please be more specific.

>> For IDNAv2, a new
>> RFC is needed for _any_ change of the Unicode tables, including
>> addition of codepoints.
>
> Fully disagree. You have never shown why a new version of Unicode  
> forces a new version of IDNA. As I said earlier, we have had many  
> new versions of Unicode since 2003, and the world has been just fine.

No, the world has not been fine at all. There are codepoints added to
Unicode that it has not been possible to use in IDNs.

> A few people wanted a new version of IDNA to handle the new useful  
> characters, but that is far from a "need".

I think you should be very, very careful about using such statements as
"a few". Specifically as the IETF is dominated by North American
people who have English as their native language.

I really understand, because of the work and arguments I have heard
specifically in this wg, why people from outside of the USA feel it is
impossible to participate in the IETF effectively. Quite often native
English speakers are telling local language communities that
they are "a few", or what they need and do not need (Korea, Germany).

And I do understand now what problems Nomcom has when looking for
people for the leadership of the IETF (IAB+IESG). Look at the
participation at the last IETF (what percentage of participation from
various countries) and look at the percentage of people from different
countries in the IESG and IAB.

> We are fixing that now.

Agreed.

> I assume that the Unicode Consortium will continue to make new  
> versions, some of which will have useful new characters in them. We  
> can do another update to IDNA in a few years.

For every individual who wants even ONE of the codepoints that are
added, that codepoint is VERY needed for that person. I am not the
person who can with a straight face tell anyone who has managed to
get their language represented in the Unicode character set that "your
language is not important enough, and btw, you have not spoken up in
the IETF".

For them, their codepoint is needed. And further, if it was not more  
important than that, it would not have been added to Unicode.

>> I.e. as the vast majority of the changes to Unicode is addition of
>> codepoints, IDNAv2 (and IDNA2003) explicitly makes it impossible to
>> use these new codepoints without a revision of IDNA.
>
> Correct. As we have seen, there was not a strong need, but a desire.  
> This is true for almost every protocol that comes from the IETF.

I completely disagree with this statement.

>> IDNA2008 makes it
>> possible to "just" run the rules on the new version of Unicode and
>> use the result.
>
> Agree, as long as nothing needs to be added to the Exceptions or the  
> BackwardsCompatible categories.

And if/when the BackwardsCompatible category is added to IANA, not  
even then.

>>>> Given no drastic changes
>>>> are made to Unicode in future versions, we will never see any
>>>> codepoints be added to the backward compatible list.
>>>
>>> Even small, non-drastic changes could cause the need for changes to
>>> the table; these have been discussed in various threads in the past
>>> few months.
>>
>> I do not agree. Give examples.
>
> If the Unicode Consortium adds a new JoinControl in the future, a  
> CONTEXTJ rule needs to be created.
>
> If the Unicode Consortium changes the property of a letter that was  
> Unstable to become stable, it would go from DISALLOWED to being  
> PVALID.

For me those are drastic changes, as the Unicode Consortium says that
these things would be extremely rare, and that they could see them
happening more or less only if errors are found.

> And so on.
>
>>>> The difference between your proposed approach and IDNA2008 is that
>>>> for your tables to work, one *has* to update the RFC for every
>>>> Unicode version.
>>>
>>> That's not at all true. Unicode has been updated many times since
>>> 2003, and there has been no pressing need to update IDNA for each
>>> one.
>>
>> You are very, very wrong here, Paul.
>>
>> According to IDNA2003, there has been extreme pressure to update
>> IDNA2003 since it was created due to the addition of new codepoints.
>
> Which part of IDNA2003 says that?

Sorry, what I wanted to say was that "according to IDNA2003, we are
fixed at Unicode 3.2, so...".

> Where is the pressure? For which codepoints? Why don't we hear from  
> those people directly?

Because the IETF is dominated by North American white males who have
English as their native language. Just look at how English-speaking
people hammered on the Korean community that did come to the IETF (and
even travelled across the globe to make their statements) to say what
they needed.

This is why they talk with ICANN, ITU and UN, and that way indirectly  
talk with us.

I am surprised our friends from South Korea did not give up and go
home.

>> Then, explain what you imply by "name binding" please.
>
> IDNA2008 creates new labels that were not possible in IDNA2003 due  
> to mapping. IDNA2003 allows fussball.com but not fußball.com;

IDNA2003 makes fußball.com possible due to the mappings. IDNA2003 does
not allow it in the DNS.
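
As a rough illustration of that distinction, the sketch below uses Python's built-in `idna` codec, which follows IDNA2003 (RFC 3490 with Nameprep). The second half is hypothetical: it simply Punycode-encodes the label with ß left in place to show what an unmapped form could look like. It is not an implementation of any IDNA2008 draft.

```python
# Python's built-in "idna" codec follows IDNA2003 (RFC 3490 with Nameprep).
name = "fußball.com"

# IDNA2003 ToASCII: Nameprep maps "ß" to "ss" before the DNS ever sees it.
idna2003_form = name.encode("idna").decode("ascii")
print(idna2003_form)  # fussball.com

# Hypothetical form if "ß" were kept as-is rather than mapped:
# plain Punycode of the label with the ACE prefix prepended.
label = "fußball"
kept_form = "xn--" + label.encode("punycode").decode("ascii")
print(kept_form)

# The two forms differ, so they would be two distinct names in the DNS.
print(idna2003_form.split(".")[0] != kept_form)
```

The user can type ß under IDNA2003, but only the mapped name ever reaches the DNS. Keeping ß unmapped produces a second, separate label.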

> IDNA2008 allows the latter. Under IDNA2003, someone who enters  
> fußball.com goes to fussball.com. If a registry is allowed to
> register fußball.com unbound from fussball.com, people who were  
> entering the name will go to a different site than they have been  
> going to. This leads to two options for the registry:
>
> - Be unstable with respect to IDNA2003
>
> - Bind the two names together so that all registry-level changes to  
> one name are automatically reflected in the other name

Paul, the registry that has these problems is the one that pushes for
this change. I cannot accept that you use this as an argument against
IDNA2008, and point at registry problems when the registry wants this.

I was against this change, but the wg consensus is to allow the
registration of domain names with ß. Mappings were not enough anymore.

The community wants this incompatible change.

If you now revoke the consensus call in the wg, and we remove the
ability to register domain names with ß, no one would be happier than I.

So, this has absolutely nothing to do with IDNA2008. It has to do with  
interests in this wg on what changes should be made between IDNA2003  
and the replacement of IDNA2003.

> We have long talked about a "registry best practices" document. I  
> have assumed (possibly incorrectly) that it would advocate binding  
> names.

"binding names" is I think what I normally call bundles in the  
variant / language tables talked about already in for example RFC  
4290. I.e. this is already in use, and nothing strange, and nothing
new. So I do not understand your goal in raising this issue.

>> But this is because there is a problem in IDNA2003, not the reverse,
>> that this is due to a problem with IDNA2008. Your proposal with
>> IDNAv2 pushes the solution to this problem even further forward in
>> time. Are you really proposing we should wait until IDNAv3 before we
>> fix this?
>
> No, I am proposing that we do not fix the problem because doing so  
> causes the other problems that we have with the current document  
> set. It is not at all a fatal problem, as is shown by the wide  
> deployment of IDNA2003. It would have been good to fix if we could  
> have done so cleanly; it looks now that we could not.

We do not have a wide deployment yet. And we do see problems in the
deployment due to this terminology problem. Even in Sweden. If nothing
else, I have seen this in the ICANN discussions, which have been
extremely complicated due to the terminology problems. Specifically
when talking about which codepoints should really be included in the
bundles and which not (what you call name bindings). The last serious
discussion I had about this was with the Indian Government in the UN
context. That *is* serious, and already with the text about A- and
U-labels in the IDNA2008 drafts, we have managed to solve many problems
for various registries regarding clarifying their policy.

My view is that as this is a serious problem, we had better fix this
now rather than later.

>> Of course not. The regular expressions exist because discussions in
>> this very WG have FORCED us to add such regular expressions as WG
>> participants WANT the context dependent rules.
>>
>> If we need the rules in IDNA2008, we also need the rules in IDNAv2.
>
> We got the rules under IDNA2008 because the structure allows them.  
> My proposal for IDNAv2 does not allow them. This is the same  
> decision we made six years ago.

Nope, because the most serious contextual rules are required because  
of issues with codepoints that are added due to bidi implications. The  
digits for example. Second example is the final sigma. Things and  
issues you will face also with IDNAv2.

Either we have consensus in the wg to have contextual rules and fix
the bidi problems, zwj issues and final sigma, or we do not. This is a
question completely independent of whether we do IDNAv2 or IDNA2008.
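
For readers unfamiliar with what a contextual rule looks like in practice, here is a minimal Python sketch. The specific condition used (a ZERO WIDTH JOINER is accepted only directly after a character with virama combining class) and the function name are assumptions chosen for illustration, not a quotation of the rules in the tables document.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # canonical combining class of virama characters


def zwj_in_valid_context(label: str) -> bool:
    """Return True if every ZWJ in the label directly follows a virama."""
    for i, ch in enumerate(label):
        if ch != ZWJ:
            continue
        # A ZWJ at the start, or after a non-virama, is out of context.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


if __name__ == "__main__":
    # Devanagari KA + VIRAMA + ZWJ + SSA: the ZWJ follows a virama.
    print(zwj_in_valid_context("\u0915\u094d\u200d\u0937"))
    # Latin letters around a ZWJ: no virama before it.
    print(zwj_in_valid_context("a\u200db"))
```

The point is that the answer depends on the neighbouring codepoints, so a per-codepoint table alone cannot express it, whichever document the wg produces.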

>> The separation of the mapping from the A-/U-label.
>
> Got it. It was a good idea, but not needed.

I disagree. It was *really* needed, and one of the best things with  
IDNA2008.

See above.

>>>> - Fix the Bidi issues that we knew with IDNA2003 that we did not
>>>> get right (or at all).
>>>
>>> I cannot tell whether or not you read the draft. It fixes both of
>>> the primary problems that Harald and Cary found. What others do you
>>> see as needed?
>>
>> Correct, but you still have to do this in your draft.
>
> It is done, completely, in section 3:
>
>   In section 6, at the end of the fourth paragraph (which currently
>   ends with "have bidirectional category "EN"."), the following
>   sentence is added: "The Unicode Standard also defines a
>   bidirectional category "NSM" for "non-spacing marks"."
>
>   In section 6, the third requirement is changed to read:
>
>   | 3) If a string contains any RandALCat character, a RandALCat
>   |   character MUST be the first character of the string, and
>   |   either a RandALCat character or NSM character MUST be the
>   |   last character of the string.
>
> What else do you think is needed?

Nothing. As I say in the next paragraph (see below), I did not claim  
you had not solved any of these issues.
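
The revised third requirement quoted above can be sketched in a few lines of Python. This is an approximation under stated assumptions: it treats bidi classes R and AL as RandALCat, uses the bidi data of whatever Unicode version the local `unicodedata` module ships rather than the Stringprep tables, and checks only requirement 3, not the other bidi requirements. The function name and the `allow_trailing_nsm` switch are invented for this example.

```python
import unicodedata

RANDALCAT = {"R", "AL"}  # bidi classes treated as RandALCat in this sketch


def satisfies_requirement_3(label: str, allow_trailing_nsm: bool = True) -> bool:
    """Check the first/last character rule for strings with RandALCat."""
    if not label:
        return True
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RANDALCAT for c in classes):
        return True  # the requirement only applies if RandALCat is present
    # Old rule: last character must be RandALCat.
    # Revised rule: a trailing non-spacing mark (NSM) is also accepted.
    last_ok = RANDALCAT | ({"NSM"} if allow_trailing_nsm else set())
    return classes[0] in RANDALCAT and classes[-1] in last_ok


if __name__ == "__main__":
    # A Thaana (Dhivehi) string that ends in a vowel sign, which is an NSM.
    dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
    print(satisfies_requirement_3(dhivehi, allow_trailing_nsm=False))
    print(satisfies_requirement_3(dhivehi, allow_trailing_nsm=True))
```

Under the old wording the Dhivehi string fails only because its final character is a combining mark. With the NSM allowance it passes.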

>> I did not claim
>> you had not made all these changes. As far as I can see, this is the
>> only required change that you actually have made.
>
> All technical changes to Stringprep (other than the bidi one) are  
> additions to the tables. I showed the ones that popped out to me in  
> looking through the differences between Unicode 3.2 and 5.1. There
> may be other additions; those can be determined by human review of  
> the characters added.
>
> Again: I'm not trying to match all the changes that are in IDNA2008,  
> I am trying to update IDNA2003 in a backwards-compatible fashion. I  
> truly believe that will be more likely to pass muster with the IESG  
> than a non-compatible and complex set of changes.
>
>>
>> The standard can according to consensus in this wg *not* allow use of
>> codepoints that violate any of the regular expressions you see in the
>> tables document. For example the indic digits, final sigma etc.
>>
>> This requirement/consensus is true regardless of what document this
>> wg produces.
>
> The features asked for by the WG were clearly reflections of the  
> early work that was done on IDNA2008. At the time we asked for those  
> features, we did not know where the process would lead us.

Nope. They are reflections from local language communities when the
IETF finally managed to get in contact with them, and for example the
Arabic language group was created.

What you say is that IETF should ignore the requests from Greece,  
Germany, India etc?

>>>> - Still have issues with transition from IDNA2003 to IDNAv2 (as you
>>>> call it) as there will be incompatibilities.
>>>
>>> Where? All issues between a system running the old version and one
>>> running the new version are already taken care of with the handling
>>> of unassigned code points.
>>
>> No. There are other codepoints that are not compatible between what
>> is now the consensus of the wg and IDNA2003. The eszett, the final
>> sigma, the indic digits, the graphics characters etc etc
>
> See above. We disagree here about what was "consensus" and in what  
> light it was made. I fully agree that my proposal has different  
> properties for those characters, ones that make my proposal not  
> change allocated domain names.

Yup, we completely agree that you have solved a different problem than  
what the consensus of this wg has requested. ;-)

I would love to rip out things from IDNA2008 so that it only solves  
the tiny bit that you cover in IDNAv2.

But that would require opening up the consensus calls we have had in  
the wg. I do not think that would make people happy.

Finally, it is precisely because IDNAv2 ignores the current consensus
of the wg, and so is not solving the same problems as IDNA2008, that I
say we cannot compare IDNAv2 and IDNA2008 and say which is more
complicated than the other one.

If we should compare, then we should compare an IDNAv2 and an IDNA2008
that solve the same problem.

    Patrik


