Charter changes and a possible new direction

Patrik Fältström patrik at frobbit.se
Wed Jan 14 05:57:05 CET 2009


On 14 jan 2009, at 05.26, Andrew Sullivan wrote:

> On Wed, Jan 14, 2009 at 04:56:55AM +0100, Patrik Fältström wrote:
>> On 14 jan 2009, at 04.19, Andrew Sullivan wrote:
>>
>>> So, I have two questions:
>>>
>>> 1.  Just how bad is it to tie IDNA to a particular version of
>>> Unicode, and why?  (Ok, maybe this is two questions.)
>>
>> Mainly because if a new version of Unicode is released, a registry
>> cannot use the added codepoints in IDNs until there is a new version
>> of IDNA released.
>
> Yes, I get this, but _how bad_ is that?  Are the users of the code
> points since 2003 the ones who are driving the current work?

They are the ones that pushed me to start work on this issue, together
with the ones that need bidi. People from China, Korea, India and the
Arabic-speaking community have pushed me, either directly or indirectly
via ICANN.

> I don't
> have that impression, but since I don't hang out in the circles where
> most of the pressure appears to me to be coming from, I'm fully
> prepared to be wrong.  My impression is that the people unsatisfied
> with IDNA2003 are unsatisfied for other reasons.

Yes, for example with what in this discussion is called the lack of
"regular expressions", i.e. context-dependent rules for some codepoints.
That is something that will be needed in IDNAv2 as well as in IDNA2008.

> Note, too, that it seems to me the current effort is taking a long
> time because it's _not_ a straight "fix the tables" effort to bring
> IDNA2003 up to date with Unicode.

Correct. We tried to do that, but ended up (a) fixing the tables,
(b) fixing bidi, (c) adding regular expressions, and also (the only
difference from IDNA2003) (d) standardizing the rules, not the output
from the rules, so that we will not need IETF action for every Unicode
version that is published.

I.e. the differences are not many.

The *explanations* for the protocol are different.

> I'm sympathetic, however, to the
> problem that a smaller effort could well just not get adequate review.

I do not think IDNAv2 would be a smaller effort.

If you really think we could do "just" a review and IDNAv2 as
suggested, why did we have the last year's discussion on the regular
expressions for IDNA2008? We could have been done with version 1 of
the tables (etc). I can rip that out if that is what is suggested, but
I do not think that is what you suggest.

The next question is whether you think we will save time by doing
IDNAv2 and then still IDNA2008. I do not think so: I claim we are done
with IDNA2008, so even IDNA2008 would be published faster than IDNAv2.
Much faster. So I do not think even that is a good idea.

>> Another reason is that a programmer who implements anything with
>> Unicode cannot know what version of Unicode is installed, so use of
>> the installed tables is just not possible if you want to be
>> conformant. You have to use explicitly IDNA2003 (today) libraries and
>> not any of the Unicode libraries installed, as there might be
>> incompatibilities (what are unassigned codepoints, for example).
>
> Again, how bad is that?  This is the issue I used to be sure was
> obviously a big deal, but about which I'm now much less sure.

I have had problems when writing the code that generates the tables I
have. Not with the tables themselves, but when I try to write code that
checks the possible incompatibilities between IDNA2003 and IDNA2008. It
is not very easy.
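
A minimal sketch of that kind of version check, assuming Python: its
standard unicodedata module exposes both the installed Unicode database
and a frozen Unicode 3.2 copy (ucd_3_2_0), the version IDNA2003 is tied
to. The helper name and the sample codepoint U+1E9E (added in Unicode
5.1) are only illustrations, not part of any IDNA document.

```python
import unicodedata

def assigned_since_3_2(ch):
    """True if ch is unassigned in Unicode 3.2 but assigned in the
    Unicode version this Python installation ships with."""
    old = unicodedata.ucd_3_2_0.category(ch)  # frozen 3.2 database
    new = unicodedata.category(ch)            # installed database
    return old == "Cn" and new != "Cn"        # "Cn" means unassigned

print("installed Unicode:", unicodedata.unidata_version)
print("frozen Unicode:   ", unicodedata.ucd_3_2_0.unidata_version)
print(assigned_since_3_2("\u1e9e"))  # LATIN CAPITAL LETTER SHARP S
print(assigned_since_3_2("a"))
```

The answer for a given codepoint depends on which database the code
happens to consult, which is exactly why the installed tables cannot be
relied on when a protocol is pinned to one Unicode version.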

> The
> discussion in 4690 sure makes it sound like a big deal, but in the
> absence of a complete example I'm just not sure what to think.
>
> I can think of different ways in which this could be bad:
>
> 1.  New (non-IDNA2003) code tries to resolve a domain that is not
> legal under IDNA2003.
>
> 2.  New code thinks a domain, that _is_ legal under IDNA2003, is not
> legal.
>
> 3.  New code resolves a domain differently than truly
> IDNA2003-compliant code.

If you look for problems that are serious problems, you should not  
look at resolution. You should start looking at registration.

The bad cases are for example:

A registers a domain name and B registers a different domain name,
where "different" is defined by IDNA(N). When IDNA(N+1) is released,
A and B are not different anymore.

A registers a domain name that can be registered according to
IDNA(N+1). A, or the customers of A, then try to look up the domain
name, but that is not possible in IDNA(N).

I.e. the problem is when you have a registrant who has to be told
either to unregister his domain name, or that his domain name can no
longer be used, or that it cannot yet be used (not before a new version
of IDNA is released).

This is why we have to push the *rules* as the standard, and not a
list of codepoints, so that the IDNA that is out in the wild is updated
at the same speed as Unicode.

Unicode might announce for version (N+1) that "Hey, we have added
support for this language". What do you think that language community
will say if ICANN has to wait for IETF action before they can assign
an IDN-ccTLD? (Addition of a new script/language/codepoint will
certainly not require changes to backward compatibility or
exceptions.) This is what we have to optimize the standard for.
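
To illustrate the difference between standardizing rules and
standardizing a list, here is a heavily simplified sketch in Python. It
is not the IDNA2008 rule set: the set of allowed categories is a
hypothetical stand-in, the status names are used only as labels, and the
real rules also have exceptions, backward compatibility entries and
contextual rules.

```python
import unicodedata

# Hypothetical, simplified rule: which Unicode general categories are
# acceptable in a label. The real rules are considerably more detailed.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

def classify(ch):
    """Derive a status from Unicode properties, not from a fixed table."""
    cat = unicodedata.category(ch)
    if cat == "Cn":                 # not assigned in the installed Unicode
        return "UNASSIGNED"
    if cat in ALLOWED_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"

for ch in ["a", "A", "\u4e2d", "\u0378"]:
    print("U+%04X" % ord(ch), classify(ch))
```

Because the status is computed from properties, a codepoint that a later
Unicode version assigns to an allowed category changes from UNASSIGNED
to PVALID without anyone editing a table.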

> (1) is ugly and annoying, but not actually that harmful.  In the list
> of "crap thrown at the global DNS", it's surely down in the noise
> category (compared to, say, queries for .lan).  (2) is very bad,
> because it says that a domain that could be registered fails to
> resolve with the new libraries.  Similarly, (3) says that you get a
> different result depending on the libraries involved.  If we have
> examples of (2) or (3), however, I'm not aware of them.  I don't think
> I've seen a worked example of the issues coming from the normalization
> trouble outlined in RFC 4690 section 3.1, so I can't evaluate how
> serious the problem is (I'm not therefore dismissing it -- I just
> don't know what weight to put on it).
>
> Answering "how bad" here is, I think, very important.  I'm
> increasingly distressed at the degree of complication IDNA2008 appears
> to be proposing.  For instance, I think there is every reason to
> suppose that the "local mapping" approach IDNA2008 is taking (and
> which still, I emphasise, appeals to me) opens a wider vector for
> phishing problems even than we have with IDNA2003.  We have decided
> phishing isn't a problem we can solve, but people are going to be
> angry if instead we make it worse.

As I have said many times before, I think we need mappings, but the
details of the mappings must be handled application by application
because of the corner cases that exist. There might be a core substance
that is shared, and that will solve the majority of the issues. And
because that core exists, I do not think we will see the problems you
talk about.

I see many more problems today because people do not understand what
is allowed. Is it anything that maps to a good codepoint according to
IDNA2003, or is it just the codepoints that are OK as output of
IDNA2003? I long ago lost count of the number of questions I get from
registries and registrars on this.
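
The two readings can be told apart mechanically. A small sketch,
assuming Python, whose standard "idna" codec implements IDNA2003; the
nameprep function it uses is an internal helper of that codec rather
than a documented API, and the function name below is made up for the
example.

```python
from encodings.idna import nameprep  # IDNA2003 nameprep (stdlib helper)

def is_idna2003_output_form(label):
    """True if nameprep leaves the label unchanged, i.e. the label is
    already something IDNA2003 could produce as output."""
    return nameprep(label) == label

for label in ["B\u00fccher", "b\u00fccher"]:
    print(label, "->", nameprep(label), is_idna2003_output_form(label))
```

Both labels are accepted as input, since the first maps to the second,
but only the second is an output form. Whether a registry should accept
the first is the question that keeps being asked.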

> Moreover, we have noted that registries will need guidance, and that
> some of the people making decisions about implementation may not
> really be in a position to completely understand the full document
> set.

And they do not have to UNDERSTAND it, because a large part of the
document set is the answer to the "WHY" question, not the "HOW"
question.

> Surely if that's the case (and I don't deny it), a system as
> complicated as IDNA2008 is going to encounter some trouble.
> Especially since we don't seem to want to add the implementers'
> document to our document set.
>
> Finally, we are making what is plainly an incompatible on-the-wire
> change by deciding that ß is in, without changing the encoding prefix.
> The more I ponder that, the more I think we must have taken leave of
> our senses.  (I include myself -- more than anyone else! -- in that
> "we".)

Well, I did argue against that change, but I found myself in a
minority in the wg. The community, and specifically the registry that
will be affected by this incompatible change, has said "we have
things under control", and as I have said before, I am *not* the one
arguing against a community that asks for things. They know things in
their context better than I do.
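
For readers who want to see the incompatibility concretely, a minimal
sketch in Python. The first function uses the standard "idna" codec,
which implements IDNA2003. The second is only an assumption about what
"ß is in" means on the wire: the label is Punycode-encoded as-is under
the unchanged "xn--" prefix, with no validity checks at all.

```python
def idna2003_ascii(label):
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490),
    # whose nameprep step maps sharp s to "ss".
    return label.encode("idna")

def unmapped_ascii(label):
    # Assumption for illustration: encode the label without any mapping,
    # treating every character (including sharp s) as valid.
    return b"xn--" + label.encode("punycode")

label = "stra\u00dfe"
print(idna2003_ascii(label))  # b'strasse'
print(unmapped_ascii(label))  # b'xn--strae-oqa'
```

The same user input produces two different names in the DNS depending
on which library handles it, and nothing in the prefix tells them apart.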

> These all seem to me to be very big costs.  If we look at this as a
> trade-off, is the gain worth costs that big?  I am embarrassed to say
> that I really don't know.  Paul's draft, however, has made me stop and
> think about these questions, and I'm not at all happy with how I feel
> when I think about them.
>
>> The plan that I am pushing for (and to be honest, I have as document
>> editor not heard from the wg chair what the actual consensus is) is
>> that we need protocol action for _any_change_of_the_document_, which
>> implies only for changes to exceptions, backward compatibility and
>> regular expressions. Not if Unicode comes with a change that adds
>> things that do not require changes to any of those.
>
> Ok, excellent, thanks.  That's indeed a significant advantage to the
> IDNA2008 approach.


That is what I see. And note that *I* might be in the minority that
even wants IETF action on the first change. I can change that. We can
change the eszett, we can remove the regular expressions, we can remove
many things from IDNA2008 to make it as "appealing" as IDNAv2, but
then, I claim, we are back to where IDNA2008 was about 1.5 years ago
(apart from the document restructuring).

But as I said in my first message, better to have the discussion now  
than later.

    Patrik


