Mapping tables (was Re: IDNA online tool)

Rémy Renardin renardinr at gmail.com
Sat Apr 11 01:05:15 CEST 2009


Louis,
JFC's motto is "nothing is impossible for someone who does not care whether
anyone knows who did it!". But if you read Mark carefully you will find some oddities.

2009/4/10 LB <lbleriot at gmail.com>

> We need to recognize that the people in this working group are a very small
>> subset of those affected by the changes we make, and are only partially
>> representative. Someone whose primary editor is emacs, for example, is
>> hardly a typical user! Of course the WG is open to any and all comers,
>>
>
Except our france at large and the A-FRA Chair. Interestingly enough.


>  but certainly not all types of people that will be directly or indirectly
>> affected by these changes will realize that this group even exists, let
>> alone that it will be making incompatible changes. (When I mention to
>> ordinary engineers that the changes in IDNA2008 will result in the same URL
>> going to different IP addresses on different browsers, they look at me like
>> I'm completely, utterly, crazy.)
>>
>> Most of the people on this list will not be the ones that get the error
>> reports, where people are really upset because something used to work and
>> doesn't anymore. We will only find out after the fact just how bad it is -
>> sadly, we don't have the luxury of a beta phase, where our users can see
>> what the effects of these changes are, and let us know where we must make
>> changes.
>>
>
The beta phase resulted in banning the user :-)
This is starting to make noise around .... so people are becoming more careful.
Whatever story they build, users will have to swallow it.


>  But before we blithely rush to doing just case+width mapping, we have to
>> do due diligence: carefully looking over the other instances, and making
>> sure that exclusion is on balance the right approach. It is probably ok to
>> remove Arabic presentation forms, but we'd better check first with the
>> Arabic-script communities to make sure; just as we heard back from the CJK
>> communities that Width mapping was important.
>>
>
You will note he does not want to mention the French script problem: it is a
fundamental one, caused by the ASCII upper-case letters.
JFC says that we do not care, as .FRA will not use "xn--"; I would suggest we
try a little longer before saying that.

Mark speaks French very well. I am sure he can read:
http://fr.wikipedia.org/wiki/Majuscule.

In case some people on this list do not read French, they can easily get a
feel for the complexity by comparing the page size with that of its English counterpart:
http://en.wikipedia.org/wiki/Capital_letter.

Rémy Renardin


2009/4/10 LB <lbleriot at gmail.com>

> JFC called me to ask me to send "+1"---
> LB
>
> 2009/4/10 Mark Davis <mark at macchiato.com>
>
>> I usually try to err on the side of concision, since otherwise I know
>> people's eyes quickly glaze over, but I'll take a bit longer to address the
>> issue you raised (though I will never be any competition for John ;-). So
>> please bear with me this time.
>>
>> The reason that I included the ability to change the mapping filter is
>> just so that people could try out different choices, and see what an actual
>> difference it would make. I suspect that we will end up with not using the
>> IDNA2003 mapping style (NFKC+CF+removing default ignorables), but we need to
>> examine our choices very carefully. For example, I think John's suggestion
>> of not having the "removing default ignorables" be part of the mapping is a
>> reasonable one.
>>
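
To make the "NFKC+CF+removing default ignorables" style quoted above concrete,
here is a minimal Python sketch. It is an approximation under stated
assumptions, not the Nameprep algorithm itself: the function name
idna2003_style_map and the IGNORABLE_SAMPLE set are hypothetical, the set is
only a small hand-picked sample of default ignorable code points, and Python's
unicodedata follows a much newer Unicode version than the tables IDNA2003 was
built on. The drop_ignorables switch mirrors John's suggestion of leaving that
step out of the mapping.

```python
import unicodedata

# Assumption: a tiny illustrative sample of default ignorable code points
# (soft hyphen, zero-width space/non-joiner/joiner, word joiner, BOM).
# The real property covers many more characters.
IGNORABLE_SAMPLE = {"\u00ad", "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def idna2003_style_map(label, drop_ignorables=True):
    """Approximate an IDNA2003-style mapping: optionally drop ignorables,
    then apply NFKC, case folding, and NFKC again."""
    if drop_ignorables:
        label = "".join(ch for ch in label if ch not in IGNORABLE_SAMPLE)
    folded = unicodedata.normalize("NFKC", label).casefold()
    # Case folding can produce unnormalized text, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


print(idna2003_style_map("\uff25xample"))   # fullwidth E is mapped: example
print(idna2003_style_map("Stra\u00dfe"))    # sharp s is folded: strasse
print(idna2003_style_map("ex\u00adample"))  # soft hyphen is dropped: example
```

With drop_ignorables=False the soft hyphen in the last label survives the
mapping, which is the kind of difference the online tool lets people try out.
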
>> I may seem pretty conservative on this account. This is probably due to
>> the experiences of over 20 years with Unicode, which is such a core
>> technology that changes ripple quite broadly. There have been times where we
>> made a change that looked absolutely reasonable, would be clearly the right
>> thing to do, and would not affect anyone negatively -- *where all reports
>> from contacts in community X (eg BIDI) said that they didn't use the
>> characters, and so a change wouldn't matter.* Yet once systems start
>> being deployed, lo and behold the error reports come rolling in -- it turns
>> out that they *are* used by X community, and users are extremely unhappy.
>> Those of us who have worked in operating systems are also painfully aware of
>> this kind of problem.
>>
>> So even though Unicode could be improved in many ways with incompatible
>> changes, we have really gotten quite conservative about them, just because
>> of unintended and unforeseen consequences.
>>
>> Someone on this list noted that "The registrants are the main clients of
>> IDN domains, hence they are the main clients of this WG." That is not true:
>> there are many, many stakeholders involved. Registrants certainly, but also
>> registries, and *most importantly,* users of programs that accept and
>> display URLs: browsing, email, chat, IM, and so on. Those users include a
>> pretty significant portion of the world's population. And we will face
>> a long transition period where both IDNA2003 and its successor are in play -
>> we at Google see just how many people are using very old browsers, and of
>> course the DNS people know just how long it has taken to deploy IPv6.
>>
>> We need to recognize that the people in this working group are a very
>> small subset of those affected by the changes we make, and are only
>> partially representative. Someone whose primary editor is emacs, for
>> example, is hardly a typical user! Of course the WG is open to any and all
>> comers, but certainly not all types of people that will be directly or
>> indirectly affected by these changes will realize that this group even
>> exists, let alone that it will be making incompatible changes. (When I
>> mention to ordinary engineers that the changes in IDNA2008 will result in
>> the same URL going to different IP addresses on different browsers, they
>> look at me like I'm completely, utterly, crazy.)
>>
>> Most of the people on this list will not be the ones that get the error
>> reports, where people are really upset because something used to work and
>> doesn't anymore. We will only find out after the fact just how bad it is -
>> sadly, we don't have the luxury of a beta phase, where our users can see
>> what the effects of these changes are, and let us know where we must make
>> changes.
>>
>> That's why we in this group need to be very careful about incompatible
>> changes to an existing, deployed standard (IDNA2003) except where:
>>
>> (a) there is clear harm, or
>> (b) the characters in question occur in demonstrably very low frequency.
>>
>> The latter is why the Unicode Consortium is ok, for example, with the
>> removal of the vast majority of symbols and punctuation, because the vast
>> majority are used with such low frequency, even though only a couple of them
>> have actually been shown to be at all harmful. And I think it is probably ok
>> to remove the circled items from mapping (
>> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:dt=Enc:]),
>> even though they cause no harm either (removal has little potential upside,
>> only potential downside).
>>
>> But before we blithely rush to doing just case+width mapping, we have to
>> do due diligence: carefully looking over the other instances, and making
>> sure that exclusion is on balance the right approach. It is probably ok to
>> remove Arabic presentation forms, but we'd better check first with the
>> Arabic-script communities to make sure; just as we heard back from the CJK
>> communities that Width mapping was important.
>>
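
As a rough illustration of what "just case+width mapping" would and would not
touch, the following Python sketch maps only characters whose decomposition is
tagged <wide> or <narrow>, then lowercases. The function name case_width_map is
hypothetical, and using str.lower() as the case step is an assumption made for
the sketch; none of this is taken from a draft.

```python
import unicodedata


def case_width_map(text):
    """Map fullwidth/halfwidth forms to their plain equivalents and
    lowercase the result; leave every other compatibility form alone."""
    out = []
    for ch in text:
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] in ("<wide>", "<narrow>"):
            # Replace the character by its width decomposition.
            ch = "".join(chr(int(cp, 16)) for cp in parts[1:])
        out.append(ch.lower())
    return "".join(out)


alef_isolated = "\ufe8d"  # ARABIC LETTER ALEF ISOLATED FORM

print(case_width_map("\uff21BC"))                                  # abc
print(case_width_map(alef_isolated) == alef_isolated)              # True
print(unicodedata.normalize("NFKC", alef_isolated) == "\u0627")    # True
```

The Arabic presentation form passes through unchanged here, whereas NFKC maps
it to the ordinary ALEF. That gap is exactly what would need checking with the
Arabic-script communities.
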
>> And people cannot simply depend on the Unicode decomposition type (dt)
>> (See http://unicode.org/cldr/utility/properties.jsp#Decomposition_Type)
>> to make all the decisions for them; it would not be a good idea to exclude DZ
>> from mapping, for example, and its dt is Compat. The dt property will be
>> useful, in the same way as other property-based rules in the tables doc
>> are, but the categories defined by that property may not match exactly to
>> what end users' needs are, and thus need to be reviewed.
>>
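
The decomposition type can be inspected directly. The Python sketch below reads
it from the standard unicodedata module; the helper name decomposition_type is
hypothetical, and the labels "canonical" and "none" are my own shorthand for an
untagged decomposition and for no decomposition at all.

```python
import unicodedata


def decomposition_type(ch):
    """Return the decomposition tag of a character (e.g. "compat", "wide",
    "circle"), "canonical" if the decomposition is untagged, else "none"."""
    field = unicodedata.decomposition(ch)
    if not field:
        return "none"
    first = field.split()[0]
    if first.startswith("<"):
        return first.strip("<>")
    return "canonical"


# DZ digraph, circled digit one, fullwidth A, e with acute
for ch in ("\u01f1", "\u2460", "\uff21", "\u00e9"):
    mapped = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}", decomposition_type(ch), ascii(mapped))
```

U+01F1 (DZ) reports "compat", so a rule that excluded every dt=Compat character
from mapping would stop DZ from being mapped to "DZ" and then folded to "dz".
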
>> Mark
>>
>> PS. And bringing this back to subject of the online tool, if there are
>> changes to the tools at http://unicode.org/cldr/utility/index.jsp that
>> would help in such a review, please let me know.
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>
>
> --
> LB
>