Mapping tables (was Re: IDNA online tool)

LB lbleriot at gmail.com
Fri Apr 10 23:16:21 CEST 2009


JFC called me to ask me to send "+1"---
LB

2009/4/10 Mark Davis <mark at macchiato.com>

> I usually try to err on the side of concision, since otherwise I know
> people's eyes quickly glaze over, but I'll take a bit longer to address the
> issue you raised (though I will never be any competition for John ;-). So
> please bear with me this time.
>
> The reason that I included the ability to change the mapping filter is just
> so that people could try out different choices, and see what an actual
> difference it would make. I suspect that we will end up not using the
> IDNA2003 mapping style (NFKC+CF+removing default ignorables), but we need to
> examine our choices very carefully. For example, I think John's suggestion
> of not having the "removing default ignorables" be part of the mapping is a
> reasonable one.
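
For readers who want to try the quoted "NFKC+CF+removing default ignorables" style locally, here is a minimal Python sketch. It is an approximation, not the IDNA2003 algorithm itself: the function name map_label and the DEFAULT_IGNORABLE_SAMPLE set are made up for illustration, and the set is only a small hand-picked subset of the default ignorable code points, since Python's unicodedata module does not expose that property.

```python
import unicodedata

# Assumption: a small illustrative subset of default ignorable code points.
# The full set is defined by the Unicode Character Database.
DEFAULT_IGNORABLE_SAMPLE = {
    0x00AD,  # SOFT HYPHEN
    0x034F,  # COMBINING GRAPHEME JOINER
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
}


def map_label(label, remove_ignorables=True):
    """Approximate an NFKC + case folding mapping (hypothetical helper)."""
    if remove_ignorables:
        # Optional step: drop the sample default ignorables before mapping.
        label = "".join(
            ch for ch in label if ord(ch) not in DEFAULT_IGNORABLE_SAMPLE
        )
    # Normalize, case fold, then normalize again because case folding
    # can produce text that is no longer in NFKC form.
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)
```

Calling map_label with remove_ignorables=False corresponds roughly to the variant suggested by John, where the ignorables are left in place for a later step to accept or reject.
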
>
> I may seem pretty conservative on this account. This is probably due to the
> experiences of over 20 years with Unicode, which is such a core technology
> that changes ripple quite broadly. There have been times where we made a
> change that looked absolutely reasonable, would be clearly the right thing
> to do, and would not affect anyone negatively -- where all reports from
> contacts in community X (eg BIDI) said that they didn't use the
> characters, and so a change wouldn't matter. Yet once systems start being
> deployed, lo and behold the error reports come rolling in -- it turns out
> that they *are* used by X community, and users are extremely unhappy.
> Those of us who have worked in operating systems are also painfully aware of
> this kind of problem.
>
> So even though Unicode could be improved in many ways with incompatible
> changes, we have really gotten quite conservative about them, just because
> of unintended and unforeseen consequences.
>
> Someone on this list noted that "The registrants are the main clients of
> IDN domains, hence they are the main clients of this WG." That is not true:
> there are many, many stakeholders involved. Registrants certainly, but also
> registries, and *most importantly,* users of programs that accept and
> display URLs: browsing, email, chat, IM, and so on. Those users include a
> pretty significant portion of the world's population. And we will face
> a long transition period where both IDNA2003 and its successor are in play -
> we at Google see just how many people are using very old browsers, and of
> course the DNS people know just how long it has taken to deploy IPv6.
>
> We need to recognize that the people in this working group are a very small
> subset of those affected by the changes we make, and are only partially
> representative. Someone whose primary editor is emacs, for example, is
> hardly a typical user! Of course the WG is open to any and all comers, but
> certainly not all types of people that will be directly or indirectly
> affected by these changes will realize that this group even exists, let
> alone that it will be making incompatible changes. (When I mention to
> ordinary engineers that the changes in IDNA2008 will result in the same URL
> going to different IP addresses on different browsers, they look at me like
> I'm completely, utterly, crazy.)
>
> Most of the people on this list will not be the ones that get the error
> reports, where people are really upset because something used to work and
> doesn't anymore. We will only find out after the fact just how bad it is -
> sadly, we don't have the luxury of a beta phase, where our users can see
> what the effects of these changes are, and let us know where we must make
> changes.
>
> That's why we in this group need to be very careful about incompatible changes
> to an existing, deployed standard (IDNA2003) except where:
>
> (a) there is clear harm, or
> (b) the characters in question occur in demonstrably very low frequency.
>
> The latter is why the Unicode Consortium is ok, for example, with the
> removal of the vast majority of symbols and punctuation, because the vast
> majority are used with such low frequency, even though only a couple of them
> have actually been shown to be at all harmful. And I think it is probably ok
> to remove the circled items from mapping (
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:dt=Enc:]), even
> though they cause no harm either (removal has little potential upside, only
> potential downside).
>
> But before we blithely rush to doing just case+width mapping, we have to do
> due diligence: carefully looking over the other instances, and making sure
> that exclusion is on balance the right approach. It is probably ok to remove
> Arabic presentation forms, but we'd better check first with the
> Arabic-script communities to make sure; just as we heard back from the CJK
> communities that Width mapping was important.
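
As a rough illustration of what "just case+width mapping" could mean in practice, the Python sketch below maps only characters whose decomposition is tagged wide or narrow and then case folds. This is one possible reading, not a definition agreed anywhere in this thread, and case_width_map is a made-up name.

```python
import unicodedata


def case_width_map(label):
    """Map fullwidth/halfwidth forms, then case fold (hypothetical helper)."""
    out = []
    for ch in label:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            # Fields after the tag are hexadecimal code points.
            ch = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
        out.append(ch)
    # Every other compatibility character is left as it is.
    return "".join(out).casefold()
```

Under this sketch fullwidth Latin letters and halfwidth katakana are mapped, while circled digits and Arabic presentation forms pass through unchanged, so whether such labels are then accepted or rejected is a separate decision.
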
>
> And people cannot simply depend on the Unicode decomposition type (dt) (See
> http://unicode.org/cldr/utility/properties.jsp#Decomposition_Type) to make
> all the decisions for them; it would not be a good idea to exclude DZ from
> mapping, for example, and its dt is Compat. The dt property will be useful,
> in the same way as other property-based rules in the tables doc are, but
> the categories defined by that property may not match exactly what
> end users' needs are, and thus need to be reviewed.
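
To see which decomposition type a given character carries, a small Python sketch like the following may help during such a review. It reads the tag from unicodedata.decomposition(), which reports the UnicodeData.txt tags (for example "compat", "circle", "wide") rather than the property aliases used in the online tool (Compat, Enc, Wide); the helper name decomposition_type is made up for illustration.

```python
import unicodedata


def decomposition_type(ch):
    """Return the decomposition tag of one character (hypothetical helper)."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return "none"
    if decomp.startswith("<"):
        # Compatibility decompositions start with a tag such as "<compat>".
        return decomp[1:decomp.index(">")]
    # An untagged decomposition is a canonical one.
    return "canonical"


# U+01F1 LATIN CAPITAL LETTER DZ is the "DZ" example from the quoted text.
SAMPLES = ["\u01F1", "\u2460", "\uFF21", "\uFE8D"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition_type(ch))
```

DZ comes out as "compat", in the same broad bucket as many characters that people may want to exclude, which is why the property alone cannot make the decision.
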
>
> Mark
>
> PS. And bringing this back to subject of the online tool, if there are
> changes to the tools at http://unicode.org/cldr/utility/index.jsp that
> would help in such a review, please let me know.


-- 
LB