What rules have been used for the current list of codepoints?

Kenneth Whistler kenw at sybase.com
Thu Dec 14 23:00:10 CET 2006


> On 14 dec 2006, at 11.10, Michael Everson wrote:
> 
> > , but *I* am absolutely sure that you cannot exclude characters  
> > from this block by excluding the block. This will deny IDN to  
> > millions of people.
> >
> > Is that clear enough?
> 
> This is exactly my point.
> 
> Just because some of the codepoints are really really really  
> important, we have to include the whole set of codepoints of that  
> {script,block,class,whatever}. Some other people on this list say  
> that when selecting that whole set, we will get also some codepoints  
> "for free" that we do not want.
> 
> That is for me evidence that the selectors that we discuss (and  
> should continue to discuss obviously) are not good enough.

I don't draw that conclusion at all.

It is evidence that *some* participants want to argue
characters one-by-one and exclude some that they do not
want.

Other participants (including me and Michael, I surmise) think
that is a black hole for indefinite argumentation, and is
also a prescription for a maintenance nightmare. You
are *never* going to get consensus if you start arguing
IPA characters one at a time, because somebody is always
going to come back pointing to some modern language orthography
using another one. And, it just really, really, isn't worth
the cycles, because having IPA characters available in
IDNs *ISN'T* the problem in the first place.

I have been contending all along that scripts are the right
level of granularity to be having this discussion. Either a
script is appropriate or it isn't. End of story on that front.
You can toss the arguable edge cases into the "not just yet"
category and let somebody come forward to make the case for
including N'ko, say, as a script -- but there just isn't
any payoff in arguing, say, that U+106E MYANMAR LETTER EASTERN PWO
KAREN NNA isn't used in Burmese or even in a major dialect that
might be showing up soon requesting registration of IDNs,
so we should omit that one character, but include some
other Myanmar letters, for example. It really isn't worth
the headache for maintenance, nor is it worth the kind of
political buzzsaw you are going to run into if you start
trying to restrict on a letter-by-letter basis and make people
apply to get "their" letter onto the inclusion list.

The approach that Mark and I have been advocating cuts 98% of
the crap out of the table in big, defensible, rule-based chunks.
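
As a rough illustration of what a script-level rule looks like in
practice, here is a minimal Python sketch. The range table, the script
labels, and the names SCRIPT_RANGES, INCLUDED_SCRIPTS, script_of and
allowed are all hypothetical and chosen only for this example. The
ranges are block-style approximations, not the real Unicode Script
property, which would come from the Unicode Character Database.

```python
# Hypothetical, hand-picked ranges for illustration only. These are
# block-style approximations; a real table would be derived from the
# Unicode Character Database (Scripts.txt).
SCRIPT_RANGES = {
    "Latin-basic": (0x0061, 0x007A),
    "Myanmar": (0x1000, 0x109F),
    "Runic": (0x16A0, 0x16FF),
}

# The decision is made once per script, not once per character.
INCLUDED_SCRIPTS = {"Latin-basic", "Myanmar"}


def script_of(cp):
    """Return the label of the range containing code point cp, or None."""
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None


def allowed(cp):
    """A code point is allowed exactly when its whole script is included."""
    return script_of(cp) in INCLUDED_SCRIPTS
```

Under these assumptions, allowed(0x106E) holds simply because the
Myanmar range is included, with no separate entry to maintain for that
letter, while every code point in the Runic range is rejected by a
single omission from INCLUDED_SCRIPTS.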

What we should be doing now is talking about whether all of
the scripts left in the inclusion list should stay there,
or whether one or two more could be omitted at this time.

And then we should take a hard look at the remaining
sets of non-spacing combining marks to find good reasons
to omit more of them that *nobody* will need in IDNs because
they aren't ever used in the ordinary representation of
normal words in any language.
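
One way to produce the list of candidates for that review is to
enumerate the code points whose General_Category is Mn (nonspacing
mark). The sketch below uses Python's standard unicodedata module; the
helper name nonspacing_marks and the ranges passed to it are
assumptions made for illustration, and the result depends on the
Unicode version bundled with the interpreter.

```python
import unicodedata


def nonspacing_marks(lo, hi):
    """Return the code points in lo..hi (inclusive) with General_Category Mn."""
    return [
        cp
        for cp in range(lo, hi + 1)
        if unicodedata.category(chr(cp)) == "Mn"
    ]


# Example: print the marks in the Combining Diacritical Marks range so
# that each one can be reviewed by hand. Names come from the same module.
for cp in nonspacing_marks(0x0300, 0x036F):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

This only produces the review list. Whether a given mark is ever used
in the ordinary representation of words is a judgment the code cannot
make.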

--Ken

More information about the Idna-update mailing list