What rules have been used for the current list of codepoints?

Thu Dec 14 22:56:00 CET 2006

At 19:58 +0100 2006-12-14, Patrik Fältström wrote:
>On 14 dec 2006, at 11.10, Michael Everson wrote:
>
>>, but *I* am absolutely sure that you cannot 
>>exclude characters from this block by excluding 
>>the block. This will deny IDN to millions of 
>>people.
>>
>>Is that clear enough?
>
>This is exactly my point.
>
>Just because some of the codepoints are really 
>really really important, we have to include the 
>whole set of codepoints of that 
>{script,block,class,whatever}.

I never said that. I say there are characters we 
know we need. You lot are trying to do this 
algorithmically, by assuming that the content of 
certain *blocks* in the UCS is anything other 
than accidental. Certainly for Latin that is not 
the case.

So the mistake is to try to do what you are 
doing, to base things on UCS blocks.

>Some other people on this list say that when 
>selecting that whole set, we will get also some 
>codepoints "for free" that we do not want.

I said you MAY NOT exclude the IPA block. I did 
not say that you MUST include everything in it.

Indeed some of the other blocks for which you 
include everything might well benefit from 
weeding.

>That is for me evidence that the selectors that 
>we discuss (and should continue to discuss 
>obviously) are not good enough.

The UCS Block is not a good selector. I believe 
this has been said before, and for the same 
reasons: schwa, ezh, and a dozen African letters.
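
To make the point concrete, here is a rough sketch of what a 
block-based exclusion does to letters like these. It is only an 
illustration, not anyone's proposed rule. The two block ranges are 
copied from the Unicode Blocks.txt data file; the helper names 
(block_of, looks_latin, survives_block_rule) are made up for this 
example, and the "LATIN " name-prefix test is an assumed stand-in for 
the real Script property, which the Python standard library does not 
expose.

```python
import unicodedata

# Two block ranges from the Unicode Blocks.txt data file
# (only the ones this illustration needs).
BLOCKS = {
    "Latin Extended-B": (0x0180, 0x024F),
    "IPA Extensions": (0x0250, 0x02AF),
}

def block_of(ch):
    """Return the block name for ch, or None if it is outside BLOCKS."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

def looks_latin(ch):
    """Assumed proxy for Script=Latin: the character name starts with 'LATIN '."""
    return unicodedata.name(ch, "").startswith("LATIN ")

def survives_block_rule(ch, excluded=("IPA Extensions",)):
    """Hypothetical selector: keep ch unless its block is excluded wholesale."""
    return block_of(ch) not in excluded

# Schwa, ezh, and two letters (open o, open e) used in several
# African orthographies.
SAMPLES = ["\u0259", "\u0292", "\u0254", "\u025B"]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"block={block_of(ch)}, latin={looks_latin(ch)}, "
          f"kept={survives_block_rule(ch)}")
```

Every sample is a Latin letter by name, sits in the IPA Extensions 
block, and is dropped the moment that block is excluded as a whole.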

We should begin with the script property, because we 
need to restrict the mixing of certain (most) 
scripts within labels. Are you able to use that 
property to distinguish between characters?
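
A rough sketch of the kind of check I have in mind, under loud 
assumptions: Python's standard library has no Script property, so the 
first word of each character name is used as a crude approximation 
(good enough for Latin, Greek and Cyrillic, not for the UCS in 
general). The function names and the COMMON set are hypothetical, 
chosen only for this example; a real implementation would read the 
Unicode Scripts.txt data instead.

```python
import unicodedata

# Assumed set of name prefixes treated as script-neutral in a label.
COMMON = {"DIGIT", "HYPHEN-MINUS"}

def rough_script(ch):
    """Crude stand-in for the Script property: first word of the name."""
    name = unicodedata.name(ch, "")
    first = name.split(" ")[0] if name else "UNKNOWN"
    return "COMMON" if first in COMMON else first

def scripts_in_label(label):
    """Set of approximate scripts used in label, ignoring neutral characters."""
    return {rough_script(ch) for ch in label} - {"COMMON"}

def is_single_script(label):
    """True if the label draws its letters from at most one script."""
    return len(scripts_in_label(label)) <= 1

# A Latin label containing schwa passes; a label mixing Latin with
# CYRILLIC SMALL LETTER A (U+0430) does not.
print(is_single_script("b\u0259li"))
print(is_single_script("p\u0430ypal"))
```

Note that the check never asks which block a character came from, so 
schwa is handled like any other Latin letter.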
-- 
Michael Everson * http://www.evertype.com