What rules have been used for the current list of codepoints?

Patrik Fältström patrik at frobbit.se
Thu Dec 14 23:09:17 CET 2006


On 14 dec 2006, at 22.56, Michael Everson wrote:

> I never said that. I say there are characters we know we need. You  
> lot are trying to do this algorithmically, by assuming that the  
> content of certain *blocks* in the UCS is anything other than  
> accidental. Certainly for Latin that is not the case.
>
> So the mistake is to try to do what you are doing, to base things  
> on UCS blocks.

No, I am doing it based on any parameter I get from the Unicode  
tables. In the latest tables I mostly use class, and secondly script.

See the list by Mark; he uses block, script and class in combination.
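As a rough illustration of what a property-based selector might look like, here is a minimal sketch. It is not the rule set Mark or Patrik circulated. Python's standard `unicodedata` module exposes General_Category ("class") but not Script or Block, so the block range is hard-coded and the choice of the category set `{"Ll"}` is an assumption made only for this example. The helper name `select_by_category` is hypothetical.

```python
import unicodedata

def select_by_category(codepoints, categories):
    """Keep only codepoints whose Unicode General_Category is in `categories`."""
    return [cp for cp in codepoints if unicodedata.category(chr(cp)) in categories]

# Hypothetical example: the IPA Extensions block (U+0250..U+02AF) as a
# candidate range, narrowed to lowercase letters (General_Category "Ll").
IPA_EXTENSIONS = range(0x0250, 0x02B0)
ipa_letters = select_by_category(IPA_EXTENSIONS, {"Ll"})

# Script is not exposed by Python's unicodedata; a script test would need
# an external table such as the UCD file Scripts.txt (not shown here).
```

For instance, `select_by_category([0x0259, 0x0292, 0x02C2, 0x0041], {"Ll"})` keeps schwa (U+0259) and ezh (U+0292) but drops U+02C2 (a modifier symbol, Sk) and "A" (Lu). Most of IPA Extensions is Ll, so a category test alone does little weeding inside that block. That is why the question of which selector to use matters.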

>> Some other people on this list say that when selecting that whole  
>> set, we will get also some codepoints "for free" that we do not want.
>
> I said you MAY NOT exclude the IPA block. I did not say that you  
> MUST include everything in it.

How do I include only a part of it? What is the selector I am  
supposed to use?

> Indeed some of the other blocks for which you include everything  
> might well benefit from weeding.
>
>> That is for me evidence that the selectors that we discuss (and  
>> should continue to discuss obviously) are not good enough.
>
> The UCS Block is not a good selector. I believe this has been said  
> before, and for the same reasons: schwa, ezh, and a dozen African  
> letters.
>
> We should begin with script property, because we need to restrict  
> the mixing of certain (most) scripts within labels. Are you able to  
> use that property to distinguish between characters?

See list of rules I have passed around, that Mark has updated. The  
answer is yes, I think. I am a bit uncertain whether I understand your  
question.

    Patrik



More information about the Idna-update mailing list