What rules have been used for the current list of codepoints?

Michael Everson everson at evertype.com
Thu Dec 14 23:31:37 CET 2006


At 23:09 +0100 2006-12-14, Patrik Fältström wrote:

>>So the mistake is to try to do what you are 
>>doing, to base things on UCS blocks.
>
>No, I am doing it based on any parameter I get 
>from the Unicode tables. In the latest tables 
>mostly class, secondly script.

I understood you to have proposed omitting the IPA Extensions block.

>>I said you MAY NOT exclude the IPA block. I did 
>>not say that you MUST include everything in it.
>
>How do I include only a part of it? What is the selector I am supposed to use?

You would have to have a lookup table of particular characters to be omitted.
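
As a rough illustration of that idea, here is a minimal Python sketch of a block-level inclusion rule combined with an exception table. The block range U+0250..U+02AF is the IPA Extensions block; the function name and the two entries in OMITTED are hypothetical placeholders chosen only to show the mechanism, not a proposal for which characters should be left out.

```python
# IPA Extensions block: U+0250..U+02AF (range end is exclusive).
IPA_EXTENSIONS = range(0x0250, 0x02B0)

# Hypothetical exception table. These two code points are arbitrary
# placeholders for illustration; a real table would be compiled by hand.
OMITTED = frozenset({0x0277, 0x027C})


def ipa_block_permitted(ch: str) -> bool:
    """Return True if ch is in the IPA Extensions block and not omitted."""
    cp = ord(ch)
    return cp in IPA_EXTENSIONS and cp not in OMITTED


if __name__ == "__main__":
    for ch in ("\u0259", "\u0277", "a"):
        print(f"U+{ord(ch):04X}", ipa_block_permitted(ch))
```

The point of the sketch is that the block supplies the default and the table supplies the per-character exceptions, so the table has to be maintained as the repertoire grows.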

The writing systems of the world are untidy. They 
arose by the activity of human beings in many 
places, with many tools, and even the UCS's Latin 
repertoire was put together long before people 
thought about saving space and having blocks make 
some more sense than they do. And it continues to 
grow, with more and more characters being added 
for various purposes.

You can't finesse this algorithmically.

>>We should begin with script property, because 
>>we need to restrict the mixing of certain 
>>(most) scripts within labels. Are you able to 
>>use that property to distinguish between 
>>characters?
>
>See list of rules I have passed around, that 
>Mark has updated. The answer is yes, I think. I 
>am a bit uncertain that I understand your 
>question.

Are you able to use the script property to 
distinguish between a character belonging to the 
Latin script and one belonging to the Cyrillic 
script?
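
To make the question concrete, the following Python sketch shows one way such a script lookup could work. It assumes a small, hand-written excerpt of script ranges; SCRIPT_RANGES is deliberately incomplete and should be checked against, or generated from, the Unicode Scripts.txt data file. The names script_of and scripts_in_label are hypothetical.

```python
from bisect import bisect_right

# Partial, hand-written excerpt of script assignments (an assumption for
# illustration; a real table would be generated from Scripts.txt).
# Entries are (first code point, last code point, script), sorted by start.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),     # A..Z
    (0x0061, 0x007A, "Latin"),     # a..z
    (0x00C0, 0x00D6, "Latin"),
    (0x00D8, 0x00F6, "Latin"),
    (0x0250, 0x02AF, "Latin"),     # IPA Extensions
    (0x0400, 0x0481, "Cyrillic"),
]
_STARTS = [start for start, _, _ in SCRIPT_RANGES]


def script_of(ch: str):
    """Return the script name for ch, or None if it is not in the excerpt."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0:
        _, end, script = SCRIPT_RANGES[i]
        if cp <= end:
            return script
    return None


def scripts_in_label(label: str) -> set:
    """Return the set of scripts (or None for unknown) used in a label."""
    return {script_of(ch) for ch in label}


if __name__ == "__main__":
    print(script_of("a"), script_of("\u0430"))
    print(scripts_in_label("p\u0430ypal"))
```

With a table like this, LATIN SMALL LETTER A (U+0061) and CYRILLIC SMALL LETTER A (U+0430) come back with different script values even though they look alike, which is what a rule restricting script mixing within a label would need.
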
-- 
Michael Everson * http://www.evertype.com

