What rules have been used for the current list of codepoints?
Michael Everson
everson at evertype.com
Thu Dec 14 23:31:37 CET 2006
At 23:09 +0100 2006-12-14, Patrik Fältström wrote:
>>So the mistake is to try to do what you are
>>doing, to base things on UCS blocks.
>
>No, I am doing it based on any parameter I get
>from the Unicode tables. In the latest tables
>mostly class, secondly script.
I understood you to have proposed omitting the IPA Extensions block.
>>I said you MAY NOT exclude the IPA block. I did
>>not say that you MUST include everything in it.
>
>How do I include only a part of it? What is the selector I am supposed to use?
You would have to have a lookup table of particular characters to be omitted.
The writing systems of the world are untidy. They
arose by the activity of human beings in many
places, with many tools, and even the UCS's Latin
repertoire was put together long before people
thought about saving space and having blocks make
some more sense than they do. And it continues to
grow, with more and more characters being added
for various purposes.
You can't finesse this algorithmically.
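As a rough illustration of the lookup-table approach described above, here is a minimal Python sketch. The block range U+0250..U+02AF is the IPA Extensions block; the set EXCLUDED and the function name is_permitted are hypothetical, and the two code points in the table are arbitrary placeholders, not a proposal for what should actually be omitted.

```python
# IPA Extensions block: U+0250..U+02AF (range end is exclusive).
IPA_EXTENSIONS = range(0x0250, 0x02B0)

# Hypothetical exclusion table. The entries are placeholders chosen
# only to show the mechanism; a real table would be hand-curated.
EXCLUDED = frozenset({0x0298, 0x02AD})


def is_permitted(ch: str) -> bool:
    """Return True if ch is in the IPA Extensions block and not excluded.

    This sketch only covers the one block; anything outside it
    is reported as not permitted.
    """
    cp = ord(ch)
    return cp in IPA_EXTENSIONS and cp not in EXCLUDED


if __name__ == "__main__":
    for ch in ("\u0259", "\u0298", "a"):
        print(f"U+{ord(ch):04X}", is_permitted(ch))
```

The point of the sketch is that the block test alone cannot make the decision; the per-character table carries the human judgement.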
>>We should begin with script property, because
>>we need to restrict the mixing of certain
>>(most) scripts within labels. Are you able to
>>use that property to distinguish between
>>characters?
>
>See list of rules I have passed around, that
>Mark has updated. The answer is yes, I think. I
>am a bit uncertain that I understand your
>question.
Are you able to use the script property to
distinguish between a character belonging to the
Latin script and one belonging to the Cyrillic
script?
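To make the question concrete, here is a small Python sketch of a script lookup. Python's standard unicodedata module does not expose the script property, so the sketch parses a few hand-picked ranges written in the "range ; script" layout used by the Unicode data file Scripts.txt. The sample data, and the names SCRIPTS_SAMPLE, script_of and scripts_in_label, are assumptions made for illustration; a real implementation would load the complete file.

```python
# A tiny hand-picked sample in the layout of Scripts.txt.
# It is NOT the full data: only basic Latin letters and the
# Cyrillic letters U+0410..U+044F are covered.
SCRIPTS_SAMPLE = """\
0041..005A ; Latin
0061..007A ; Latin
0410..044F ; Cyrillic
"""


def parse_ranges(text):
    """Parse 'XXXX..YYYY ; Script' lines into (first, last, script) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        span, script = (part.strip() for part in line.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))
    return ranges


RANGES = parse_ranges(SCRIPTS_SAMPLE)


def script_of(ch):
    """Return the script name for ch, or None if the sample lacks it."""
    cp = ord(ch)
    for first, last, script in RANGES:
        if first <= cp <= last:
            return script
    return None


def scripts_in_label(label):
    """Return the set of scripts that occur in a label."""
    return {script_of(ch) for ch in label}


if __name__ == "__main__":
    print(script_of("a"), script_of("\u0430"))
    print(scripts_in_label("p\u0430ypal"))  # second letter is U+0430
```

With such a lookup, LATIN SMALL LETTER A (U+0061) and CYRILLIC SMALL LETTER A (U+0430) are told apart even though they look alike, and a label that mixes the two scripts can be detected by checking whether scripts_in_label returns more than one script.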
--
Michael Everson * http://www.evertype.com