What rules have been used for the current list of codepoints?

Patrik Fältström patrik at frobbit.se
Thu Dec 14 23:38:52 CET 2006


On 14 dec 2006, at 23.31, Michael Everson wrote:

> At 23:09 +0100 2006-12-14, Patrik Fältström wrote:
>
>>> So the mistake is to try to do what you are doing, to base things  
>>> on UCS blocks.
>>
>> No, I am doing it based on any parameter I get from the Unicode  
>> tables. In the latest tables mostly class, secondly script.
>
> I understood you to have proposed omitting the IPA Extensions block.

Correct. But that was only one of the selection criteria I used,  
and I know I have received pushback on that one. Heavy pushback,  
so IPA will be back in the next version.

>>> I said you MAY NOT exclude the IPA block. I did not say that you  
>>> MUST include everything in it.
>>
>> How do I include only a part of it? What is the selector I am  
>> supposed to use?
>
> You would have to have a lookup table of particular characters to  
> be omitted.
>
> The writing systems of the world are untidy. They arose by the  
> activity of human beings in many places, with many tools, and even  
> the UCS's Latin repertoire was put together long before people  
> thought about saving space and having blocks make some more sense  
> than they do. And it continues to grow, with more and more  
> characters being added for various purposes.
>
> You can't finesse this algorithmically.

I thought this was why we had classes. What stops us from having a  
new class that is "suitable for domain names"?

That is, my point is that the list of rules is already quite complex,  
even before we start inspecting individual code points. We have  
already discussed how to explain the rules so that it is clear what  
to expect when more than one rule matches.
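
To make the interaction between a property-based rule and a per-character lookup table concrete, here is a minimal sketch. It assumes that "class" above means the Unicode General Category. The category set, the excluded code points and the function name are all hypothetical, chosen only for illustration; they are not the rules under discussion. The order of the checks is what settles the "more than one rule matches" question: here the exception table wins.

```python
import unicodedata

# Hypothetical property-based rule: General Categories that are allowed.
# This set is an illustration only, not the actual rule list.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Hypothetical lookup table of individual code points to omit.
# The entry is an arbitrary example from the IPA Extensions block
# (U+0250..U+02AF), not a proposal.
EXCLUDED = {0x0298}  # LATIN LETTER BILABIAL CLICK


def classify(cp: int) -> str:
    """Return which rule decided the fate of code point cp."""
    # Rule 1: the exception table takes precedence over everything else.
    if cp in EXCLUDED:
        return "excluded-by-table"
    # Rule 2: otherwise fall back to the property-based rule.
    if unicodedata.category(chr(cp)) in ALLOWED_CATEGORIES:
        return "allowed-by-category"
    # Default: nothing matched, so the code point is not allowed.
    return "disallowed"


if __name__ == "__main__":
    for cp in (0x0061, 0x0250, 0x0298, 0x0041):
        print(f"U+{cp:04X} {classify(cp)}")
```

With this ordering, a code point that matches both the category rule and the table is reported as excluded, so the explanation of the rules only has to state the precedence once.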

>>> We should begin with script property, because we need to restrict  
>>> the mixing of certain (most) scripts within labels. Are you able  
>>> to use that property to distinguish between characters?
>>
>> See list of rules I have passed around, that Mark has updated. The  
>> answer is yes, I think. I am a bit uncertain that I understand  
>> your question.
>
> Are you able to use the script property to distinguish between a  
> character belonging to the Latin script and one belonging to the  
> Cyrillic script?

Yes.
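
As a rough illustration of that distinction: the real Script property is defined in Scripts.txt in the Unicode Character Database, and the Python standard library's unicodedata module does not expose it. The sketch below therefore approximates the script from the first word of the character name, which is an assumption that holds for the Latin and Cyrillic examples shown but is not a substitute for the real property. The function names are hypothetical.

```python
import unicodedata


def approx_script(ch: str) -> str:
    """Approximate the script of ch from its character name.

    This is a heuristic stand-in for the Script property, which
    unicodedata does not provide.
    """
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC"):
        if name.startswith(script + " "):
            return script
    return "OTHER"


def mixes_scripts(label: str) -> bool:
    """True if the label contains both Latin and Cyrillic characters."""
    scripts = {approx_script(ch) for ch in label} - {"OTHER"}
    return len(scripts) > 1


if __name__ == "__main__":
    print(approx_script("a"))       # U+0061
    print(approx_script("\u0430"))  # U+0430, looks like "a"
    print(mixes_scripts("p\u0430ypal"))
```

A check like mixes_scripts is the kind of test that restricting script mixing within a label would need, once it is driven by the real property data instead of names.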

    Patrik


