What rules have been used for the current list of codepoints?
Patrik Fältström
patrik at frobbit.se
Thu Dec 14 23:09:17 CET 2006
On 14 dec 2006, at 22.56, Michael Everson wrote:
> I never said that. I say there are characters we know we need. You
> lot are trying to do this algorithmically, by assuming that the
> content of certain *blocks* in the UCS is anything other than
> accidental. Certainly for Latin that is not the case.
>
> So the mistake is to try to do what you are doing, to base things
> on UCS blocks.
No, I am doing it based on any parameter I get from the Unicode
tables. In the latest tables that is mostly class, and secondly script.
See the list by Mark; he uses block, script and class in combination.
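
A minimal sketch of a class-based selector of this kind, for illustration
only. It is not the rule set circulated on this list: the ALLOWED_CLASSES set
and both function names are assumptions made up for the example. It uses only
the general category ("class") from Python's unicodedata module, which does
not expose the script or block properties, so the IPA Extensions range
(U+0250..U+02AF) is written out by hand.

```python
import unicodedata

# Assumption: an illustrative set of general categories,
# not the actual rules discussed on the list.
ALLOWED_CLASSES = frozenset({"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"})

# IPA Extensions block, U+0250..U+02AF, hard-coded because the
# standard library has no block or script lookup.
IPA_EXTENSIONS = range(0x0250, 0x02B0)


def select_by_class(codepoints, allowed=ALLOWED_CLASSES):
    """Return the codepoints whose general category is in `allowed`."""
    return [cp for cp in codepoints
            if unicodedata.category(chr(cp)) in allowed]


def class_histogram(codepoints):
    """Count how many codepoints fall into each general category."""
    counts = {}
    for cp in codepoints:
        cat = unicodedata.category(chr(cp))
        counts[cat] = counts.get(cat, 0) + 1
    return counts


if __name__ == "__main__":
    selected = select_by_class(IPA_EXTENSIONS)
    print(len(selected), "of", len(IPA_EXTENSIONS), "codepoints selected")
    print(class_histogram(IPA_EXTENSIONS))
```

The histogram shows how the codepoints of a range are distributed over
classes. Where most of a block shares one class, a class-only selector keeps
or drops that block nearly wholesale, so a finer property is needed to take
only part of it.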
>> Some other people on this list say that when selecting that whole
>> set, we will get also some codepoints "for free" that we do not want.
>
> I said you MAY NOT exclude the IPA block. I did not say that you
> MUST include everything in it.
How do I include only a part of it? What is the selector I am
supposed to use?
> Indeed some of the other blocks for which you include everything
> might well benefit from weeding.
>
>> That is for me evidence that the selectors that we discuss (and
>> should continue to discuss obviously) are not good enough.
>
> The UCS Block is not a good selector. I believe this has been said
> before, and for the same reasons: schwa, ezh, and a dozen African
> letters.
>
> We should begin with script property, because we need to restrict
> the mixing of certain (most) scripts within labels. Are you able to
> use that property to distinguish between characters?
See the list of rules I have passed around, which Mark has updated. The
answer is yes, I think, though I am not certain that I understand your
question.
Patrik