What rules have been used for the current list of codepoints?

Mark Davis markdavis at google.com
Wed Dec 13 17:45:32 CET 2006


My comments on the list proposed:

>Do I see a consensus on this list that I should remove rule 2?
Yes, #2 needs to be removed -- many of these are required for modern
languages.*

>Do I see a consensus on this list that I should also include Lm and Nd?
>(Then rule 4 can be removed.)
Yes, #7 needs to be expanded by adding Lm -- again, many of these are
required for modern languages.*, **

In addition,
#1 needs to be removed -- there are many modern languages that use IPA
characters.*
#6 should be 'casefolded' (this is almost completely the same as lowercase, but
there are a few important exceptions)
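
To illustrate the kind of exception meant here, a minimal sketch, assuming
Python's built-in str.lower() and str.casefold() as stand-ins for the
lowercase and casefold operations (the helper name is hypothetical, and the
results depend on the Unicode data shipped with the Python version in use):

```python
def passes_lowercase_but_not_casefold(ch: str) -> bool:
    """True if ch is unchanged by lowercasing but changed by case folding.

    Such a character would survive a 'lowercase(cp) != cp' test yet be
    removed by a 'casefold(cp) != cp' test.
    """
    return ch.lower() == ch and ch.casefold() != ch


# U+00DF LATIN SMALL LETTER SHARP S, U+03C2 GREEK SMALL LETTER FINAL SIGMA,
# U+017F LATIN SMALL LETTER LONG S: all are already lowercase, but each
# folds to something else.
for ch in ["\u00df", "\u03c2", "\u017f"]:
    print(f"U+{ord(ch):04X} lower={ch.lower()!r} casefold={ch.casefold()!r}")
```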

* It would be possible to sift through to see which are only technical, and
which are used in modern languages, but as a class they can't be excluded.
** There are pluses and minuses to adding Nd as well.

I'd then recommend a slightly different formulation, because it is unclear
when you have rule X saying 'ok' and rule Y saying 'not ok' which one wins.
So I'd recast as a series of additions and removals; thus the later one
'wins'. Then the rules would be written as:

0. Start with the empty set.
1. If generalCategory(cp) is in [Ll, Lo, Lm, Mn, Mc], add cp
2. If NFKC(cp) != cp, remove cp
3. If casefold(cp) != cp, remove cp
4. If cp is in [-A-Z0-9], add cp
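
The steps above can be sketched in Python as a sequence of additions and
removals where the later step wins. This is only an illustration under stated
assumptions: it uses the standard unicodedata module and str.casefold(), whose
results depend on the Unicode version bundled with the interpreter; the names
is_allowed, ADD_CATEGORIES and ASCII_EXTRAS are hypothetical; and [-A-Z0-9] is
read as the hyphen plus the ASCII uppercase letters and digits.

```python
import unicodedata

# Step 1 categories, taken from the list above.
ADD_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc"}
# Step 4 set: assumed reading of [-A-Z0-9].
ASCII_EXTRAS = set("-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def is_allowed(cp: int) -> bool:
    """Apply steps 0-4 in order; a later step overrides an earlier one."""
    ch = chr(cp)
    allowed = False                                  # 0. start with the empty set
    if unicodedata.category(ch) in ADD_CATEGORIES:   # 1. add by general category
        allowed = True
    if unicodedata.normalize("NFKC", ch) != ch:      # 2. remove if NFKC changes it
        allowed = False
    if ch.casefold() != ch:                          # 3. remove if case folding changes it
        allowed = False
    if ch in ASCII_EXTRAS:                           # 4. add back hyphen, A-Z, 0-9
        allowed = True
    return allowed


for ch in ["a", "A", "-", "5", "\u00df", "\u0250", "\ufb01", "!"]:
    print(f"U+{ord(ch):04X} allowed={is_allowed(ord(ch))}")
```

Because step 4 comes last, 'A' ends up in the set even though step 3 would
remove it, which is the ordering question the recast is meant to settle.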

Mark

On 12/13/06, Patrik Fältström <patrik at frobbit.se> wrote:
>
> I understand there is confusion about which rules have been used TODAY for
> the list of codepoints.
>
> These are the rules; the first that matches tells whether the
> codepoint is ok to include or not.
>
> 1. If block is "IPA Extensions", the codepoint is not ok
> 2. If the script is "Inherited", the codepoint is not ok
> 3. If the codepoint is [A-Z], the codepoint is ok
> 4. If the codepoint is [0-9], the codepoint is ok
> 5. If NFKC(cp) != cp, the codepoint is not ok
> 6. If lowercase(cp) != cp, the codepoint is not ok
> 7. If class is [Ll, Lo, Mn, Mc], the codepoint is ok
>
> I have a suggestion that rule 7 should also include classes Lm and
> Nd, but I have not included that.
>
> Do I see a consensus on this list that I should also include Lm and
> Nd? (Then rule 4 can be removed.)
>
> I also have a suggestion that rule 2 above should be removed, that I
> went one step too far in conclusions from earlier discussions.
>
> Do I see a consensus on this list that I should remove rule 2?
>
> BTW, the URL to the latest document is
> http://stupid.domain.name/idnabis/table-latest.html.
>
> Other changes you will see are:
>
> (a) The list of rules (that you see above) will be included in the
> document
> (b) The scripts will be in English alphabetical order
>
>      Patrik


